haproxy

mirror of http://git.haproxy.org/git/haproxy.git synced 2026-02-22 10:03:28 +02:00

Author	SHA1	Message	Date
Willy Tarreau	f263a45ddf	BUG/MINOR: pools: don't report "limited to the first X entries" by default With the fix in commit `982805e6a3` ("BUG/MINOR: pools: Fix the dump of pools info to deal with buffers limitations"), the max count is now compared to the number of dumped pools instead of the configured number, and keeping >= is no longer valid because maxcnt is set by default to the same value when not set, so this means that since this patch we're always displaying "limited to the first X entries" where X is the number of dumped entries even in the absence of any limitation. Let's just fix the comparison to only show this when the limit is lower. This must be backported to 3.2 where the patch above already is.	2025-10-16 08:41:32 +02:00
Willy Tarreau	ab0c97139f	BUG/MEDIUM: pools: fix crash on filtered "show pools" output The truncation of pools output that was addressed in commit `982805e6a3` ("BUG/MINOR: pools: Fix the dump of pools info to deal with buffers limitations") required to split the pools filling from dumping. However there is a problem when a limit is passed that is lower than the number of pools or if a pool name is specified or if pool caches are disabled, because in this case the number of filled slots will be lower than the initially allocated one, and empty entries will be visited either by the sort functions when filling the entries if "byxxx" is specified, or by the dump function after the last entry, but none of these functions was expecting to be passed a NULL entry. Let's just re-adjust nbpools to match the number of filled entries at the end. Anyway the totals are calculated on the number of dumped entries. This must be backported to 3.2 since the fix above was backported there as well.	2025-10-16 08:41:32 +02:00
Willy Tarreau	8fb5ae5cc6	MINOR: activity/memory: count allocations performed under a lock By checking the current thread's locking status, it becomes possible to know during a memory allocation whether it's performed under a lock or not. Both pools and memprofile functions were instrumented to check for this and to increment the memprofile bin's locked_calls counter. This one, when not zero, is reported on "show profiling memory" with a percentage of all allocations that such locked allocations represent. This way it becomes possible to try to target certain code paths that are particularly expensive. Example: $ socat - /tmp/sock1 <<< "show profiling memory"|grep lock 20297301 0 2598054528 0| 0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)] 0 20297301 0 2598054528| 0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)] 9908432 0 1268279296 0| 0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)] 9908432 0 554872192 0| 0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)] 263001 0 63120240 0| 0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)] 71643 0 47307584 0| 0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]	2025-09-11 16:32:34 +02:00
Willy Tarreau	9d8c2a888b	MINOR: activity: collect CPU time spent on memory allocations for each task When task profiling is enabled, the pool alloc/free code will measure the time it takes to perform memory allocation after a cache miss or memory freeing to the shared cache or OS. The time taken with the thread-local cache is never measured as measuring that time is very expensive compared to the pool access time. Here doing so costs around 2% performance at 2M req/s, only when task profiling is enabled, so this remains reasonable. The scheduler takes care of collecting that time and updating the sched_activity entry corresponding to the current task when task profiling is enabled. The goal clearly is to track places that are wasting CPU time allocating and releasing too often, or causing large evictions. This appears like this in "show profiling tasks aggr": Tasks activity over 11.428 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg mem_avg lat_avg process_stream 44183891 16.47m 22.36us 491.0ns 1.154us 1.000ns 101.1us h1_io_cb 57386064 4.011m 4.193us 20.00ns 16.00ns - 29.47us sc_conn_io_cb 42088024 49.04s 1.165us - - - 54.67us h1_timeout_task 438171 196.5ms 448.0ns - - - 100.1us srv_cleanup_toremove_conns 65 1.468ms 22.58us 184.0ns 87.00ns - 101.3us task_process_applet 3 508.0us 169.3us - 107.0us 1.847us 29.67us srv_cleanup_idle_conns 6 225.3us 37.55us 15.74us 36.84us - 49.47us accept_queue_process 2 45.62us 22.81us - - 4.949us 54.33us	2025-09-11 16:32:34 +02:00
Christopher Faulet	e653dc304e	MINOR: pools: Don't dump anymore info about pools when purge is forced Historically, when the purge of pools was forced by sending a SIGQUIT to haproxy, information about the pools was first dumped. It is now totally pointless because this info can be retrieved via the CLI. It is even less relevant now because the purge is typically forced when there are memory issues, and to dump pools information, data must be allocated. The dump_pools_info() function was simplified because it is now called only from an applet. No reason to still try to dump info on stderr.	2025-09-08 16:04:40 +02:00
Christopher Faulet	982805e6a3	BUG/MINOR: pools: Fix the dump of pools info to deal with buffers limitations The "show pools" CLI command was not designed to dump information exceeding the size of a buffer. But there are now many more pools than a few years ago and when detailed information is dumped, we exceed the buffer limit and the output is truncated. To fix the issue, the command must be refactored to be able to stream the result. To do so, the array containing pools info is now part of the command context and it is dynamically allocated. A dedicated function was created to fill all info. In addition, the index of the next pool to dump is saved in the command context too to properly handle resumption cases. Finally global information about pools is also stored in the command context for convenience. This patch should fix the issue #3067. It must be backported to 3.2. On older releases, the buffer limit is never reached.	2025-09-08 16:01:51 +02:00
Willy Tarreau	6be7b64bb4	MINOR: pools: always check that requested alignment matches the type's For pool registrations that are created from the type declaration, we now have the ability to verify that the requested alignment matches the type's one. Let's not miss this opportunity, as we've met bugs in the past that were caused by such mismatches. The principle is simple: if the type alignment is known, we check that the configured alignment is at least as large as that one otherwise we refuse to start (since the code may crash at any moment). Obviously it doesn't crash for now!	2025-08-11 19:55:30 +02:00
Willy Tarreau	ef915e672a	MEDIUM: pools: respect pool alignment in allocations Now pool_alloc_area() takes the alignment in argument and makes use of ha_aligned_malloc() instead of malloc(). pool_alloc_area_uaf() simply applies the alignment before returning the mapped area. The pool_free() function calls ha_aligned_free() so as to permit to use a specific API for aligned alloc/free like mingw requires. Note that it's possible to see warnings about mismatching sizes during pool_free() since we know both the pool and the type. In pool_free, adding just this is sufficient to detect potential offenders: WARN_ON(__alignof__(*__ptr) > pool->align);	2025-08-06 19:20:36 +02:00
Willy Tarreau	6ea0e3e2f8	MINOR: pools: add macros to register aligned pools This adds an alignment argument to create_pool_from_loc() and completes the existing low-level macros with new ones that expose the alignment and the new macros permit to specify it. For now they're not used.	2025-08-06 19:20:36 +02:00
Willy Tarreau	eb075d15f6	MEDIUM: pools: add an alignment property This will be used to declare aligned pools. For now it's not used, but it's properly set from the various registrations that compose a pool, and rounded up to the next power of 2, with a minimum of sizeof(void*). The alignment is returned in the "show pools" part that indicates the entry size. E.g. "(56 bytes/8)" means 56 bytes, aligned by 8.	2025-08-06 19:20:36 +02:00
Willy Tarreau	ac23b873f5	DEBUG: pools: also retrieve file and line for direct callers of create_pool() Just like previous patch, we want to retrieve the location of the caller. For this we turn create_pool() into a macro that collects __FILE__ and __LINE__ and passes them to the now renamed function create_pool_with_loc(). Now the remaining ~30 pools also have their location stored.	2025-08-06 19:20:34 +02:00
Willy Tarreau	efa856a8b0	DEBUG: pools: store the pool registration file name and line number When pools are declared using DECLARE_POOL(), REGISTER_POOL etc, we know where they are and it's trivial to retrieve the file name and line number, so let's store them in the pool_registration, and display them when known in "show pools detailed".	2025-08-06 19:20:32 +02:00
Willy Tarreau	ff62aacb20	MEDIUM: pools: change the static pool creation to pass a registration Now we're creating statically allocated registrations instead of passing all the parameters and allocating them on the fly. Not only this is simpler to extend (we're limited in number of INITCALL args), but it also leaves all of these in the data segment where they are easier to find when debugging.	2025-08-06 19:20:30 +02:00
Willy Tarreau	f51d58bd2e	MINOR: pools: force the name at creation time to be a const. This is already the case as all names are constant so that's fine. If it would ever change, it's not very hard to just replace it in-situ via an strdup() and set a flag to mention that it's dynamically allocated. We just don't need this right now. One immediately visible effect is in "show pools detailed" where the names are no longer truncated.	2025-08-06 19:20:28 +02:00
Willy Tarreau	ee5bc28865	MINOR: pools: add a new flag to declare static registrations We must not free these ones when destroying a pool, so let's dedicate them a flag to mention that they are static. For now we don't have any such.	2025-08-06 19:20:26 +02:00
Willy Tarreau	18505f9718	MINOR: pools: support creating a pool from a pool registration We've recently introduced pool registrations to be able to enumerate all pool creation requests with their respective parameters, but till now they were only used for debugging ("show pools detailed"). Let's go a step further and split create_pool() in two: - the first half only allocates and sets the pool registration - the second half creates the pool from the registration This is what this patch does. This now opens the ability to pre-create registrations and create pools directly from there.	2025-08-06 19:20:22 +02:00
Willy Tarreau	1404f6fb7b	DEBUG: pools: add a new integrity mode "backup" to copy the released area This way we can preserve the entire contents of the released area for later inspection. This automatically enables comparison at reallocation time as well (like "integrity" does). If used in combination with integrity, the comparison is disabled but the check of non-corruption of the area mangled by integrity is still operated.	2025-05-09 14:57:00 +02:00
Willy Tarreau	0ae14beb2a	DEBUG: pool: permit per-pool UAF configuration The new MEM_F_UAF flag can be set just after a pool's creation to make this pool UAF for debugging purposes. This allows to maintain a better overall performance required to reproduce issues while still having a chance to catch UAF. It will only be used by developers who will manually add it to areas worth being inspected, though.	2025-05-09 13:59:02 +02:00
Willy Tarreau	6b17310757	MEDIUM: pools: be a bit smarter when merging comparable size pools By default, pools of comparable sizes are merged together. However, the current algorithm is dumb: it rounds the requested size to the next multiple of 16 and compares the sizes like this. This results in many entries which are already multiples of 16 not being merged, for example 1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are separate (though 56 merges with 64). This commit changes this to consider not just the entry size but also the average entry size, that is, it compares the average size of all objects sharing the pool with the size of the object looking for a pool. If the object is not more than 1% bigger nor smaller than the current average size or if it is neither 16 bytes smaller nor larger, then it can be merged. Also, it always respects exact matches in order to avoid merging objects into larger pools or worse, extending existing ones for no reason, and when there's a tie, it always avoids extending an existing pool. Also, we now visit all existing pools in order to spot the best one, we do not stop anymore at the smallest one large enough. Theoretically this could cost a bit of CPU but in practice it's O(N^2) with N quite small (typically in the order of 100) and the cost at each step is very low (compare a few integer values). But as a side effect, pools are no longer sorted by size, "show pools bysize" is needed for this. This causes the objects to be much better grouped together, accepting to use a little bit more sometimes to avoid fragmentation, without causing everyone to be merged into the same pool. 
Thanks to this we're now seeing 36 pools instead of 48 by default, with some very nice examples of compact grouping: - Pool qc_stream_r (80 bytes) : 13 users > qc_stream_r : size=72 flags=0x1 align=0 > quic_cstrea : size=80 flags=0x1 align=0 > qc_stream_a : size=64 flags=0x1 align=0 > hlua_esub : size=64 flags=0x1 align=0 > stconn : size=80 flags=0x1 align=0 > dns_query : size=64 flags=0x1 align=0 > vars : size=80 flags=0x1 align=0 > filter : size=64 flags=0x1 align=0 > session pri : size=64 flags=0x1 align=0 > fcgi_hdr_ru : size=72 flags=0x1 align=0 > fcgi_param_ : size=72 flags=0x1 align=0 > pendconn : size=80 flags=0x1 align=0 > capture : size=64 flags=0x1 align=0 - Pool h3s (56 bytes) : 17 users > h3s : size=56 flags=0x1 align=0 > qf_crypto : size=48 flags=0x1 align=0 > quic_tls_se : size=48 flags=0x1 align=0 > quic_arng : size=56 flags=0x1 align=0 > hlua_flt_ct : size=56 flags=0x1 align=0 > promex_metr : size=48 flags=0x1 align=0 > conn_hash_n : size=56 flags=0x1 align=0 > resolv_requ : size=48 flags=0x1 align=0 > mux_pt : size=40 flags=0x1 align=0 > comp_state : size=40 flags=0x1 align=0 > notificatio : size=48 flags=0x1 align=0 > tasklet : size=56 flags=0x1 align=0 > bwlim_state : size=48 flags=0x1 align=0 > xprt_handsh : size=48 flags=0x1 align=0 > email_alert : size=56 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 - Pool quic_cids (32 bytes) : 13 users > quic_cids : size=16 flags=0x1 align=0 > quic_tls_ke : size=32 flags=0x1 align=0 > quic_tls_iv : size=12 flags=0x1 align=0 > cbuf : size=32 flags=0x1 align=0 > hlua_queuew : size=24 flags=0x1 align=0 > hlua_queue : size=24 flags=0x1 align=0 > promex_modu : size=24 flags=0x1 align=0 > cache_st : size=24 flags=0x1 align=0 > spoe_appctx : size=32 flags=0x1 align=0 > ehdl_sub_tc : size=32 flags=0x1 align=0 > fcgi_flt_ct : size=16 flags=0x1 align=0 > sig_handler : size=32 flags=0x1 align=0 > pipe : size=24 flags=0x1 align=0 - Pool quic_crypto (1032 bytes) : 2 users > 
quic_crypto : size=1032 flags=0x1 align=0 > requri : size=1024 flags=0x1 align=0 - Pool quic_conn_r (65544 bytes) : 2 users > quic_conn_r : size=65536 flags=0x1 align=0 > dns_msg_buf : size=65540 flags=0x1 align=0 On a very unscientific test consisting in sending 1 million H1 requests and 1 million H2 requests to the stats page, we're seeing an ~6% lower memory usage with the patch: before the patch: Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches). after the patch: Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches). This should be taken with care however since pools allocate and release in batches.	2025-03-25 18:01:01 +01:00
Willy Tarreau	9091c5317f	MINOR: cli/pools: record the list of pool registrations even when merging them By default, create_pool() tries to merge similar pools into one. But when dealing with certain bugs, it's hard to say which ones were merged together. We do have the information at registration time, so let's just create a list of registrations ("pool_registration") attached to each pool, that will store that information. It can then be consulted on the CLI using "show pools detailed", where the names, sizes, alignment and flags are reported.	2025-03-21 17:09:30 +01:00
Willy Tarreau	baf8b742b4	MINOR: pools: rename the "by_what" field of the show pools context to "how" The goal will be to support other dump options. We don't need 32 bits to express sorting criteria, let's reserve only 4 bits for them and leave the remaining ones unused.	2025-03-21 17:09:30 +01:00
Ilia Shipitsin	beca953c55	BUG/MINOR: pool: handle a possible strdup() failure This defect was found by the coccinelle script "unchecked-strdup.cocci". It can be backported to all supported branches.	2025-01-02 14:31:07 +01:00
Willy Tarreau	ed3ed35867	BUG/MEDIUM: pools/memprofile: always clean stale pool info on pool_destroy() There's actually a problem with memprofiles: the pool pointer is stored in ->info but some pools are replaced during startup, such as the trash pool, leaving a dangling pointer there, that may randomly report crap or even crash during "show profiling memory". Let's make pool_destroy() call memprof_remove_stale_info() added by previous patch so that these entries are properly unregistered. This must be backported along with the previous patch (MINOR: activity/memprofile: offer a function to unregister stale info) as far as 2.8.	2024-11-21 19:58:06 +01:00
Willy Tarreau	fba48e1c40	MINOR: pools: export the pools variable We want it to be accessible from debuggers for inspection and it's currently unavailable. Let's start by exporting it as a first step.	2024-10-24 16:12:46 +02:00
Willy Tarreau	87d269707b	OPTIM: pool: improve needed_avg cache line access pattern On an AMD EPYC 3rd gen, 20% of the CPU is spent calculating the amount of pools needed when using QUIC, because pool allocations/releases are quite frequent and the inter-CCX communication is super slow. Still, there's a way to save between 0.5 and 1% CPU by using fetch-add and sub-fetch that are converted to XADD so that the result is directly fed into the swrate_add argument without having to re-read the memory area. That's what this patch does.	2024-07-09 16:46:38 +02:00
Willy Tarreau	c0ee2d78d7	DEBUG: pools: report the data around the offending area in case of mismatch When the integrity check fails, it's useful to get a dump of the area around the first faulty byte. That's what this patch does. For example it now shows this before reporting info about the tag itself: Contents around first corrupted address relative to pool item:. Contents around address 0xe4febc0792c0+40=0xe4febc0792e8: 0xe4febc0792c8 [80 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792d0 [a0 f7 23 a4 fe e4 00 00] [..#.....] 0xe4febc0792d8 [90 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792e0 [d9 93 fb ff fd ff ff ff] [........] 0xe4febc0792e8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f0 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc079300 [d9 93 fb ff ff ff ff ff] [........] This may be backported to 2.9 and maybe even 2.8 as it does help spot the cause of the memory corruption.	2024-04-12 18:01:55 +02:00
Willy Tarreau	16e3655fbd	REORG: pool: move the area dump with symbol resolution to tools.c This function is particularly useful to dump unknown areas watching for opportunistic symbols, so let's move it to tools.c so that we can reuse it a little bit more.	2024-04-12 18:01:20 +02:00
Willy Tarreau	b21aaef4e5	DEBUG: pool: improve decoding of corrupted pools When a corruption was detected in an object, it's often said that the tag doesn't match the pool, but it should also check if it matches the location of an earlier pool_free() call, which happens when -dMcaller is used. That's what we're doing now.	2024-04-12 18:01:05 +02:00
Willy Tarreau	772f9a5874	BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option This option has been set by default for a very long time and also complicates the manipulation of the DEBUG variable. Let's make it the official default and permit to unset it by setting it to zero. The other pool-related DEBUG options were adjusted to also explicitly check for the zero value for consistency.	2024-04-11 17:25:45 +02:00
Willy Tarreau	b746af9990	BUG/MEDIUM: pool: fix rare risk of deadlock in pool_flush() As reported by github user @JB0925 in issue #2427, there is a possible crash in pool_flush(). The problem is that if the free_list is not empty in the first test, and is empty at the moment the xchg() is performed, for example because another thread called it in parallel, we place a POOL_BUSY there that is never removed later, causing the next thread to wait forever. This was introduced in 2.5 with commit `2a4523f6f` ("BUG/MAJOR: pools: fix possible race with free() in the lockless variant"). It has probably very rarely been detected, because: - pool_flush() is only called when stopping is set - the function does nothing if global pools are disabled, which is the case on most modern systems with a fast memory allocator. It's possible to reproduce it by modifying __task_free() to call pool_flush() on 1% of the calls instead of only when stopping. The fix is quite simple, it consists in moving the zeroing of the entry in the break path after verifying that the entry was not already busy. This must be backported wherever commit `2a4523f6f` is.	2024-02-10 12:38:40 +01:00
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is 37th iteration of typo fixes	2023-11-23 16:23:14 +01:00
Willy Tarreau	a57f2a5cfe	BUG/MEDIUM: pool: try once to allocate from another bucket if empty In order to limit inter-thread contention on the global pool, in 2.9-dev3 with commit `7bf829ace` ("MAJOR: pools: move the shared pool's free_list over multiple buckets"), it was decided that if the selected bucket had an empty free list, we would simply give up and fall back to the OS allocator. But this causes allocations to be made from the OS for certain threads, to be released to overloaded pools that are sent back to the OS. One visible effect is that sending a lot of traffic using h2load with 100 parallel streams over 100 connections causes 5-10k buffers to be allocated, then reducing the load to only 10 connections doesn't make these allocations go down, just because some buckets are no longer visited. Tests show that giving a second chance to pick another bucket in this case is sufficient to visit all other buckets and recycle their pending objects. Now "show pools" that starts at 10k buffers at 100 connections goes down to about 150 with 1 connection and 100 streams in a fraction of a second. No backport is needed, as the issue is only in 2.9.	2023-11-08 17:14:03 +01:00
Willy Tarreau	a9ae094b27	BUG/MINOR: pool: check one other random bucket on alloc conflict Since 2.9-dev3 with commit `7bf829ace` ("MAJOR: pools: move the shared pool's free_list over multiple buckets"), the global pool supports multiple heads to reduce inter-thread contention. However, when grabbing a freelist head fails because another thread is already picking from it, we just skip to the next one and try again. Unfortunately, it still maintains a bit of contention between thread pairs when for some reasons only a few threads are used. This may happen for example when running on a 4- or 8- thread system and the two most active ones end up on adjacent buckets. A better and much simpler solution consists in visiting a random bucket instead of the current one. Tests show that the CPU usage spent in pool_refill_local_from_shared() reduces at low number of connections (hence threads). No backport is needed, as the issue is only in 2.9.	2023-11-08 17:12:49 +01:00
Willy Tarreau	96bb99a87d	DEBUG: pools: detect that malloc_trim() is in progress Now when calling ha_panic() with a thread still under malloc_trim(), we'll set a new tainted flag to easily report it, and the output trace will report that this condition happened and will suggest to use no-memory-trimming to avoid it in the future.	2023-10-25 15:48:02 +02:00
Willy Tarreau	5714aff4a6	DEBUG: pool: store the memprof bin on alloc() and update it on free() When looking at "show pools", it's often difficult to know which alloc() corresponds to which free() since it's not often 1:1. But sometimes we have all elements available to maintain a link between alloc and free. Indeed, when the caller is recorded in the allocated area, we can store the pointer to the just created bin instead of the caller address itself, since the caller address is already in the memprof bin. By doing so, we permit the pool_free() call to locate the allocator bin and update its free count when caller tracing is enabled. This for example allows to produce outputs like this on "show profiling" and a process started with -dMcaller: 1391967 1391968 22805987328 22806003712| 0x59f72f process_stream+0x19f/0x3a7a p_alloc(0) [delta=-16384] [pool=buffer] 1391936 1391937 22805479424 22805495808| 0x6e1476 task_run_applet+0x426/0xea2 p_alloc(0) [delta=-16384] [pool=buffer] 1391925 1391925 22805299200 22805299200| 0x58435a main+0xdf07a p_alloc(0) [delta=0] [pool=buffer] 0 2087930 0 34208645120| 0x59b519 stream_release_buffers+0xf9/0x110 p_free(-16384) [pool=buffer] 695993 695992 11403149312 11403132928| 0x66018f main+0x1baeaf p_alloc(0) [delta=16384] [pool=buffer] 0 1391957 0 22805823488| 0x59b47c stream_release_buffers+0x5c/0x110 p_free(-16384) [pool=buffer] 695968 695970 11402739712 11402772480| 0x587b85 h1_io_cb+0x9a5/0xe7c p_alloc(0) [delta=-32768] [pool=buffer] 0 1391923 0 22805266432| 0x57f388 main+0xda0a8 p_free(-16384) [pool=buffer] 695959 695960 11402592256 11402608640| 0x586add main+0xe17fd p_alloc(0) [delta=-16384] [pool=buffer] 0 695978 0 11402903552| 0x59cc58 stream_free+0x178/0x9ea p_free(-16384) [pool=buffer] (...) Here it's quickly visible that all of them got properly released.	2023-10-17 17:13:56 +02:00
Christopher Faulet	b62d5689d2	BUILD: pool: Fix GCC error about potential null pointer dereference In pool_gc(), GCC 13.2.1 reports an error about a potential null pointer dereference: src/pool.c: In function ‘pool_gc’: src/pool.c:807:64: error: potential null pointer dereference [-Werror=null-dereference] 807 | entry->buckets[bucket].free_list = temp->next; | ~~~~^~~~~~ There is no issue here because "bucket" variable cannot be greater than CONFIG_HAP_POOL_BUCKETS. But to make GCC happy, we now break the loop if it is greater or equal to CONFIG_HAP_POOL_BUCKETS.	2023-10-04 08:03:02 +02:00
Willy Tarreau	5abbae2d3d	CLEANUP: pools: simplify the pool expression when no pool was matched in dump When dumping pool information, we make a special case of the condition where the pool couldn't be identified and we consider that it was the correct one. In the code arrangements brought by commit `efc46dede` ("DEBUG: pools: inspect pools on fatal error and dump information found"), a ternary expression for testing this depends on the "if" block condition so this can be simplified and will make Coverity happy. This was reported in GH #2290.	2023-09-13 13:31:41 +02:00
Willy Tarreau	93c2ea0ec3	MEDIUM: pools: refine pool size rounding The pools sizes were rounded up a little bit too much with commit `30f931ead` ("BUG/MEDIUM: pools: fix the minimum allocation size"). The goal was in fact to make sure they were always at least large enough to store 2 list heads, and stuffing this into the alignment calculation resulted in the size being always rounded up to this size. This is problematic because it means that the appended tag at the end doesn't always catch potential overflows since more bytes than needed are allocated. Moreover, this test was later reinforced by commit `b5ba09ed5` ("BUG/MEDIUM: pools: ensure items are always large enough for the pool_cache_item"), proving that the first test was not always sufficient. This needs to be reworked to proceed correctly: - the two lists are needed when the object is in the cache, hence when we don't care about the tag, which means that the tag's size, if any, can easily cover for the missing bytes to reach that size. This is actually what was already being checked for. - the rounding should not be performed (beyond the size of a word to preserve pointer alignment) when pool tagging is enabled, otherwise we don't detect small overflows. It means that there will be less merging when proceeding like this. Tests show that we merge 93 pools into 36 without tags and 43 with tags enabled. - the rounding should not consider the extra size, since it's already done when calculating the allocated size later (i.e. don't round up twice). The difference is subtle but it's what makes sure the tag immediately follows the area instead of starting from the end. Thanks to this, now when writing one byte too many at the end of a struct stream, the error is instantly caught.	2023-09-12 18:14:05 +02:00
Willy Tarreau	61575769ac	DEBUG: pools: print the contents surrounding the expected tag location When no tag matches a known pool, we can inspect around to help figure what could have possibly overwritten memory. The contents are printed one machine word per line in hex, then using printable characters, and when they can be resolved to a pointer, either the pool's pointer name or a resolvable symbol with offset. The goal here is to help recognize what is easily identifiable in memory. For example applying the following patch to stream_free(): - pool_free(pool_head_stream, s); + pool_free(pool_head_stream, (void*)s+1); Causes the following dump to be emitted: FATAL: pool inconsistency detected in thread 1: tag mismatch on free(). caller: 0x59e968 (stream_free+0x6d8/0xa0a) item: 0x13df5c1 pool: 0x12782c0 ('stream', size 888, real 904, users 1) Tag does not match (0x4f00000000012782). Tag does not match any other pool. Contents around address 0x13df5c1+888=0x13df939: 0x13df918 [00 00 00 00 00 00 00 00] [........] 0x13df920 [00 00 00 00 00 00 00 00] [........] 0x13df928 [00 00 00 00 00 00 00 00] [........] 0x13df930 [00 00 00 00 00 00 00 00] [........] 0x13df938 [c0 82 27 01 00 00 00 00] [..'.....] [pool:stream] 0x13df940 [4f c0 59 00 00 00 00 00] [O.Y.....] [stream_new+0x4f/0xbec] 0x13df948 [49 46 49 43 41 54 45 2d] [IFICATE-] 0x13df950 [81 02 00 00 00 00 00 00] [........] 0x13df958 [df 13 00 00 00 00 00 00] [........] Other possible callers: (...) We notice that the tag references pool_head_stream with the allocation point in stream_new. Another benefit is that a caller may be figured from the tag even if the "caller" feature is not enabled, because upon a free() we always put the caller's location into the tag. This should be sufficient to debug most cases that normally require gdb.	2023-09-12 18:14:05 +02:00
Willy Tarreau	0f9a10c7f1	DEBUG: pools: also print the value of the tag when it doesn't match Sometimes the tag's value may reveal a recognizable pattern, so let's print it when it doesn't match a known pool.	2023-09-12 18:14:05 +02:00
Willy Tarreau	96c1a24224	DEBUG: pools: also print the item's pointer when crashing To inspect a core or recognize some values, it's important to have the item pointer, but it was not provided.	2023-09-12 18:14:05 +02:00
Willy Tarreau	efc46dede9	DEBUG: pools: inspect pools on fatal error and dump information found It's a bit frustrating sometimes to see pool checks catch a bug but not provide exploitable information without a core. Here we're adding a function "pool_inspect_item()" which is called just before aborting in pool_check_pattern() and POOL_DEBUG_CHECK_MARK() and which will display the error type, the pool's pointer and name, and will try to check if the item's tag matches the pool, and if not, will iterate over all pools to see if one would be a better candidate, then will try to figure the last known caller and possibly other likely candidates if the pool's tag is not sufficiently trusted. This typically helps better diagnose corruption in use-after-free scenarios, or freeing to a pool that differs from the one the object was allocated from, and will also indicate calling points that may help figure where an object was last released or allocated. The info is printed on stderr just before the backtrace. For example, the recent off-by-one test in the PPv2 changes would have produced the following output in vtest logs: * h1 debug|FATAL: pool inconsistency detected in thread 1: tag mismatch on free(). * h1 debug| caller: 0x62bb87 (conn_free+0x147/0x3c5) * h1 debug| pool: 0x2211ec0 ('pp_tlv_256', size 304, real 320, users 1) * h1 debug|Tag does not match. Possible origin pool(s): * h1 debug| tag: @0x2565530 = 0x2216740 (pp_tlv_128, size 176, real 192, users 1) * h1 debug|Recorded caller if pool 'pp_tlv_128': *** h1 debug| @0x2565538 (+0184) = 0x62c76d (conn_recv_proxy+0x4cd/0xa24) A mismatch in the allocated/released pool is already visible, and the callers confirm it once resolved, where the allocator indeed allocates from pp_tlv_128 and conn_free() releases to pp_tlv_256: $ addr2line -spafe ./haproxy <<< $'0x62bb87\n0x62c76d' 0x000000000062bb87: conn_free at connection.c:568 0x000000000062c76d: conn_recv_proxy at connection.c:1177	2023-09-11 15:46:14 +02:00
Willy Tarreau	f6bee5a50b	DEBUG: pools: make pool_check_pattern() take a pointer to the pool This will be useful to report detailed bug traces.	2023-09-11 15:19:49 +02:00
Willy Tarreau	e92e96b00f	DEBUG: pools: pass the caller pointer to the check functions and macros In preparation for more detailed pool error reports, let's pass the caller pointers to the check functions. This will be useful to produce messages indicating where the issue happened.	2023-09-11 15:19:49 +02:00
Willy Tarreau	baf2070421	DEBUG: pools: always record the caller for uncached allocs as well When recording the caller of a pool_alloc(), we currently store it only when the object comes from the cache and never when it comes from the heap. There's no valid reason for this except that the caller's pointer was not passed to pool_alloc_nocache(), so it used to set NULL there. Let's just pass it down the chain.	2023-09-11 15:19:49 +02:00
Willy Tarreau	0074c36dd2	BUILD: pools: import plock.h to build even without thread support In 2.9-dev4, commit `544c2f2d9` ("MINOR: pools: use EBO to wait for unlock during pool_flush()") broke the thread-less build by calling pl_wait_new_long() without explicitly including plock.h which is normally included by thread.h when threads are enabled.	2023-08-26 17:28:08 +02:00
Willy Tarreau	544c2f2d9e	MINOR: pools: use EBO to wait for unlock during pool_flush() pool_flush() could become a source of contention on the pool's free list if there are many competing threads using that pool. Let's make sure we use EBO and not just a simple CPU relaxation there, to avoid disturbing them.	2023-08-17 09:09:20 +02:00
Willy Tarreau	2d18717fb8	BUILD: pools: fix build error on clang with inline vs forceinline clang is more picky than gcc regarding duplicate "inline". The functions declared with "forceinline" don't need to have "inline" since it's already in the macro.	2023-08-12 19:58:17 +02:00
Willy Tarreau	29eed99b50	MINOR: pools: make pool_evict_last_items() use pool_put_to_os_no_dec() The bucket is already known, no need to calculate it again. Let's just include the lower level functions.	2023-08-12 19:04:34 +02:00
Willy Tarreau	7bf829ace1	MAJOR: pools: move the shared pool's free_list over multiple buckets This aims at further reducing the contention on the free_list when using global pools. The free_list pointer now appears for each bucket, and both the alloc and the release code skip to the next bucket when ending on a contended entry. The default entry used for allocations and releases depends on the thread ID so that locality is preserved as much as possible under low contention. It would be nice to improve the situation to make sure that releases to the shared pools don't consider the first entry's pointer but only an argument that would be passed and that would correspond to the bucket in the thread's cache. This would reduce computations and make sure that the shared cache only contains items whose pointers match the same bucket. This was not yet done. One possibility could be to keep the same splitting in the local cache. With this change, an h2load test with 5 * 160 conns & 40 streams on 80 threads that was limited to 368k RPS with the shared cache jumped to 3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets and 5.5M RPS for 64 buckets.	2023-08-12 19:04:34 +02:00

197 Commits