haproxy

mirror of http://git.haproxy.org/git/haproxy.git synced 2026-02-14 21:29:17 +02:00

Author	SHA1	Message	Date
Willy Tarreau	6f97b4ef33	BUG/MEDIUM: leastconn: fix rare possibility of divide by zero An optimization was brought in commit `5064ab6a9` ("OPTIM: lb-leastconn: do not unlink the server if it did not change") to avoid locking the server just to discover it did not move. However a mistake was made because the operation involves a divide with a value that is read outside of its usual lock, which makes it possible to be zero at the exact moment we watch it if another thread takes the server down under the lbprm lock, resulting in a divide by zero. Therefore we must check that the value is not null there. This must be backported to 2.4.	2021-09-22 07:24:02 +02:00
Willy Tarreau	a05704582c	MINOR: server: replace the pendconns-related stuff with a struct queue Just like for proxies, all three elements (pendconns, nbpend, queue_idx) were moved to struct queue.	2021-06-22 18:43:14 +02:00
Willy Tarreau	5941ef0a6c	MINOR: lb/api: remove the locked argument from take_conn/drop_conn This essentially reverts commit 2b4370078 ("MINOR: lb/api: let callers of take_conn/drop_conn tell if they have the lock") that was merged during 2.4 before the various locks could be eliminated at the lower layers. Passing that information complicates the cleanup of the queuing code and it's become useless.	2021-06-22 18:43:12 +02:00
Willy Tarreau	63b3ae7ca3	CLEANUP: backend: fix incorrect comments on locking conditions for lb functions The leastconn and roundrobin functions mention that the server's lock must be held while this is not true at all and it is not used either. The "first" algo doesn't mention anything about the need for locking, so let's mention that it uses the lbprm lock.	2021-06-04 15:40:50 +02:00
Willy Tarreau	5064ab6a98	OPTIM: lb-leastconn: do not unlink the server if it did not change Due to the two-phase server reservation, there are 3 calls to fwlc_srv_reposition() per request, one during assign_server() to reserve the slot, one in connect_server() to commit it, and one in process_stream() to release it. However only one of the first two will change the key, so it's needlessly costly to take the lock, remove a server and insert it again at the same place when we can already figure we ought not to even have taken the lock. Furthermore, even when the server needs to move, there can be quite some contention on the lbprm lock forcing the thread to wait. During this time the served and nbpend server values might have changed, just like the lb_node.key itself. Thus we measure the values again under the lock before updating the tree. Measurements have shown that under contention with 16 servers and 16 threads, 50% of the updates can be avoided there. This patch makes the function compute the new key and compare it to the current one before deciding to move the entry (and does it again under the lock for the second test). This removes between 40 and 50% of the tree updates depending on the thread contention and the number of servers. The performance gain due to this (on 16 threads) was: 16 servers: 415 krps -> 440 krps (6%, contention on lbprm) 4 servers: 554 krps -> 714 krps (+29%, little contention) One point worth thinking about is that it's not logical to update the tree 2-3 times per request while it's only read once. Half to 2/3 of these updates are not needed. An experiment consisting in subscribing the server to a list and letting the readers reinsert them on the fly showed further degradation instead of an improvement. A better approach would probably consist in avoiding writes to shared cache lines by having the leastconn nodes distinct from the servers, with one node per value, and having them hold an mt-list with all the servers having that number of connections. The connection count tree would then be read-mostly instead of facing heavy writes, and most write operations would be performed on 1-3 list heads which are way cheaper to migrate than a tree node, and do not require updating the last two updated neighbors' cache lines.	2021-02-18 10:06:45 +01:00
Willy Tarreau	85b2fb0358	OPTIM: lb-leastconn: do not take the server lock on take_conn/drop_conn The operations are only an insert and a delete into the LB tree, which doesn't require the server's lock at all as the lbprm lock is already held. Let's drop it. Just for the sake of cleanness, given that the served and nbpend values used to be atomically updated, we'll use an atomic load to read them.	2021-02-18 10:06:45 +01:00
Willy Tarreau	59b0fecfd9	MINOR: lb/api: let callers of take_conn/drop_conn tell if they have the lock The two algos defining these functions (first and leastconn) do not need the server's lock. However it's already present in pendconn_process_next_strm() so the API must be updated so that the functions may take it if needed and that the callers indicate whether they already own it. As such, the call places (backend.c and stream.c) now do not take it anymore, queue.c was unchanged since it's already held, and both "first" and "leastconn" were updated to take it if not already held. A quick test on the "first" algo showed a jump from 432 to 565k rps by just dropping the lock in stream.c!	2021-02-18 10:06:45 +01:00
Christopher Faulet	cb33d3ac7f	BUG/MEDIUM: lb-leastconn: Reposition a server using the right eweight Depending on the context, the current eweight or the next one must be used to reposition a server in the tree. When the server state is updated, for instance its weight, the next eweight must be used because it is not yet committed. However, when the server is used, under normal conditions, the current eweight must be used. In fact, it is only a bug on the 1.8. On newer versions, the changes on a server are performed synchronously. But it is safer to rely on the right eweight value to avoid any future bugs. On the 1.8, it is important to do so, because the server state is updated and committed inside the rendez-vous point. Thus, the next server state may be unsync with the current state for a short time, waiting for all threads to join the rendez-vous point. It is especially a problem if the next eweight is set to 0. Because otherwise, it must not be used to reposition the server in the tree, leading to a divide by 0. This patch must be backported as far as 1.8.	2020-12-14 09:52:34 +01:00
Willy Tarreau	8ae8c48eb0	MEDIUM: fwlc: re-enable per-server queuing up to maxqueue Leastconn has the nice property of being able to sort servers by their current usage. It's really a shame to force all requests into the backend queue when the algo would be able to also consider their current queue. In order not to change existing behavior but extend it, this patch allows leastconn to elect servers which are already full if they have an explicitly configured maxqueue setting above zero and their queue hasn't reached that threshold. This will significantly reduce the pressure in the backend queue when queuing a lot with lots of servers. A test on 8 threads with 100 servers configured with maxconn 1 jumped from 165krps to 330krps with maxqueue 15 with this patch. This partially undoes commit `82cd5c13a` ("OPTIM: backend: skip LB when we know the backend is full") but allows to scale much better even by setting a single-digit maxqueue value. Some better heuristics could be used to maintain the behavior of the bypass in the patch above, consisting in keeping it if it's known that there is no server with a configured maxqueue in the farm (or in the backend).	2020-10-22 18:30:25 +02:00
Willy Tarreau	8c855f6cff	MINOR: leastconn: take the queue length into account when queuing servers When servers are queued into the leastconn tree, it's important to also consider their queue length. There could be some servers with lots of queued requests that we don't want to hammer with extra connections. In order not to add extra stress to the LB algorithm, we don't update the value when adding to the queue, only when updating the connection count (i.e. picking from the queue or releasing a connection). This will be sufficient to significantly improve the fairness in such situations.	2020-10-22 18:30:18 +02:00
Willy Tarreau	58bc9c1ced	MINOR: lb/leastconn: only take a read lock in fwlc_get_next_server() This function doesn't change the tree, it only looks for the first usable server, so let's do that under a read lock to limit the situations like the ones described in issue #881 where finding a usable server when dealing with lots of saturated ones can be expensive. At least threads will now be able to look up in parallel. It's interesting to note that s->served is not incremented during the server choice, nor is the server repositioned. So right now already, nothing prevents multiple threads from picking the same server. This will not cause a significant imbalance anyway given that the server will automatically be repositioned at the right place, but this might be something to improve in the future if it doesn't come with too high a cost. It also looks like the way a server's weight is updated could be revisited so that the write lock gets tighter at the expense of a short part of inconsistency between weights and servers still present in the tree.	2020-10-17 19:37:40 +02:00
Willy Tarreau	cd10def825	MINOR: backend: replace the lbprm lock with an rwlock It was previously a spinlock, and it happens that a number of LB algos only lock it for lookups, without performing any modification. Let's first turn it to an rwlock and w-lock it everywhere. This is strictly identical. It was carefully checked that every HA_SPIN_LOCK() was turned to HA_RWLOCK_WRLOCK() and that HA_SPIN_UNLOCK() was turned to HA_RWLOCK_WRUNLOCK() on this lock. _INIT and _DESTROY were updated too.	2020-10-17 18:51:41 +02:00
Willy Tarreau	b2551057af	CLEANUP: include: tree-wide alphabetical sort of include files This patch fixes all the leftovers from the include cleanup campaign. There were not that many (~400 entries in ~150 files) but it was definitely worth doing it as it revealed a few duplicates.	2020-06-11 10:18:59 +02:00
Willy Tarreau	1e56f92693	REORG: include: move server.h to haproxy/server{,-t}.h extern struct dict server_name_dict was moved from the type file to the main file. A handful of inlined functions were moved at the bottom of the file. Call places were updated to use server-t.h when relevant, or to simply drop the entry when not needed.	2020-06-11 10:18:58 +02:00
Willy Tarreau	a55c45470f	REORG: include: move queue.h to haproxy/queue{,-t}.h Nothing outstanding here. A number of call places were not justified and removed.	2020-06-11 10:18:58 +02:00
Willy Tarreau	4980160ecc	REORG: include: move backend.h to haproxy/backend{,-t}.h The files remained mostly unchanged since they were OK. However, half of the users didn't need to include them, and about as many actually needed to have it and used to find functions like srv_currently_usable() through a long chain that broke when moving the file.	2020-06-11 10:18:58 +02:00
Willy Tarreau	f268ee8795	REORG: include: split global.h into haproxy/global{,-t}.h global.h was one of the messiest files, it has accumulated tons of implicit dependencies and declares many globals that make almost all other files include it. It managed to silence a dependency loop between server.h and proxy.h by being well placed to pre-define the required structs, forcing struct proxy and struct server to be forward-declared in a significant number of files. It was split in two, one which is the global struct definition and the few macros and flags, and the rest containing the functions prototypes. The UNIX_MAX_PATH definition was moved to compat.h.	2020-06-11 10:18:58 +02:00
Willy Tarreau	58017eef3f	REORG: include: move the BUG_ON() code to haproxy/bug.h This one used to be stored into debug.h but the debug tools got larger and require a lot of other includes, which can't use BUG_ON() anymore because of this. It does not make sense and instead this macro should be placed into the lower includes and given its omnipresence, the best solution is to create a new bug.h with the few surrounding macros needed to trigger bugs and place assertions anywhere. Another benefit is that it won't be required to add include <debug.h> anymore to use BUG_ON, it will automatically be covered by api.h. No less than 32 occurrences were dropped. The FSM_PRINTF macro was dropped since not used at all anymore (probably since 1.6 or so).	2020-06-11 10:18:56 +02:00
Willy Tarreau	4c7e4b7738	REORG: include: update all files to use haproxy/api.h or api-t.h if needed All files that were including one of the following include files have been updated to only include haproxy/api.h or haproxy/api-t.h once instead: - common/config.h - common/compat.h - common/compiler.h - common/defaults.h - common/initcall.h - common/tools.h The choice is simple: if the file only requires type definitions, it includes api-t.h, otherwise it includes the full api.h. In addition, in these files, explicit includes for inttypes.h and limits.h were dropped since these are now covered by api.h and api-t.h. No other change was performed, given that this patch is large and affects 201 files. At least one (tools.h) was already freestanding and didn't get the new one added.	2020-06-11 10:18:42 +02:00
Willy Tarreau	8d2b777fe3	REORG: ebtree: move the include files from ebtree to include/import/ This is where other imported components are located. All files which used to directly include ebtree were touched to update their include path so that "import/" is now prefixed before the ebtree-related files. The ebtree.h file was slightly adjusted to read compiler.h from the common/ subdirectory (this is the only change). A build issue was encountered when eb32sctree.h is loaded before eb32tree.h because only the former checks for the latter before defining type u32. This was addressed by adding the reverse ifdef in eb32tree.h. No further cleanup was done yet in order to keep changes minimal.	2020-06-11 09:31:11 +02:00
Willy Tarreau	ed5ac9c786	BUG/MINOR: lb/leastconn: ignore the server weights for empty servers As discussed in issue #178, the change brought around 1.9-dev11 by commit `1eb6c55808` ("MINOR: lb: make the leastconn algorithm more accurate") causes some harm in the situation it tried to improve. By always applying the server's weight even for no connection, we end up always picking the same servers for the first connections, so under a low load, if servers only have either 0 or 1 connections, in practice the same servers will always be picked. This patch partially restores the original behaviour but still keeping the spirit of the aforementioned patch. Now what is done is that servers with no connections will always be picked first, regardless of their weight, so they will effectively follow round-robin. Only servers with one connection or more will see an accurate weight applied. This patch was developed and tested by @malsumis and @jaroslawr who reported the initial issue. It should be backported to 2.0 and 1.9.	2019-09-06 17:13:44 +02:00
Christopher Faulet	1ae2a88781	BUG/MEDIUM: lb_fwlc: Don't test the server's lb_tree from outside the lock In the function fwlc_srv_reposition(), the server's lb_tree is tested from outside the lock. So it is possible to remove it after the test and then call eb32_insert() in fwlc_queue_srv() with a NULL root pointer, which is invalid. Moving the test in the scope of the lock fixes the bug. This issue was reported on Github, issue #126. This patch must be backported to 2.0, 1.9 and 1.8.	2019-06-19 13:55:57 +02:00
Willy Tarreau	1eb6c55808	MINOR: lb: make the leastconn algorithm more accurate The leastconn algorithm queues available servers based on their weighted current load. But this results in an inaccurate load balancing when weights differ and the load is very low, because what matters is not the load before picking the server but the load resulting from picking the server. At the very least, it must be granted that servers with the highest weight are always picked first when no server has any connection. This patch addresses this by simply adding one to the current connections count when queuing the server, since this is the load the server will have once picked. This finally allows to bridge the gap that existed between the "leastconn" and the "first" algorithms.	2018-12-14 08:33:28 +01:00
Willy Tarreau	1b87748ff5	BUG/MEDIUM: lb/threads: always properly lock LB algorithms on maintenance operations Since commit `3ff577e` ("MAJOR: server: make server state changes synchronous again"), srv_update_status() calls the various maintenance operations of the LB algorithms (->set_server_up, ->set_server_down, ->update_server_weight()). These ones are called with a single thread guaranteed by the rendez-vous point, so the fact that they're lacking some locks has no effect. However we'll need to remove the rendez-vous point so we have to take care of properly locking all the LB algos. The comments have been properly updated on the various functions to mention their locking expectations. All these functions are called with the server lock held, and all of them now support concurrent calls by using the lbprm's lock. This fix doesn't need to be backported at the moment, though if any check-specific issue surfaced in 1.8, it could make sense to reuse it.	2018-08-21 19:44:53 +02:00
Christopher Faulet	2a944ee16b	BUILD: threads: Rename SPIN/RWLOCK macros using HA_ prefix This removes any name conflicts, especially on Solaris.	2017-11-07 11:10:24 +01:00
Christopher Faulet	5b51755aef	MEDIUM: threads/lb: Make LB algorithms (lb_*.c) thread-safe A lock for LB parameters has been added inside the proxy structure and atomic operations have been used to update server variables related to lb. The only significant change is about lb_map. Because the servers status are updated in the sync-point, we can call recalc_server_map function synchronously in map_set_server_status_up/down function.	2017-10-31 13:58:31 +01:00
Emeric Brun	52a91d3d48	MEDIUM: check: server states and weight propagation re-work The server state and weight was reworked to handle "pending" values updated by checks/CLI/LUA/agent. These values are committed to be propagated to the LB stack. In further dev related to multi-thread, the commit will be handled into a sync point. Pending values are named using the prefix 'next_'. Current values used by the LB stack are named 'cur_'.	2017-09-05 15:23:16 +02:00
Willy Tarreau	c93cd16b6c	REORG/MEDIUM: server: split server state and flags in two different variables Till now, the server's state and flags were all saved as a single bit field. It causes some difficulties because we'd like to have an enum for the state and separate flags. This commit starts by splitting them in two distinct fields. The first one is srv->state (with its counter-part srv->prev_state) which are now enums, but which still contain bits (SRV_STF_*). The flags now lie in their own field (srv->flags). The function srv_is_usable() was updated to use the enum as input, since it already used to deal only with the state. Note that currently, the maintenance mode is still in the state for simplicity, but it must move as well.	2014-05-22 11:27:00 +02:00
Willy Tarreau	87eb1d6994	MINOR: server: create srv_was_usable() from srv_is_usable() and use a pointer We used to call srv_is_usable() with either the current state and weights or the previous ones. This causes trouble for future changes, so let's first split it in two variants : - srv_is_usable(srv) considers the current status - srv_was_usable(srv) considers the previous status	2014-05-13 22:34:55 +02:00
Willy Tarreau	c5150dafd8	MINOR: server: use functions to detect state changes and to update them Detecting that a server's status has changed is a bit messy, as well as it is to commit the status changes. We'll have to add new conditions soon and we'd better avoid to multiply the number of touched locations with the high risk of forgetting them. This commit introduces : - srv_lb_status_changed() to report if the status changed from the previously committed one ; - srv_lb_commit_status() to commit the current status The function is now used by all load-balancing algorithms.	2014-05-13 22:18:22 +02:00
Willy Tarreau	004e045f31	BUG/MAJOR: server: weight calculation fails for map-based algorithms A crash was reported by Igor at owind when changing a server's weight on the CLI. Lukas Tribus could reproduce a related bug where setting a server's weight would result in the new weight being multiplied by the initial one. The two bugs are the same. The incorrect weight calculation results in the total farm weight being larger than what was initially allocated, causing the map index to be out of bounds on some hashes. It's easy to reproduce using "balance url_param" with a variable param, or with "balance static-rr". It appears that the calculation is made at many places and is not always right and not always wrong the same way. Thus, this patch introduces a new function "server_recalc_eweight()" which is dedicated to this task of computing ->eweight from many other elements including uweight and current time (for slowstart), and all users now switch to use this function. The patch is a bit large but the code was not trivially fixable in a way that could guarantee this situation would not occur anymore. The fix is much more readable and has been verified to work with all algorithms, with both consistent and map-based hashes, and even with static-rr. Slowstart was tested as well, just like enable/disable server. The same bug is very likely present in 1.4 as well, so the patch will probably need to be backported even though it will not apply as-is. Thanks to Lukas and Igor for the information they provided to reproduce it.	2013-11-21 15:09:02 +01:00
Willy Tarreau	45cb4fb640	[MEDIUM] build: switch ebtree users to use new ebtree version All files referencing the previous ebtree code were changed to point to the new one in the ebtree directory. A makefile variable (EBTREE_DIR) is also available to use files from another directory. The ability to build the libebtree library temporarily remains disabled because it can have an impact on some existing toolchains and does not appear worth it in the medium term if we add support for multi-criteria stickiness for instance.	2009-10-26 21:10:04 +01:00
Willy Tarreau	f89c1873f8	[CLEANUP] backend: move LB algos to individual files It was becoming painful to have all the LB algos in backend.c. Let's move them to their own files. A few hashing functions still need be broken in two parts, one for the contents and one for the map position.	2009-10-01 11:19:37 +02:00

33 Commits