Currently, mixing an IPv4 and an IPv6 address in checks happens to
work by pure luck because the two protocols use the same functions
at the socket level and both use IPPROTO_TCP. However, this is
definitely wrong, as the protocol for the check connection is
retrieved from the server's address.
Now the protocol assigned to the connection is the one of the address
actually in use (eg: the server's address or the explicit check
address).
OpenBSD complains about our use of sprintf() here :
src/checks.o(.text+0x44db): In function `process_chk':
src/checks.c:766: warning: sprintf() is often misused, please use snprintf()
This case was not really clean since the introduction of global.node BTW.
Better change the API to support a size argument in the function and enforce
the limit.
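As a rough illustration of the intended contract (the function name and
format here are hypothetical, not the actual checks code), the size is
passed down so that snprintf() enforces the limit :

    #include <stdio.h>

    /* The destination size flows down to the formatting function, which
     * lets snprintf() enforce it instead of trusting the caller's buffer
     * to be large enough. Returns -1 on truncation.
     */
    static int build_status_line(char *dst, size_t size, const char *node)
    {
        int ret = snprintf(dst, size, "node: %s", node);
        return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
    }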
tcp-check must not reinitialize the SSL stack upon each check!
That must be done only once, after the config parsing; doing it upon
every check leaks memory and eats performance.
This bug was introduced in 1.5-dev22, no backport is needed.
Passing raw socket-level MSG_* flags to the send functions prevents us
from passing other useful info and requires the upper levels to know
these flags. Let's use a new flags category instead : CO_SFL_*. For
now, only MSG_MORE has been remapped.
When no check type is configured (ie: the basic connection check), we
want the connection success to be immediately reported. Unfortunately,
it did not happen, because in this case the connection is registered
neither for read nor for write, and the wake_srv_chk() callback does
not handle the case where no data transfer was requested. However,
having option tcp-check hides this problem because the check type
follows a different setup mode, with check->type != 0 and the
connection believing it must try to send data.
The effect was that without any option, checks would succeed only at the
end of the check interval. So let's just add the wake-up condition.
This bug appeared with the recent polling changes, no backport is needed.
As a workaround, using "option tcp-check" fixes the problem.
A typo caused the first step of a tcp-check to be treated as a connect
step. This patch prevents this behavior. The bug was introduced in
1.5-dev22 with
"tcp-check connect" and only affects these directives. No backport is
needed.
A new tcp-check rule type: connect.
It allows HAProxy to test applications which listen on multiple ports or
multiple applications load-balanced through the same backend.
It's easier and safer to rely on conn_ctrl_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless tests for !ctrl have been removed, as the
CTRL_READY flag itself guarantees ctrl is set.
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.
For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensitive enough that we avoid making it
more complex for now.
Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to the previous/current state pair
we had before. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.
This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.
Several new state manipulation functions have been introduced or
adapted :
- fd_want_{recv,send} : enable receiving/sending on the FD regardless
of its state (sets the ACTIVE flag) ;
- fd_stop_{recv,send} : stop receiving/sending on the FD regardless
of its state (clears the ACTIVE flag) ;
- fd_cant_{recv,send} : report a failure to receive/send on the FD
corresponding to EAGAIN (clears the READY flag) ;
- fd_may_{recv,send} : report the ability to receive/send on the FD
as reported by poll() (sets the READY flag) ;
Some functions are used to report the current FD status :
- fd_{recv,send}_active
- fd_{recv,send}_ready
- fd_{recv,send}_polled
Some functions were removed :
- fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()
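A minimal sketch of the per-direction 3-bit state, assuming simplified
flag names and fdtab layout (the actual code differs in details) :

    #define FD_EV_ACTIVE_R  0x01  /* we want to receive on this FD */
    #define FD_EV_READY_R   0x02  /* not known to block (no EAGAIN yet) */
    #define FD_EV_POLLED_R  0x04  /* FD subscribed to the poller for recv */

    struct fdtab_entry { unsigned char state; };
    extern struct fdtab_entry fdtab[];

    static inline void fd_want_recv(int fd)
    {
        fdtab[fd].state |= FD_EV_ACTIVE_R;   /* regardless of readiness */
    }

    static inline void fd_stop_recv(int fd)
    {
        fdtab[fd].state &= ~FD_EV_ACTIVE_R;
    }

    static inline void fd_cant_recv(int fd)
    {
        fdtab[fd].state &= ~FD_EV_READY_R;   /* remember the EAGAIN */
    }

    static inline void fd_may_recv(int fd)
    {
        fdtab[fd].state |= FD_EV_READY_R;    /* reported by poll() */
    }

    /* the poller then only needs to subscribe FDs which are ACTIVE but
     * neither READY nor already POLLED.
     */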
The POLLHUP/POLLERR flags are now reported as ready so that the I/O
layers know they can try to access the file descriptor to get this
information.
In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.
The following pollers have been updated :
ev_select() : done, built, tested on Linux 3.10
ev_poll() : done, built, tested on Linux 3.10
ev_epoll() : done, built, tested on Linux 3.10 & 3.13
ev_kqueue() : done, built, tested on OpenBSD 5.2
Checks used not to precisely report the errors that were detected at the
connection layer (eg: too many SSL connections). Using chk_report_conn_err()
makes this possible.
Now we can more safely rely on the connection state to decide how to
drain and what to do when data are drained. Callers don't need to
manipulate the file descriptor's state anymore.
Note that it also removes the need for the fix ea90063 ("BUG/MEDIUM:
stream-int: fix the keep-alive idle connection handler") since conn_drain()
correctly sets the polling flags.
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
< 0 if EAGAIN was met
= 0 if some data remain
> 0 if a shutdown was received
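For illustration, a minimal sketch of such a contract (not the actual
conn_drain() code) :

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Read and discard pending data. Returns < 0 if EAGAIN was met,
     * 0 if some data may remain, > 0 if a shutdown was received.
     */
    static int drain_sketch(int fd)
    {
        char buf[1024];
        int loops;

        for (loops = 0; loops < 16; loops++) {
            ssize_t ret = recv(fd, buf, sizeof(buf), 0);

            if (ret == 0)
                return 1;                /* read0 : shutdown received */
            if (ret < 0) {
                if (errno == EAGAIN)
                    return -1;           /* nothing left for now */
                return 0;                /* error : let the caller recheck */
            }
        }
        return 0;                        /* data may remain */
    }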
Steve Ruiz reported some reproducible crashes with HTTP health checks
on a certain page returning a huge length. The traces he provided
clearly showed that the recv() call was performed twice for a total
size exceeding the buffer's length.
Cyril Bonté tracked down the problem to be caused by the full buffer
size being passed to rcv_buf() in event_srv_chk_r() instead of passing
just the remaining amount of space. Indeed, this change happened during
the connection rework in 1.5-dev13 with the following commit :
f150317 MAJOR: checks: completely use the connection transport layer
But one of the problems is also that the comments at the top of the
rcv_buf() functions suggest that the caller only has to ensure the
requested size doesn't overflow the buffer's size.
Also, these functions already have to care about the buffer's size to
handle wrapping free space when there are pending data in the buffer.
So let's change the API instead to more closely match what could be
expected from these functions :
- the caller asks for the maximum number of bytes it wants to read ;
This means that only the caller is responsible for enforcing the
reserve if it wants to (eg: checks don't).
- the rcv_buf() functions fix their computations to always consider
this size as a max, and always perform validity checks based on
the buffer's free space.
As a result, the code is simplified and reduced, and made more robust
for callers which now just have to care about whether they want the
buffer to be filled or not.
Since the bug was introduced in 1.5-dev13, no backport to stable versions
is needed.
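A sketch of the resulting size computation, assuming a simple wrapping
buffer (illustrative structure, not the actual buffer code) :

    #include <stddef.h>

    struct wbuf {
        size_t size;    /* total buffer size */
        size_t len;     /* bytes currently stored */
        size_t in;      /* write position */
    };

    /* The caller asks for up to <wanted> bytes; the function itself
     * bounds the read by the free space and by the contiguous room
     * before wrapping, so no request can overflow the buffer.
     */
    static size_t rcv_room(const struct wbuf *b, size_t wanted)
    {
        size_t free_space = b->size - b->len;
        size_t contig     = b->size - b->in;

        if (wanted > free_space)
            wanted = free_space;
        if (wanted > contig)
            wanted = contig;
        return wanted;
    }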
This function is called twice per request, and does almost always nothing.
Better use an inline version to avoid entering it when we can.
About 0.5% additional performance was gained this way.
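The pattern is the classic inline fast path; a sketch with hypothetical
names (the commit does not name the function) :

    struct session { int needs_processing; };

    /* the rare, heavy work stays out of line in a .c file */
    void process_slow_path(struct session *s);

    /* the inline version tests the almost-always-false condition
     * without paying for a function call in the common case
     */
    static inline void process_maybe(struct session *s)
    {
        if (__builtin_expect(!s->needs_processing, 1))
            return;
        process_slow_path(s);
    }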
Recent fix 02541e8 (BUG/MEDIUM: checks: servers must not start in
slowstart mode) failed to consider one case : a server which is not
checked at all can be disabled and has to support being enabled
again. So we must also enter the set_server_up() function when the
checks are totally disabled.
No backport is needed.
If a server is switched to maintenance mode while a check is in progress,
the successful completion of the check must not switch it back up. This
is still a consequence of using the same function set_server_up() for
every state change. Bug reported by Igor at owind.
This fix should be backported to 1.4 which is affected as well.
In 1.5-dev20, commit bb9665e (BUG/MEDIUM: checks: ensure we can enable
a server after boot) tried to fix a side effect of having both regular
checks and agent checks condition the up state propagation to servers.
Unfortunately it was still not fine because after this fix, servers
which make use of slowstart start in this mode. We must not check
the agent's health if agent checks are not enabled, and likewise,
we must not check the regular check's health if they are not enabled.
Reading the code, it seems like we could avoid entering this function
at all if (s->state & SRV_RUNNING) is not satisfied. Let's reserve
this for a later patch if needed.
Thanks to Sander Klein for reporting this abnormal situation.
Right now we see many places doing their own setsockopt(SO_LINGER).
Better only do it just before the close() in fd_delete(). For this
we add a new flag on the file descriptor, indicating if it's safe or
not to linger. If not (eg: after a connect()), then the setsockopt()
call is automatically performed before a close().
The flag automatically turns to safe when receiving a read0.
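A sketch of the resulting close path (the flag name is illustrative) :

    #include <unistd.h>
    #include <sys/socket.h>

    /* If the FD is not known to be safe to linger (eg: after a
     * connect()), disable lingering just before the close().
     */
    static void fd_delete_sketch(int fd, int linger_unsafe)
    {
        if (linger_unsafe) {
            struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };
            setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
        }
        close(fd);
    }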
Since commit 58c3297 (MEDIUM: Set rise and fall of agent checks to 1),
due to a bogus condition, it became impossible to re-enable a server
that was disabled in the configuration if no agent was enabled. The
reason is that in this case, the agent's health was zero while the
condition expected it to be at least one to consider the action.
Let's fix this by only considering the health of checks that are enabled.
The agent is able to retrieve some weight information from the server
and will eventually be able to force the server into maintenance mode.
It doesn't seem logical to have it depend on the health check being
configured, as for some servers it might very well make sense to only
fetch the weight from the server's load regardless of the health.
So let's stop disabling the agent checks when health checks are disabled.
Till now, a configuration required at least one health check in the
whole config file to create the agent tasks. Now we start them even
if no health check is enabled.
Health checks can now be paused. This is the status they get when the
server is put into maintenance mode, which is more logical than relying
on the server's state at some places. It will be needed to allow agent
checks to run when health checks are disabled (currently not possible).
start_checks() only used to consider the health checks intervals to
compute the start interval, so if an agent had a faster check than
all health checks, it would be significantly delayed.
Having the check state partially stored in the server doesn't help.
Some functions such as srv_getinter() rely on the server being checked
to decide what check frequency to use, instead of relying on the check
being configured. So let's get rid of SRV_CHECKED and SRV_AGENT_CHECKED
and only use the check's states instead.
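The resulting logic looks approximately like this (field names follow
the description above; the exact source may differ) :

    #define SRV_RUNNING 0x0001

    struct server { int state; };
    struct check {
        struct server *server;
        int inter, fastinter, downinter;  /* intervals in ms */
        int health, rise, fall;
    };

    static inline int srv_getinter(const struct check *check)
    {
        const struct server *s = check->server;

        /* fully up or fully down : nominal interval */
        if (check->health == check->rise + check->fall - 1)
            return check->inter;

        /* down and all checks failed : downinter if set */
        if (!(s->state & SRV_RUNNING) && check->health == 0)
            return check->downinter ? check->downinter : check->inter;

        /* transition in progress : fastinter if set */
        return check->fastinter ? check->fastinter : check->inter;
    }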
At the moment, health checks and agent checks are tied : no agent
check is emitted if no health check is enabled. Other parameters
are considered in the condition for letting checks run. Keeping a
distinct state for each check (agent and regular checks) will help us
selectively enable them and know whether they're enabled/disabled and
configured or not. Now
we can already emit an error when trying to enable an unconfigured
agent.
The flag CHK_STATE_RUNNING is misleading as one may believe it means
the state is enabled (just like SRV_RUNNING). Let's rename these two
flags CHK_ST_INPROGRESS and CHK_ST_DISABLED.
We used to have up to 4 sets of flags which were almost all exclusive
to report a check result. And the names were inherited from the old
server states, adding to the confusion. Let's replace that with an
enum handling only the possible combinations :
SRV_CHK_UNKNOWN => CHK_RES_UNKNOWN
SRV_CHK_FAILED => CHK_RES_FAILED
SRV_CHK_PASSED => CHK_RES_PASSED
SRV_CHK_PASSED | SRV_CHK_DISABLE => CHK_RES_CONDPASS
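The resulting enum, sketched from the mapping above (the comments are
indicative only) :

    enum chk_result {
        CHK_RES_UNKNOWN = 0,   /* initial state, no result yet */
        CHK_RES_FAILED,        /* the check failed */
        CHK_RES_PASSED,        /* the check fully succeeded */
        CHK_RES_CONDPASS,      /* passed, but the server asks not to
                                * receive new sessions */
    };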
Server tracking uses the same "tracknext" list for servers tracking
another one and for the servers being tracked. This caused an issue
which was fixed by commit f39c71c ([CRITICAL] fix server state tracking:
it was O(n!) instead of O(n)), consisting in ensuring that a server is
being checked before walking down the list, so that we don't propagate
the up/down information via servers being part of the track chain.
But the root cause is the fact that all servers share the same list.
The correct solution consists in having a list head for the tracked
servers and a list of next tracking servers. This simplifies the
propagation logic, especially for the case where status changes might
be passed to individual servers via the CLI.
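A sketch of the corrected layout (field names are illustrative) :

    struct server {
        struct server *track;      /* the server we are tracking, if any */
        struct server *trackers;   /* head : first server tracking us */
        struct server *tracknext;  /* next server tracking the same target */
        /* ... */
    };

    /* propagation now only walks the dedicated trackers list */
    static void propagate_state(struct server *s)
    {
        struct server *t;

        for (t = s->trackers; t; t = t->tracknext) {
            /* apply s's new state to t here, then recurse */
            propagate_state(t);
        }
    }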
The agent refrains from reading the server's response until the server
closes, but if the server waits for the client to close, the response
is never read. Let's try to fetch a whole line before deciding to wait
more.
We used to have two very similar functions for sending a PROXY protocol
line header. The reason is that the default one relies on the stream
interface to retrieve the other end's address, while the "local" one
performs a local address lookup and sends that instead (used by health
checks).
Now that the send_proxy_ofs is stored in the connection and not the
stream interface, we can make the local_send_proxy rely on it and
support partial sends. This also simplifies the code by removing the
local_send_proxy function, making health checks use send_proxy_ofs,
resulting in the removal of the CO_FL_LOCAL_SPROXY flag, and the
associated test in the connection handler. The other flag,
CO_FL_SI_SEND_PROXY was renamed without the "SI" part so that it
is clear that it is not dedicated anymore to a usage with a stream
interface.
We don't want to assign the control nor transport layers anymore
at the same time as the data layer, because it prevents one from
keeping existing settings when reattaching a connection to an
existing stream interface.
Let's have conn_attach() replace conn_assign() for this purpose.
Thus, conn_prepare() + conn_attach() do exactly the same as the
previous conn_assign().
Now that we can assign conn->xprt regardless of the initialization state,
we can reintroduce conn_prepare() to set only the protocol, the transport
layer and initialize the transport layer's state.
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :
- on accept() side, the fd is set first, then the ctrl layer then the
transport layer ; upon error, they must be undone in the reverse order,
then the FD must be closed. The FD must not be deleted if the control
layer was not yet initialized ;
- on the connect() side, the fd is set last and there is no reliable way
to know if it has been initialized or not. In practice it's initialized
to -1 first but this is hackish and supposes that local FDs only will
be used forever. Also, there are even fewer ways to keep track
of the transport layer's state.
Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.
So the proposed solution is to add two flags to the connection :
- CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
and cleared after it's released (fd_delete).
- CO_FL_XPRT_READY is set when the transport layer is initialized (xprt->init)
and cleared after it's released (xprt->close).
The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marked the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.
The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independently. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.
In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.
Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless of xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.
conn_full_close() now simply calls conn_xprt_close() then conn_ctrl_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.
In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turn. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
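A simplified sketch of the flag-guarded initialization and close
described above (types and bodies are illustrative) :

    #define CO_FL_CTRL_READY 0x0001  /* set by fd_insert(), cleared by fd_delete() */
    #define CO_FL_XPRT_READY 0x0002  /* set by xprt->init(), cleared by xprt->close() */

    struct connection { unsigned int flags; /* ... */ };

    static inline void conn_ctrl_init(struct connection *conn)
    {
        if (!(conn->flags & CO_FL_CTRL_READY)) {
            /* fd_insert() would be performed here */
            conn->flags |= CO_FL_CTRL_READY;
        }
    }

    static inline void conn_xprt_close(struct connection *conn)
    {
        if (conn->flags & CO_FL_XPRT_READY) {
            /* conn->xprt->close() would be called here */
            conn->flags &= ~CO_FL_XPRT_READY;
        }
    }

    /* the flag tests guarantee that ->init and ->close are called
     * only once each and in the correct order.
     */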
Everywhere conn_prepare() is used, the call to conn_init() has already
been done. We can now safely replace all instances of conn_prepare()
with conn_assign() which does not reset the transport layer, and remove
conn_prepare().
Now, instead of seeing many send() calls from multiple "tcp-check send"
rules, we fill the output buffer and try to send it all only when we're
not in a send state or when the output buffer is too small for sending
the next message.
This results in a lot less syscalls and avoids filling the network with
many small packets. It will also improve the behaviour with some bogus
servers which expect a complete request in the first packet.
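A sketch of the batching logic (structure and names are illustrative) :

    #include <string.h>

    struct outbuf { char data[8192]; size_t len; };

    /* Append a "tcp-check send" payload; only flush when the next
     * message no longer fits in the output buffer.
     */
    static int queue_or_flush(struct outbuf *ob, const char *msg,
                              size_t msglen, int (*flush)(struct outbuf *))
    {
        if (msglen > sizeof(ob->data))
            return -1;                   /* would never fit */

        if (ob->len + msglen > sizeof(ob->data)) {
            if (flush(ob) < 0)           /* send all accumulated so far */
                return -1;
            ob->len = 0;
        }
        memcpy(ob->data + ob->len, msg, msglen);
        ob->len += msglen;
        return 0;
    }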
In recent commit 5ecb77f (MEDIUM: checks: add send/expect tcp based check),
bitfields were mistakenly used at some places for the actions. Fortunately,
the only two actions right now are 1 and 2 so they don't share any bit in
common and the bug has no impact.
No backport is needed.
If a "tcp-check send" experiences an EAGAIN on a send() call, it will
nevertheless go to the next rule, and will not try to send again if the next
rule is an expect.
Change this so that we always try to send whatever remains in the buffer
before doing anything else.
A config with just a "tcp-check expect string XXX" loops at 100% CPU
because the connect() wakes the function and there's nothing to send,
but it does not disable the polling.
Rearrange the polling setup to fix this. This was just caused by the
latest commit, no backport is needed.
This is a generic health check which can be used to match a
banner or send a request and analyse a server response.
It works in a send/expect way, and many exchanges can be done between
HAProxy and a server to decide the server status, making HAProxy able to
speak the server's protocol.
It can send arbitrary regular or binary strings and match content as a
regular or binary string or a regex.
Signed-off-by: Baptiste Assmann <bedis9@gmail.com>
We happened to perform this call twice on some checks, once in the
recv event handler, and another one in the main function. Remove the
one from the event handler, which no longer makes sense there.
When pure TCP checks are used, we see a useless call to recvfrom()
in strace, resulting from an unconditional poll on recv after the
connect() succeeds. Let's remove this one and properly report
connection success in the write events.
Error reporting in health checks is unreliable, as the number of recent
patches shows. The main reason is that the code required to detect the
exact situation where the error occurred is not simple, and the errors
have to be handled closer to where they occur in order to be accurate
(rely on getsockopt(SO_ERROR) and errno).
To solve this, we introduce chk_report_conn_err(). It does its best to
consider a possible errno passed as an argument, a possible timeout passed
as well, then it completes this with getsockopt() if needed, and takes
into account the current status of the connection. The result is that
by simply calling this function with errno when it's known, we can emit
accurate log messages from every location. We can now see messages
like "Connection error during SSL handshake (No route to host)" which
were not previously possible.
The only case where errno is supposed to be valid is when the connection
has just got the CO_FL_ERROR flag and errno is not zero, because it will
have been set by the same function that has set the flag. For all other
situations, we need to check the socket using getsockopt(), but only do
it once, since it clears the pending error code. For this reason, we
assign the error code to errno in order not to lose it. The same call
is made at the entry of event_srv_chk_r(), event_srv_chk_w(), and
wake_srv_chk() so that we get a chance to collect errors reported by
the poller or by failed syscalls.
Note that this fix relies on the 4 previous patches, so backporters
must be very careful.
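The socket part of this error collection looks approximately like
this :

    #include <errno.h>
    #include <sys/socket.h>

    /* Check the socket once with getsockopt(SO_ERROR) (which clears the
     * pending error) and preserve the code in errno so the reporting
     * function can still use it later.
     */
    static void collect_sock_error(int fd)
    {
        int skerr = 0;
        socklen_t lskerr = sizeof(skerr);

        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &skerr, &lskerr) == 0 && skerr)
            errno = skerr;
    }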
The last fix on checks (02b0f58: BUG/MEDIUM: checks: fix a long-standing
issue with reporting connection errors) tried to isolate error codes
retrieved from the socket in order to report appropriate messages. The
only thing is that we must not pre-initialize err to errno since we're
not in I/O context anymore and errno will be the one of the last syscall
(whatever it was). However we can complete the message with more info
from the transport layer (eg: SSL can inform us we were in a handshake).
Also add a catch-all case for CO_FL_ERROR when the connection was
established. No check currently seems to leave this case open, but better
catch it because it's hard to find all possible cases.
Error handling in checks is complex because some stuff must be done in
the central task (mandatory at least for timeouts) and other stuff is
done closer to the data.
Since checks have their own buffers now, we could move everything to
the main task and only keep the low-level I/O for sending/retrieving
data to/from this buffer. It would also avoid sending logs from the
I/O context!
In 1.5-dev14 we fixed a bug induced by the new connection system which caused
handshake failures not to be reported in health checks. It was done with
commit 6c560da (BUG/MEDIUM: checks: report handshake failures). This fix
caused another issue which is that every check getting a TCP RST after a
valid response was flagged as an error. This was fixed using commit c5c61fc
(BUG/MEDIUM: checks: ignore late resets after valid responses).
But because of this, we completely miss the status report. These two fixes
only set the check result as failed and did not call set_server_check_status()
to pass the information to upper layers.
The impact is that some failed checks are reported as INI or are simply not
updated if they happen fast enough (eg: TCP RST in response to connect()
without data in a pure TCP check). So the server appears down but the check
status says "L4OK".
After commit 6c560da, the handshake failures have been correctly dealt with
and every error causes process_chk() to be called with the appropriate
information still present on the socket. So let's get the error code in
process_chk() instead and stop mangling it in wake_srv_chk().
Now both L4 and L6 checks are correctly reported.
This bug was first introduced in 1.5-dev12 so no backport is needed.
This is the continuation of previous fix bc16cd8 "BUG/MAJOR: fix haproxy
crash when using server tracking instead of checks", the soft-stop/start
states were not addressed by this fix.