352 Commits

Author SHA1 Message Date
Willy Tarreau
b5259bf44f MINOR: checks: make chk_report_conn_err() take a check, not a connection
Amazingly, this function takes a connection to report an error and is used
by process checks, placing a hard dependency between the connection and the
check preventing the mux from being completely implemented. Let's first get
rid of this.
2017-10-04 14:47:29 +02:00
Willy Tarreau
c09572fd8b BUG/MEDIUM: tcp-check: don't call tcpcheck_main() from the I/O handlers!
This function can destroy a socket and create a new one, resulting in a
change of FD on the connection between recv() and send() for example,
which is absolutely not permitted, and can result in various funny games
like polling not being properly updated (or with the flags from a previous
fd) etc.

Let's only call this from the wake() callback which is more tolerant.
Ideally the operations should be made even more reliable by returning
a specific value to indicate that the connection was released and that
another one was created. But this is hazardous for stable releases as
it may reveal other issues.

This fix should be backported to 1.7 and 1.6.
2017-10-04 13:41:20 +02:00
Willy Tarreau
82feaaf042 BUG/MINOR: tcp-check: don't quit with pending data in the send buffer
In the rare case where the "tcp-check send" directive is the last one in
the list, it leaves the loop without sending the data. Fortunately, the
polling is still enabled on output, resulting in the connection handler
calling back to send what remains, but this is ugly and not very reliable.

This may be backported to 1.7 and 1.6.
2017-10-04 13:41:20 +02:00
Willy Tarreau
a3782e7594 BUG/MEDIUM: tcp-check: properly indicate polling state before performing I/O
While porting the connection to use the mux layer, it appeared that
tcp-checks wouldn't receive anymore because the polling is not enabled
before attempting to call xprt->rcv_buf() nor xprt->snd_buf(), and it
is illegal to call these functions with polling disabled as they
directly manipulate the FD state, resulting in an inconsistency where
the FD is enabled and the connection's polling flags disabled.

Till now it happened to work only because when recv() fails on EAGAIN
it calls fd_cant_recv() which enables polling while signaling the
failure, so that next time the message is received. But the connection's
polling is never enabled, and any tiny change resulting in a call to
conn_data_update_polling() immediately disables reading again.

It's likely that this problem already happens on some corner cases
such as multi-packet responses. It definitely breaks as soon as the
response buffer is full but we don't support consuming more than one
response buffer.

This fix should be backported to 1.7 and 1.6. In order to check for the
proper behaviour, this tcp-check must work and clearly show an SSH
banner in recvfrom() as observed under strace, otherwise it's broken :

   tcp-check connect port 22
   tcp-check expect rstring SSH
   tcp-check send blah
2017-10-04 13:41:17 +02:00
Willy Tarreau
3cad394520 CLEANUP: checks: don't set conn->handle.fd to -1
This used to be needed to know whether there was a check in progress a
long time ago (before tcp_checks) but this is not true anymore and even
becomes wrong after the check is reused as conn_init() initializes it
to DEAD_FD_MAGIC.
2017-10-04 07:53:19 +02:00
Emeric Brun
52a91d3d48 MEDIUM: check: server states and weight propagation re-work
The server state and weight were reworked to handle
"pending" values updated by checks/CLI/LUA/agent.
These values are committed to be propagated to the
LB stack.

In further development related to multi-threading, the commit
will be handled at a sync point.

Pending values are named using the prefix 'next_'
Current values used by the LB stack are named 'cur_'
2017-09-05 15:23:16 +02:00
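The pending/current split described in commit 52a91d3d48 can be sketched roughly as follows. This is a minimal illustration, not HAProxy's code: the struct is heavily simplified, and `srv_set_pending()` and `srv_commit()` are hypothetical names; only the `next_` and `cur_` prefixes come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified server: only the fields needed here. */
enum srv_state { SRV_ST_STOPPED, SRV_ST_RUNNING };

struct server {
    enum srv_state cur_state;   /* value currently used by the LB stack */
    enum srv_state next_state;  /* pending value set by checks/CLI/Lua/agent */
    unsigned int cur_weight;
    unsigned int next_weight;
};

/* Record a pending change without touching what the LB stack sees. */
void srv_set_pending(struct server *s, enum srv_state st, unsigned int w)
{
    assert(s != NULL);
    s->next_state = st;
    s->next_weight = w;
}

/* Commit pending values; returns 1 if anything changed, 0 otherwise. */
int srv_commit(struct server *s)
{
    int changed;

    assert(s != NULL);
    changed = s->cur_state != s->next_state ||
              s->cur_weight != s->next_weight;
    s->cur_state = s->next_state;
    s->cur_weight = s->next_weight;
    return changed;
}
```

Keeping the commit in one function is what would later allow it to be moved to a single synchronization point.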
Willy Tarreau
bbae3f0170 MEDIUM: connection: remove useless flag CO_FL_DATA_WR_SH
After careful inspection, this flag is set at exactly two places :
  - once in the health-check receive callback after receipt of a
    response
  - once in the stream interface's shutw() code where CF_SHUTW is
    always set on chn->flags

The flag was checked in the checks before deciding to send data, but
when it is set, the wake() callback immediately closes the connection
so the CO_FL_SOCK_WR_SH flag is also set.

The flag was also checked in si_conn_send(), but checking the channel's
flag instead is enough and even reveals that one check involving it
could never match.

So it's time to remove this flag and replace its check with a check of
CF_SHUTW in the stream interface. This way each layer is responsible
for its shutdown, this will ease insertion of the mux layer.
2017-08-30 10:05:49 +02:00
Willy Tarreau
54e917cfa1 MEDIUM: connection: remove useless flag CO_FL_DATA_RD_SH
This flag is both confusing and wrong. It is supposed to report the
fact that the data layer has received a shutdown, but in fact this is
reported by CO_FL_SOCK_RD_SH which is set by the transport layer after
this condition is detected. The only case where the flag above is set
is in the stream interface where CF_SHUTR is also set on the receiving
channel.

In addition, it was checked in the health checks code (while never set)
and was always tested jointly with CO_FL_SOCK_RD_SH everywhere, except in
conn_data_read0_pending() which incorrectly doesn't match the second
time it's called and is fortunately protected by an extra check on
(ic->flags & CF_SHUTR).

This patch gets rid of the flag completely. Now conn_data_read0_pending()
accurately reports the fact that the transport layer has detected the end
of the stream, regardless of the fact that this state was already consumed,
and the stream interface watches ic->flags&CF_SHUTR to know if the channel
was already closed by the upper layer (which it already used to do).

The now unused conn_data_read0() function was removed.
2017-08-30 08:18:50 +02:00
Willy Tarreau
585744bf2e REORG/MEDIUM: connection: introduce the notion of connection handle
Till now connections used to rely exclusively on file descriptors. It
was planned in the past that alternative solutions would be implemented,
leading to member "union t" presenting sock.fd only for now.

With QUIC, the connection will need to continue to exist but will not
rely on a file descriptor but a connection ID.

So this patch introduces a "connection handle" which is either a file
descriptor or a connection ID, to replace the existing "union t". We've
now removed the intermediate "struct sock" which was never used. There
is no functional change at all, though the struct connection was inflated
by 32 bits on 64-bit platforms due to alignment.
2017-08-24 19:30:04 +02:00
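As a rough illustration of the "connection handle" idea in commit 585744bf2e, the sketch below stores either a file descriptor or a connection ID in one union. Everything here is an assumption made for illustration: the `conn_id` type, the `kind` tag, and the helper names are hypothetical and are not taken from HAProxy's actual definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: the real HAProxy definition may differ. */
union conn_handle {
    int fd;            /* file descriptor, for socket-based transports */
    uint64_t conn_id;  /* connection ID for a QUIC-like transport (assumed type) */
};

/* Illustrative tag telling which union member is valid. */
enum handle_kind { HANDLE_FD, HANDLE_CONN_ID };

struct connection {
    enum handle_kind kind;
    union conn_handle handle;
};

void conn_set_fd(struct connection *c, int fd)
{
    assert(c != NULL);
    c->kind = HANDLE_FD;
    c->handle.fd = fd;
}

void conn_set_id(struct connection *c, uint64_t id)
{
    assert(c != NULL);
    c->kind = HANDLE_CONN_ID;
    c->handle.conn_id = id;
}

/* Returns the file descriptor, or -1 when the connection is not fd-based. */
int conn_fd(const struct connection *c)
{
    assert(c != NULL);
    return c->kind == HANDLE_FD ? c->handle.fd : -1;
}
```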
Olivier Houchard
b68fda40d7 MINOR: check: Fix checks when using SRV records.
When started, a server may not yet have an associated protocol, so don't
bother trying to run the checks until it is there.
2017-08-09 16:32:50 +02:00
Willy Tarreau
f1d33db10a CLEANUP: task: remove all initializations to TICK_ETERNITY after task_new()
This is now guaranteed by design, simply remove these unneeded parts to
avoid confusion.
2017-07-24 17:55:20 +02:00
Baptiste Assmann
201c07f681 MAJOR/REORG: dns: DNS resolution task and requester queues
This patch is a major upgrade of the internal run-time DNS resolver in
HAProxy and it brings the following 3 main changes:

1. DNS resolution task

Up to now, DNS resolution was triggered by the health check task.
From now on, the DNS resolution task is autonomous. It is started by HAProxy
right after the scheduler is available and it is woken either when a
network IO occurs for one of its nameservers or when a timeout is
matched.

From now on, this means we can enable DNS resolution for a server without
enabling health checking.

2. Introduction of a dns_requester structure

Up to now, DNS resolution was purposely made for resolving server
hostnames.
The idea is to ensure that any HAProxy internal object is able
to trigger a DNS resolution. For this purpose, 2 things have to be done:
  - clean up the DNS code from the server structure (this was already
    quite clean actually) and clean up the server's callbacks from
    manipulating too much DNS resolution
  - create an agnostic structure which allows linking a DNS resolution
    and a requester of any type (using obj_type enum)

3. Manage requesters through queues

Up to now, there was a unique relationship between a resolution and its
owner (aka the requester now). It's a shame, because in some cases,
multiple objects may share the same hostname and may benefit from a
resolution being performed by a third party.
This patch introduces the notion of queues, which are basically lists of
either currently running resolution or waiting ones.

The resolutions are now available as a pool, which belongs to the resolvers.
The pool has a default size of 64 resolutions per resolvers and is
allocated at configuration parsing.
2017-06-02 11:58:54 +02:00
Baptiste Assmann
42746373eb REORG: dns: dns_option structure, storage of hostname_dn
This patch introduces some re-organisation around the DNS code in
HAProxy.

1. make the dns_* functions less dependent on 'struct server' and 'struct resolution'.

With this in mind, the following changes were performed:
- 'struct dns_options' has been removed from 'struct resolution' (well,
  we might need it back at some point later, we'll see)
  ==> we'll use the 'struct dns_options' from the owner of the resolution
- dns_get_ip_from_response(): takes a 'struct dns_options' instead of
  'struct resolution'
  ==> so the caller can pass its own dns options to get the most
      appropriate IP from the response
- dns_process_resolve(): struct dns_option is deduced from new
  resolution->requester_type parameter

2. add hostname_dn and hostname_dn_len into struct server

In order to avoid recomputing a server's hostname into its domain name
format (and use a trash buffer to store the result), it is safer to
compute it once at configuration parsing and to store it into the struct
server.
In the meantime, the struct resolution linked to the server no longer
needs to store the hostname in domain name format. A simple
pointer to the server's one will do the trick.

The function srv_alloc_dns_resolution() properly manages everything for
us: memory allocation, pointer updates, etc...

3. move resolvers pointer into struct server

This patch makes the pointer to struct dns_resolvers from struct
dns_resolution obsolete.
Purpose is to make the resolution as "neutral" as possible and since the
requester is already linked to the resolvers, then we don't need this
information anymore in the resolution itself.
2017-06-02 11:26:48 +02:00
Willy Tarreau
f494977bc1 BUG/MINOR: checks: don't send proxy protocol with agent checks
James Brown reported that agent-check mistakenly sends the proxy
protocol header when it's configured. This is obviously wrong as
the agent is an independent service and not a traffic port, let's
disable this.

This fix must be backported to 1.7 and possibly 1.6.
2017-05-06 08:45:28 +02:00
Frédéric Lécaille
6e0843c0e0 MINOR: server: Add 'no-agent-check' server keyword.
This patch adds 'no-agent-check' setting supported both by 'default-server'
and 'server' directives to disable an agent check for a specific server which would
have 'agent-check' set as default value (inherited from 'default-server'
'agent-check' setting), or, on 'default-server' lines, to disable 'agent-check' setting
as default value for any further 'server' declarations.

For instance, provided this configuration:

    default-server agent-check
    server srv1
    server srv2 no-agent-check
    server srv3
    default-server no-agent-check
    server srv4

srv1 and srv3 would have an agent check enabled contrary to srv2 and srv4.

We no longer allocate anything when parsing the 'default-server' 'agent-check'
setting.
2017-03-27 14:37:01 +02:00
Willy Tarreau
de40d798de CLEANUP: connection: completely remove CO_FL_WAKE_DATA
Since it's only set and never tested anymore, let's remove it.
2017-03-19 12:18:27 +01:00
Steven Davidovitz
544d481516 BUG/MINOR: checks: attempt clean shutw for SSL check
Strict interpretation of TLS can cause SSL sessions to be thrown away
when the socket is shutdown without sending a "close notify", resulting
in each check going through the complete handshake, eating more CPU on
the servers.

[wt: strictly speaking there's no guarantee that the close notify will
 be delivered, it's only best effort, but that may be enough to ensure
 that once at least one is received, next checks will be cheaper. This
 should be backported to 1.7 and possibly 1.6]
2017-03-15 11:41:25 +01:00
Christopher Faulet
8ef75251e3 MAJOR: spoe: refactor the filter to clean up the code
The SPOE code is now pretty big and it was a good time to clean it up. It is
not perfect, some parts remain a bit ugly. But it is far better now.
2017-03-09 15:32:55 +01:00
Thierry FOURNIER
62c8a21c10 BUG/MINOR: sendmail: The return of vsnprintf is not cleanly tested
The string formatted by vsnprintf may be bigger than the size of
the buffer "buf". This case is not tested.

This should be backported to 1.6 and 1.7.
2017-02-10 06:18:17 +01:00
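The kind of check that commit 62c8a21c10 calls for can be sketched as below. This is a generic illustration of testing the return value of `vsnprintf()`, not the actual patch; the wrapper name `format_checked()` is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into buf. Returns the number of characters written (excluding
 * the trailing NUL), or -1 on an encoding error or if the output did
 * not fit and was truncated.
 */
int format_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(buf != NULL && size > 0);

    va_start(ap, fmt);
    len = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    /* vsnprintf() returns the length the complete string would need,
     * so a value >= size means the buffer was too small. */
    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```

The important detail is that a return value equal to the buffer size already means truncation, because one byte is reserved for the terminating NUL.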
Willy Tarreau
04276f3d6e MEDIUM: server: split the address and the port into two different fields
Keeping the address and the port in the same field causes a lot of problems,
specifically on the DNS part where we're forced to cheat on the family to be
able to keep the port. This causes some issues such as some families not being
resolvable anymore.

This patch first moves the service port to a new field "svc_port" so that the
port field is never used anymore in the "addr" field (struct sockaddr_storage).
All call places were adapted (there aren't that many).
2017-01-06 19:29:33 +01:00
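A minimal sketch of the split described in commit 04276f3d6e follows, assuming a simplified server struct. Only the `addr` and `svc_port` names come from the commit message; the helper `srv_connect_addr()` is hypothetical and shows how the port would be merged back in only when a connect address is built.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical, simplified server: address and service port kept apart. */
struct server {
    struct sockaddr_storage addr;  /* family and address only, port unused */
    unsigned short svc_port;       /* service port, host byte order */
};

/*
 * Build the address to connect to by combining addr and svc_port.
 * Returns 0 on success, -1 for an unsupported address family.
 */
int srv_connect_addr(const struct server *srv, struct sockaddr_storage *out)
{
    assert(srv != NULL && out != NULL);

    memcpy(out, &srv->addr, sizeof(*out));
    switch (out->ss_family) {
    case AF_INET:
        ((struct sockaddr_in *)out)->sin_port = htons(srv->svc_port);
        return 0;
    case AF_INET6:
        ((struct sockaddr_in6 *)out)->sin6_port = htons(srv->svc_port);
        return 0;
    default:
        return -1;
    }
}
```

With this layout, a resolver can replace `addr` (including its family) without having to preserve a port stored inside it.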
Willy Tarreau
a261e9b094 CLEANUP: connection: remove all direct references to raw_sock and ssl_sock
Now we exclusively use xprt_get(XPRT_RAW) instead of &raw_sock or
xprt_get(XPRT_SSL) for &ssl_sock. This removes a bunch of #ifdef and
includes spread over a number of locations including backend, cfgparse,
checks, cli, hlua, log, server and session.
2016-12-22 23:26:38 +01:00
Willy Tarreau
141ad85d10 MINOR: server: move the use_ssl field out of the ifdef USE_OPENSSL
Having it in the ifdef complicates certain operations which require
additional ifdefs just to access a member which could remain zero in
non-ssl cases. Let's move it out, it will not even increase the
struct size on 64-bit machines due to alignment.
2016-12-22 23:26:38 +01:00
Willy Tarreau
865c5148e6 CLEANUP: checks: make use of the post-init registration to start checks
Instead of calling the checks directly from the init code, we now
register the start_checks() function to be run at this point. This
also makes it possible to unexport the check init function and to remove one
include from haproxy.c.
2016-12-21 21:30:54 +01:00
Tim Düsterhus
4896c440b3 DOC: Spelling fixes
[wt: this contains spelling fixes for both doc and code comments,
 should be backported, ignoring the parts which don't apply]
2016-11-29 07:29:57 +01:00
William Lallemand
9ed6203aef REORG: cli: split dumpstats.h in stats.h and cli.h
proto/dumpstats.h has been split into 4 files:

  * proto/cli.h  contains prototypes for the CLI
  * proto/stats.h contains prototypes for the stats
  * types/cli.h contains definition for the CLI
  * types/stats.h contains definition for the stats
2016-11-24 16:59:27 +01:00
Willy Tarreau
8e0bb0ae16 MINOR: connection: add names for transport and data layers
This makes debugging easier and avoids having to put ugly checks
against certain well-known internal struct pointers.
2016-11-24 16:58:12 +01:00
Christopher Faulet
ba7bc164f7 MINOR: spoe/checks: Add support for SPOP health checks
A new "option spop-check" statement has been added to enable server health
checks based on SPOP HELLO handshake. SPOP is the protocol used by SPOE filters
to talk to servers.
2016-11-09 22:57:02 +01:00
Baptiste Assmann
95db2bcfee MAJOR: check: find out which port to use for health check at run time
HAProxy used to deduce the port used for health checks when parsing the
configuration at startup time.
This makes it complicated to change the port at run time.

The current patch changes this behavior and makes HAProxy choose the
port used for health checking when preparing the check task itself.

A new type of error is introduced and reported when no port can be found.

There won't be any impact on performance, since the process to find out the
port value is made of a few 'if' statements.

This patch also introduces a new check state CHK_ST_PORT_MISS: this flag is
used to report an error in the case when HAProxy needs to establish a TCP
connection to a server, to perform a health check but no TCP ports can be
found for it.

And last, it also introduces a new stream termination condition:
SF_ERR_CHK_PORT. Purpose of this flag is to report an error in the event when
HAProxy has to run a health check but no port can be found to perform it.
2016-09-11 08:12:13 +02:00
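The "few 'if' statements" mentioned in commit 95db2bcfee might look something like the sketch below. The struct, its field names, and the precedence order are assumptions made for illustration; the commit message only states that the port is chosen at run time and that an error is reported when none can be found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified inputs; field names are illustrative only. */
struct check_cfg {
    unsigned short check_port;       /* explicit port for checks, 0 if unset */
    unsigned short check_addr_port;  /* port carried by a check address, 0 if unset */
    unsigned short svc_port;         /* server's service port, 0 if unset */
};

/*
 * Returns the port to use for the health check, or 0 when no port can
 * be found. The precedence order used here is an assumption.
 */
unsigned short pick_check_port(const struct check_cfg *c)
{
    assert(c != NULL);

    if (c->check_port)
        return c->check_port;
    if (c->check_addr_port)
        return c->check_addr_port;
    if (c->svc_port)
        return c->svc_port;
    return 0; /* caller reports the missing-port error */
}
```

Because the function only reads the current values, changing any of them at run time takes effect on the next check.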
Willy Tarreau
64345aaaf0 BUILD: checks: remove the last strcat and eliminate a warning on OpenBSD
OpenBSD emits warnings on usages of strcpy, strcat and sprintf (and
probably a few others). Here we have a single such warning in all the code
reintroduced by commit 0ba0e4a ("MEDIUM: Support sending email alerts") in
1.6-dev1. Let's get rid of it, the open-coding of strcat is as small as its
usage and the result is even more efficient.

This fix needs to be backported to 1.6.
2016-08-10 19:32:39 +02:00
Willy Tarreau
78f8dcb7f0 CLEANUP: external-check: don't block/unblock SIGCHLD when manipulating the list
There's no point in blocking/unblocking sigchld when removing entries
from the list since the code is called asynchronously.

Similarly the blocking/unblocking could be removed from the connect_proc_chk()
function but it happens that at high signal rates, fork() takes twice as much
time to execute as it is regularly interrupted by a signal, so in the end this
signal blocking is beneficial there for performance reasons.
2016-06-21 18:10:51 +02:00
Willy Tarreau
ebc9244059 BUG/MINOR: external-checks: do not unblock undesired signals
The external checks code makes use of block_sigchld() and unblock_sigchld()
to ensure nobody modifies the signals list while they're being manipulated.
It happens that these functions clear the list of blocked signals, so they
can possibly have a side effect if other signals are blocked. For now no
other signal is blocked but it may very well change in the future so rather
correctly use SIG_BLOCK/SIG_UNBLOCK instead of touching unrelated signals.

This fix should be backported to 1.6 for correctness.
2016-06-21 18:10:50 +02:00
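The difference that commit ebc9244059 relies on is that `sigprocmask()` with `SIG_BLOCK` or `SIG_UNBLOCK` only touches the signals in the given set, whereas replacing the whole mask affects unrelated signals. A minimal sketch, with hypothetical helper names rather than HAProxy's own functions:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Add one signal to the blocked set, leaving all others untouched. */
int block_signal(int sig)
{
    sigset_t set;

    assert(sig > 0);
    sigemptyset(&set);
    sigaddset(&set, sig);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Remove one signal from the blocked set, leaving all others untouched. */
int unblock_signal(int sig)
{
    sigset_t set;

    assert(sig > 0);
    sigemptyset(&set);
    sigaddset(&set, sig);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Returns 1 if sig is currently blocked, 0 if not, -1 on error. */
int is_blocked(int sig)
{
    sigset_t cur;

    /* With a NULL set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}
```

If some other signal was already blocked before `block_signal(SIGCHLD)`, it is still blocked after `unblock_signal(SIGCHLD)`.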
Willy Tarreau
48d6bf2e82 BUG/MAJOR: external-checks: use asynchronous signal delivery
There are random segfaults occurring when using external checks. The
reason is that when receiving a SIGCHLD, a call to task_wakeup() is
performed. There are two situations where this causes trouble :
  - the scheduler is in process_running_tasks(), since task_wakeup()
    sets rq_next to NULL, when the former dereferences it to get the
    next pointer, the program crashes ;

  - when another task_wakeup() is being called and during eb_next()
    in process_running_tasks(), because the structure of the run queue
    tree changes while it is being processed.

The solution against this is to use asynchronous signal processing
thanks to the internal signal API. The signal handler is registered,
and upon delivery, the signal is added to the queue and processed
out of any other processing.

This code was stressed at 2500 forks/s and their respective signals
for quite some time and the segfault is now gone.
2016-06-21 18:10:50 +02:00
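The general pattern behind commit 48d6bf2e82, deferring the real work out of the signal handler, can be sketched as follows. This is a generic illustration and does not use HAProxy's internal signal API; all names are hypothetical, and a single flag is used where a real implementation would keep a queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Written by the handler, read and cleared by the main loop. */
static volatile sig_atomic_t pending_sigchld;

/* Signal handler: only record the event, never touch scheduler state. */
static void on_sigchld(int sig)
{
    (void)sig;
    pending_sigchld = 1;
}

/* Register the handler; returns 0 on success, -1 on error. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Called from the main loop, outside of any task processing.
 * Returns 1 if a pending SIGCHLD was consumed, 0 otherwise.
 */
int process_pending_signals(void)
{
    if (!pending_sigchld)
        return 0;
    pending_sigchld = 0;
    /* This is where the check task would be woken up safely. */
    return 1;
}
```

Several signals arriving between two calls collapse into one event here, so the code consuming the event has to reap every terminated child, not just one.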
Willy Tarreau
b7b2478733 BUG/MEDIUM: external-checks: close all FDs right after the fork()
Lukas Erlacher reported an interesting problem : since we don't close
FDs after the fork() upon external checks, any script executed that
writes data on stdout/stderr will possibly send its data to wrong
places, very likely an existing connection.

After some analysis, the problem is even wider. It's not enough to
just close stdin/stdout/stderr, as all sockets are passed to the
sub-process, and as long as they're not closed, they are usable for
whatever mistake can be done. Furthermore with epoll, such FDs will
continue to be reported after a close() as the underlying file is
not closed yet.

CLOEXEC would be an acceptable workaround except that 1) it adds an
extra syscall on the fast path, and 2) we have no control over FDs
created by external libs (eg: openssl using /dev/crypto, libc using
/dev/random, lua using anything else), so in the end we still need
to close them all.

On some BSD systems there's a closefrom() syscall which could be
very useful for this.

Based on an insightful idea from Simon Horman, we don't close 0/1/2
when we're in verbose mode since they're properly connected to
stdin/stdout/stderr and can become quite useful during debugging
sessions to detect some script output errors or execve() failures.

This fix must be backported to 1.6.
2016-06-21 18:10:50 +02:00
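A minimal sketch of the approach described in commit b7b2478733: in the child, right after `fork()` and before `execve()`, close every descriptor, keeping 0/1/2 only in verbose mode. The function names and the fallback limit are hypothetical, and the upper bound is taken from `sysconf(_SC_OPEN_MAX)` as an assumption about how the limit would be obtained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Close every descriptor in [first, end); errors on unused FDs are ignored. */
void close_fd_range(int first, int end)
{
    int fd;

    assert(first >= 0);
    for (fd = first; fd < end; fd++)
        close(fd);
}

/* Returns 1 if fd refers to an open descriptor, 0 otherwise. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/*
 * Meant to run in the child between fork() and execve().
 * In verbose mode, stdin/stdout/stderr are kept for debugging.
 */
void child_close_fds(int verbose)
{
    long max_fd = sysconf(_SC_OPEN_MAX);

    if (max_fd < 0)
        max_fd = 1024; /* arbitrary fallback, an assumption */
    close_fd_range(verbose ? 3 : 0, (int)max_fd);
}
```

Looping over the whole range is slow when the limit is high, which is why the commit message mentions `closefrom()` as an attractive alternative on systems that provide it.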
Nenad Merdanovic
174dd37d88 MINOR: Add ability for agent-check to set server maxconn
This is very useful in complex architecture systems where HAProxy
is balancing DB connections for example. We want to keep the maxconn
high in order to avoid issues with queueing on the LB level when
there is slowness on another part of the system. Example is a case of
an architecture where each thread opens multiple DB connections, which,
if they get stuck in the queue, cause a snowball effect (old connections aren't
closed, new ones cannot be established). These connections are mostly
idle and the DB server has no problem handling thousands of them.

Allowing us to dynamically set maxconn depending on the backend usage
(LA, CPU, memory, etc.) enables us to have high maxconn for situations
like above, but lowering it in case there are real issues where the
backend servers become overloaded (cache issues, DB gets hit hard).
2016-04-25 17:23:50 +02:00
Pieter Baauw
235fcfcf14 MINOR: mailers: make it possible to configure the connection timeout
This patch introduces a configurable connection timeout for mailers
with a new "timeout mail <time>" directive.

Acked-by: Simon Horman <horms@verge.net.au>
2016-02-20 15:33:06 +01:00
Thierry Fournier
ada348459f MEDIUM: dns: extract options
DNS selection preferences are currently declared inline in the
struct server. They are copied from the server struct to the
dns_resolution struct for each resolution.

The next patches add new preference options, and it is not a good
idea to copy all the configuration information before each DNS
resolution.

This patch extracts the configuration preferences from the struct
server and declares a new dedicated struct. Only a pointer to this
new struct will be copied before each DNS resolution.
2016-02-19 14:37:46 +01:00
Pieter Baauw
5e0964ed01 MINOR: mailers: use <CRLF> for all line endings
Not doing so causes issues with Exchange2013 not processing the message
body from the email. Specification seems to specify that as correct
behavior : https://www.ietf.org/rfc/rfc2821.txt # 2.3.7 Lines

 > SMTP client implementations MUST NOT transmit "bare" "CR" or "LF" characters.

This patch should be backported to 1.6.

Acked-by: Simon Horman <horms@verge.net.au>
2016-02-17 10:19:09 +01:00
Pieter Baauw
46af170e41 MINOR: mailers: increase default timeout to 10 seconds
This allows the TCP connection to send multiple SYN packets, so 1 lost
packet does not cause the mail to be lost. It changes the socket timeout
from 2 to 10 seconds, which allows for 3 SYN packets to be sent and
a short wait for their reply.

This patch should be backported to 1.6.

Acked-by: Simon Horman <horms@verge.net.au>
2016-02-17 10:19:08 +01:00
Cyril Bonté
b65e0335d9 BUG/MINOR: checks: typo in an email-alert error message
When the email alert message couldn't be formatted, the logged error message
said the contrary.

This fix must be backported to 1.6.
2015-12-04 06:09:30 +01:00
James Brown
55f9ff11b5 MINOR: check: add agent-send server parameter
Causes HAProxy to emit a static string to the agent on every check,
so that you can independently control multiple services running
behind a single agent port.
2015-11-04 07:26:51 +01:00
Andrew Hayworth
e6a4a329b8 MEDIUM: dns: Don't use the ANY query type
Basically, it's ill-defined and shouldn't really be used going forward.
We can't guarantee that resolvers will do the 'legwork' for us and
actually resolve CNAMES when we request the ANY query-type. Case in point
(obfuscated, clearly):

  PRODUCTION! ahayworth@secret-hostname.com:~$
  dig @10.11.12.53 ANY api.somestartup.io

  ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
  ; (1 server found)
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;api.somestartup.io.                        IN      ANY

  ;; ANSWER SECTION:
  api.somestartup.io.         20      IN      CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.

  ;; AUTHORITY SECTION:
  somestartup.io.               166687  IN      NS      ns-1254.awsdns-28.org.
  somestartup.io.               166687  IN      NS      ns-1884.awsdns-43.co.uk.
  somestartup.io.               166687  IN      NS      ns-440.awsdns-55.com.
  somestartup.io.               166687  IN      NS      ns-577.awsdns-08.net.

  ;; Query time: 1 msec
  ;; SERVER: 10.11.12.53#53(10.11.12.53)
  ;; WHEN: Mon Oct 19 22:02:29 2015
  ;; MSG SIZE  rcvd: 242

HAProxy can't handle that response correctly.

Rather than try to build in support for resolving CNAMEs presented
without an A record in an answer section (which may be a valid
improvement further on), this change just skips ANY record types
altogether. A and AAAA are much more well-defined and predictable.

Notably, this commit preserves the implicit "Prefer IPV6 behavior."

Furthermore, ANY query type by default is a bad idea: (from Robin on
HAProxy's ML):
  Using ANY queries for this kind of stuff is considered by most people
  to be a bad practice since besides all the things you named it can
  lead to incomplete responses. Basically a resolver is allowed to just
  return whatever it has in cache when it receives an ANY query instead
  of actually doing an ANY query at the authoritative nameserver. Thus
  if it only received queries for an A record before you do an ANY query
  you will not get an AAAA record even if it is actually available since
  the resolver doesn't have it in its cache. Even worse if before it
  only got MX queries, you won't get either A or AAAA
2015-10-20 22:31:01 +02:00
Baptiste Assmann
6076d1c02d MINOR: server: startup slowstart task when using seamless reload of HAProxy
This patch uses the start up of the health check task to also start
the warmup task when required.

This is executed only once: when HAProxy has just started up and can
be started only if the load-server-state-from-file feature is enabled
and the server was in the warmup state before a reload occurs.
2015-09-19 17:05:28 +02:00
Baptiste Assmann
f778bb46d6 BUG/MINOR: DNS request retry counter used for retry only
There are two types of retries when performing a DNS resolution:
1. retry because of a timeout
2. retry of the full sequence of requests (query types failover)

Before this patch, the 'resolution->try' counter was incremented
after each send of a DNS request, which does not cover the 2 cases
above.
This patch fixes this behavior.
2015-09-10 15:46:03 +02:00
Baptiste Assmann
f0d9370f6b BUG/MEDIUM: dns: DNS resolution doesn't start
Patch f046f11561 introduced a regression:
DNS resolution doesn't start anymore, while it was supposed to make it
start with first health check.

The current patch fixes this issue by triggering a new DNS resolution if the
last_resolution time is not set.
2015-09-08 10:51:22 +02:00
Pieter Baauw
ed35c371dc BUG/MEDIUM: mailer: DATA part must be terminated with <CRLF>.<CRLF>
The dot is sent in the wrong place.
As defined in https://www.ietf.org/rfc/rfc2821.txt 'the character sequence "<CRLF>.<CRLF>" ends the mail text'
2015-07-22 22:39:39 +02:00
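The termination rule quoted in commit ed35c371dc can be illustrated with a small helper that prepares the text sent after the SMTP DATA command. This is a sketch under stated assumptions, not the mailer code: `smtp_build_data()` is a hypothetical name, bare LF is converted to CRLF, and lines that start with a dot are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy body into out, turning bare LF into CRLF, then end the mail text
 * with "<CRLF>.<CRLF>". Returns the output length (excluding the NUL),
 * or -1 if out is too small. Bare CR characters are left untouched.
 */
int smtp_build_data(const char *body, char *out, size_t size)
{
    const char *tail;
    const char *p;
    size_t tail_len;
    size_t o = 0;
    char prev = '\0';

    assert(body != NULL && out != NULL);

    for (p = body; *p; p++) {
        if (*p == '\n' && prev != '\r') {
            if (o + 1 >= size)
                return -1;
            out[o++] = '\r';
        }
        if (o + 1 >= size)
            return -1;
        out[o++] = *p;
        prev = *p;
    }

    /* The dot must sit alone on its line: reuse a trailing CRLF if any. */
    if (o >= 2 && out[o - 2] == '\r' && out[o - 1] == '\n')
        tail = ".\r\n";
    else
        tail = "\r\n.\r\n";
    tail_len = strlen(tail);

    if (o + tail_len + 1 > size)
        return -1;
    memcpy(out + o, tail, tail_len);
    o += tail_len;
    out[o] = '\0';
    return (int)o;
}
```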
Baptiste Assmann
a68ca96375 MAJOR: server: add DNS-based server name resolution
Relies on the DNS protocol freshly implemented in HAProxy.
It performs a server IP addr resolution based on a server hostname.
2015-06-13 22:07:35 +02:00
Willy Tarreau
449f952cb3 BUG/MAJOR: checks: break infinite loops when tcp-checks starts with comment
If a tcp-check sequence starts with "comment", then the action is not
matched in the while() loop and the pointer doesn't advance so we face
an endless loop. It is normally detected early except in the case where
very slow checks are performed causing it to trigger after the admin stops
watching.

This bug is 1.6-only and very recent so it didn't have the time to affect
anyone.
2015-05-13 15:39:48 +02:00
Willy Tarreau
5581c27b57 BUG/MEDIUM: checks: do not dereference a list as a tcpcheck struct
The method used to skip to next rule in the list is wrong, it assumes
that the list element starts at the same offset as the rule. It happens
to be true on most architectures since the list is the first element for
now but it's definitely wrong. Now the code doesn't crash anymore when
the struct list is moved anywhere else in the struct tcpcheck_rule.

This fix must be backported to 1.5.
2015-05-13 15:31:34 +02:00
Willy Tarreau
f2c87353a7 BUG/MAJOR: checks: always check for end of list before proceeding
This is the most important fix of this series. There's a risk of endless
loop and crashes caused by the fact that we go past the head of the list
when skipping to next rule, without checking if it's still a valid element.
Most of the time, the ->action field is checked, which points to the proxy's
check_req pointer (generally NULL), meaning the element is confused with a
TCPCHK_ACT_SEND action.

The situation was accidentally made worse with the addition of tcp-check
comment since it also skips list elements. However, since the action that
makes it go forward is TCPCHK_ACT_COMMENT (3), there's little chance to
see this as a valid pointer, except on 64-bit machines where it can match
the end of a check_req string pointer.

This fix heavily depends on previous cleanup and both must be backported
to 1.5 where the bug is present.
2015-05-13 15:31:34 +02:00
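The two list-handling fixes above (5581c27b57 and f2c87353a7) come down to two rules: recover the enclosing struct from its list member by offset rather than by casting, and stop when the next element is the list head. A minimal sketch, assuming a circular doubly linked list; the struct layout, the `RULE_OF()` macro, and the helper names are hypothetical and not HAProxy's own.

```c
#include <assert.h>
#include <stddef.h>

struct list {
    struct list *n;  /* next */
    struct list *p;  /* previous */
};

/* Hypothetical rule: the list member is deliberately NOT the first field. */
struct tcpcheck_rule {
    int action;
    struct list list;
};

/* Recover the enclosing rule from a pointer to its list member. */
#define RULE_OF(lptr) \
    ((struct tcpcheck_rule *)((char *)(lptr) - offsetof(struct tcpcheck_rule, list)))

void list_init(struct list *head)
{
    head->n = head->p = head;
}

void list_append(struct list *head, struct list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

/* Next rule after cur, or NULL when the next element is the list head. */
struct tcpcheck_rule *next_rule(struct list *head, struct tcpcheck_rule *cur)
{
    assert(head != NULL && cur != NULL);

    if (cur->list.n == head)
        return NULL;  /* the head is not a rule: never dereference it as one */
    return RULE_OF(cur->list.n);
}
```

Casting `cur->list.n` directly to a rule pointer would only work while `list` is the first member, and skipping the head test would treat the list head itself as a rule.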
Willy Tarreau
263013d031 CLEANUP: checks: simplify the loop processing of tcp-checks
There is some unobvious redundancy between the various ways we can leave
the loop. Some of them can be factored out. So now we leave the loop when
we can't go further, whether it's caused by reaching the end of the rules
or by a blocking I/O.
2015-05-13 12:30:46 +02:00