This patch adds a new 'post-80' option that sets the
CLIENT_PLUGIN_AUTH (0x00080000) capability flag
and explicitly specifies mysql_native_password as
the authentication plugin in the handshake response.
This patch also adds documentation for the post-80 option support
with MySQL 8.x, which uses the new default authentication plugin
caching_sha2_password.
MySQL 8.0 changed the default authentication plugin from
mysql_native_password to caching_sha2_password.
The current mysql-check implementation only supports pre-41
and post-41 client auth protocols, which lack the CLIENT_PLUGIN_AUTH
capability flag. When HAProxy sends a post-41 authentication
packet to a MySQL 8.x server, the server responds with error 1251:
"Client does not support authentication protocol requested by server".
The new client capabilities for post-80 are:
- CLIENT_PROTOCOL_41 (0x00000200)
- CLIENT_SECURE_CONNECTION (0x00008000)
- CLIENT_PLUGIN_AUTH (0x00080000)
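For reference, a hedged sketch of the combined capability word these
flags produce (values from the list above; the macro name is purely
illustrative):

    /* illustrative name; OR of the post-80 flags listed above */
    #define POST80_CLIENT_CAPS (0x00000200   /* CLIENT_PROTOCOL_41 */       \
                              | 0x00008000   /* CLIENT_SECURE_CONNECTION */ \
                              | 0x00080000)  /* CLIENT_PLUGIN_AUTH */
    /* = 0x00088200 */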
Usage example:
backend mysql_servers
option mysql-check user haproxy post-80
server db1 192.168.1.10:3306 check
The health check user must be created with mysql_native_password:
CREATE USER 'haproxy'@'%' IDENTIFIED WITH mysql_native_password BY '';
This addresses https://github.com/haproxy/haproxy/issues/2934.
Commit fa094d0b61 changed the msg callback
args, but forgot to fix quic_tls_msg_callback() accordingly, so do that,
and remove the unused struct connection parameter.
A regression was introduced in the commit 0ea601127 ("BUG/MAJOR: applet: Don't
call I/O handler if the applet was shut"). The test on shut flags for legacy
applets is inverted.
It should be harmless on 3.4 and 3.3 because all applets were converted. But
this fix is mandatory for 3.2 and older.
The patch must be backported as far as 3.0 with the commit above.
s/mecanism/mechanism
s/got ride/got rid
s/traditionnal/traditional
One typo is confusion between master and worker that results in a
semantic mistake in the sentence:
"...the master will emit an "exit-on-failure" error and will kill every
workers with a SIGTERM and exits with the same error code than the
failed [-master-]{+worker+}..."
Should be backported as far as 3.1.
OpenSSL 4.0 changed the way it stores objects in X509_STORE structures
and no longer allows iterating over objects in insertion order.
This means the order of the objects is not the same before and after
OpenSSL 4.0, and the reg-tests need to handle both cases.
OpenSSL 4.0 is deprecating X509_STORE_get0_objects().
Every occurrence of X509_STORE_get0_objects() was first replaced by
X509_STORE_get1_objects().
This changes the refcount of the STACK_OF(X509_OBJECT) everywhere, so
each caller must release it with sk_X509_OBJECT_pop_free(objs,
X509_OBJECT_free).
X509_STORE_get1_objects() is not available in AWS-LC, OpenSSL < 3.2,
LibreSSL and WolfSSL, so we need to still be compatible with get0.
To achieve this, two macros were added, X509_STORE_getX_objects() and
sk_X509_OBJECT_popX_free(); these macros will use either the get0 or the
get1 macro depending on their availability. In the case of get0,
sk_X509_OBJECT_popX_free() will just do nothing instead of trying to
free.
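A minimal sketch of what such macros can look like (the guard macro
name is an assumption, not necessarily the one used in the tree):

    /* assumption: HAVE_X509_STORE_GET1_OBJECTS is defined when the
     * get1 API is available in the SSL library */
    #ifdef HAVE_X509_STORE_GET1_OBJECTS
    #define X509_STORE_getX_objects(s)  X509_STORE_get1_objects(s)
    #define sk_X509_OBJECT_popX_free(o) sk_X509_OBJECT_pop_free((o), X509_OBJECT_free)
    #else
    #define X509_STORE_getX_objects(s)  X509_STORE_get0_objects(s)
    #define sk_X509_OBJECT_popX_free(o) do { } while (0) /* get0: nothing to free */
    #endif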
Don't backport this unless compatibility with OpenSSL 4.0 is really
needed somewhere, as it changes all the refcounts.
SSL msg callbacks are used for notification about sent/received SSL
messages. Such callbacks are registered via
ssl_sock_register_msg_callback().
Prior to this patch, connection was passed as first argument of these
callbacks. However, most of them do not use it. Worse, this may lead to
confusion as the connection can be NULL in QUIC context.
This patch cleans this by removing connection argument. As an
alternative, connection can be retrieved in callbacks if needed using
ssl_sock_get_conn() but the code must be ready to deal with potential
NULL instances. As an example, heartbeat parsing callback has been
adjusted in this manner.
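A hedged sketch of a callback under the new convention (the exact
post-patch signature is an assumption, only the removal of the
connection argument is known; ssl_sock_get_conn() is assumed to take
the SSL handle):

    static void my_msg_cb(int write_p, int version, int content_type,
                          const void *buf, size_t len, SSL *ssl)
    {
        /* the connection must now be fetched, and may be NULL with QUIC */
        struct connection *conn = ssl_sock_get_conn(ssl);

        if (!conn)
            return; /* only needed if the callback actually uses it */
        /* ... */
    }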
With the QUIC backend implementation, SSL code has been adjusted in
several places when accessing the connection instance. Indeed, with
QUIC, the SSL context is tied to the quic_conn, and code may be executed
before/after connection instantiation. For example, on the frontend
side, the connection is only created after QUIC handshake completion.
The following patch tried to fix unsafe accesses to connection. In
particular, msg callbacks are not called anymore if connection is NULL.
fab7da0fd0
BUG/MEDIUM: quic-be/ssl_sock: TLS callback called without connection
However, most msg callbacks do not need to use the connection instance.
The only occurrence where it is accessed is for heartbeat message
parsing, which is the only case of crash solved. The above fix is too
restrictive as it completely prevents execution of these callbacks when
connection is unset. This breaks several features with QUIC, such as SSL
key logging or samples based on ClientHello capture.
The current patch reverts the above one. Thus, this restores invocation
of msg callbacks for QUIC during the whole low-level connection
lifetime. This requires a small adjustment in heartbeat parsing callback
to prevent access on a NULL connection.
The issue on ClientHello capture was mentioned in github issue #2495.
This must be backported up to 3.3.
The help message for "ktls" mentions "expose-experimental-directive"
without the final 's', which is particularly annoying when copy-pasting
the directive from the error message directly into the config.
This should be backported to 3.3.
As reported by Ben Kallus in the following thread:
https://www.mail-archive.com/haproxy@formilux.org/msg46471.html
there exist some agents which mistakenly accept CRLF inside quoted
chunk extensions, making it possible to fool them by injecting one
extra chunk they won't see for example, or making them miss the end
of the body depending on how it's done. Haproxy, like most other
agents nowadays, doesn't care at all about chunk extensions and just
drops them, in agreement with the spec.
However, as discussed, since chunk extensions are basically never used
except for attacks, and since the cost of just matching quote pairs and
checking the escape consistency of backslashed quotes remains relatively
low, it can make sense to add such a check to abort the message parsing
when this situation is encountered. Note that it has to be done at two
places, because there is a fast path and a slow path for chunk parsing.
Also note that it *will* cause transfers using improperly formatted chunk
extensions to fail, but since these are really not used, and that the
likelihood of them being used but improperly quoted certainly is much
lower than the risk of crossing a broken parser on the client's request
path or on the server's response path, we consider the risk as
acceptable. The test is not subject to the configurable parser exceptions
and it's very unlikely that it will ever be needed.
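For illustration, a hedged sketch of the kind of message the new check
rejects: a chunk-size line whose quoted extension value is left open
across the CRLF:

    4;ext="oops
    Wiki
    0

A lenient parser that swallows the CRLF inside the quoted string can
lose track of where the chunk data starts, while strict parsers see a
4-byte chunk; haproxy now aborts such a message instead of forwarding it.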
Since this is done in 3.4 which will be LTS, this patch will have to be
backported to 3.3 so that any unlikely trouble gets a chance to be
detected before users upgrade to 3.4.
Thanks to Ben for the discussion, and to Rajat Raghav for sparking it
in the first place even though the original report was mistaken.
Cc: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Cc: Rajat Raghav <xclow3n@gmail.com>
Cc: Christopher Faulet <cfaulet@haproxy.com>
There's already a tunable "tune.idle-pool.shared" that allows enabling
or disabling idle connection sharing between threads. However the doc does not
mention that these connections are only shared between threads of the same
thread group, since 2.7 with commit 15c5500b6e ("MEDIUM: conn: make
conn_backend_get always scan the same group"). Let's clarify this and
also give a hint about "max-threads-per-group" which can be helpful for
machines with unified caches.
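As an illustration (both keywords are quoted from this commit; the
exact argument syntax is an assumption to check against the doc):

    global
        nbthread 64
        # unified-cache machine: larger groups let idle connections
        # be shared among more threads
        max-threads-per-group 64
        tune.idle-pool.shared on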
The code in srv_add_to_idle_list() has its roots in 2.0 with commit
9ea5d361ae ("MEDIUM: servers: Reorganize the way idle connections are
cleaned."). At this era we didn't yet have the current set of atomic
load/store operations and we used to perform loads using volatile casts
after a barrier. It turns out that this function has kept this schema
over the years, resulting in a big mfence stalling all the pipeline
in the function:
| static __inline void
| __ha_barrier_full(void)
| {
| __asm __volatile("mfence" ::: "memory");
27.08 | mfence
| if ((volatile void *)srv->idle_node.node.leaf_p == NULL) {
0.84 | cmpq $0x0,0x158(%r15)
0.74 | je 35f
| return 1;
Switching these for a pair of atomic loads got rid of this and brought
0.5 to 3% extra performance depending on the tests due to variations
elsewhere, but it has never been below 0.5%. Note that the second load
doesn't need to be atomic since it's protected by the lock, but it's
cleaner from an API and code review perspective. That's also why it's
relaxed.
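In code terms, a hedged before/after sketch (HA_ATOMIC_LOAD per the
tree's atomic API; surrounding code simplified):

    /* before: full barrier followed by a volatile read */
    __ha_barrier_full();
    if ((volatile void *)srv->idle_node.node.leaf_p == NULL)
        return 1;

    /* after: a plain atomic load, no mfence */
    if (HA_ATOMIC_LOAD(&srv->idle_node.node.leaf_p) == NULL)
        return 1;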
This was the last user of _ha_barrier_full(), let's try not to
reintroduce it now!
There's still a lot of contention when accessing the backend's
totpend and queueslength for every request in may_dequeue_tasks(),
even when queues are not used. This only happens because they are stored
in the same cache line as ->beconn which is being written by other
threads:
0.01 | call sess_change_server
0.02 | mov 0x188(%r15),%esi ## s->queueslength
| if (may_dequeue_tasks(srv, s->be))
0.00 | mov 0xa8(%r12),%rax
0.00 | mov -0x50(%rbp),%r11d
0.00 | mov -0x60(%rbp),%r10
0.00 | test %esi,%esi
| jne 3349
0.01 | mov 0xa00(%rax),%ecx ## p->queueslength
8.26 | test %ecx,%ecx
4.08 | je 288d
This patch moves queueslength and totpend to their own cache line,
thus adding 64 bytes to the struct proxy, but gaining 3.6% of RPS
on a 64-core EPYC thanks to the elimination of this false sharing.
process_stream() goes down from 3.88% to 3.26% in perf top, with
the next top users being inc/dec (s->served) and be->beconn.
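A hedged sketch of the layout change (field names from the commit; the
alignment mechanism and exact placement are illustrative):

    struct proxy {
        /* ... */
        int beconn;                 /* hot: written by many threads */
        /* ... */
        /* dedicated cache line for the queue counters */
        unsigned int queueslength __attribute__((aligned(64)));
        int totpend;
        /* ... */
    };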
This field is shared by all threads and must be in the shared area
instead, because where it's placed, it slows down access to other
fields of the struct by false sharing. Just moving this field gives
a steady 2% gain on the request rate (1.93 to 1.96 Mrps) on a 64-core
EPYC.
It took me one hour of trial and error to figure out that kTLS and
splicing were not being used simply because of the TLS version, and that
switching to TLS v1.2 solved the issue. Thus, let's mention it in the
doc so that others find it more easily in the future.
This should be backported to 3.3.
This option allows disabling certificate compression (RFC 8879) when
using OpenSSL >= 3.2.0.
This feature is known to permit some denial of service by causing extra
memory allocations of approximately 22MiB and extra CPU work per
connection with OpenSSL versions affected by CVE-2025-66199.
( https://openssl-library.org/news/vulnerabilities/index.html#CVE-2025-66199 )
Setting this to "off" mitigates the problem.
Must be backported to all stable branches.
In 3.0, it was stated an applet could not be woken up after it was shut down.
So the corresponding test in the applets I/O handler was removed. However,
it seems it may happen, especially when outgoing data are blocked on the
opposite side. But it is really unexpected because the "release" callback
function was already called and the appctx context was most probably
released.
Strangely, it was never detected by any applet till now. The Prometheus
exporter had never been updated and was still testing for the shutdown.
But when it was refactored to use the new applet API in 3.3, the test
was removed. And this introduced a regression leading to a crash because
a server object could be corrupted. The conditions to hit the bug are
not really clear however.
So, now, to avoid any issue with all other applets, the test is performed in
task_process_applet(). The I/O handler is no longer called if the applet is
already shut.
The same is performed for applets still relying on the old API.
A huge thanks to @idl0r for his invaluable help on this issue!
This patch should fix the issue #3244. It should first be backported to 3.3
and then slowly as far as 3.0.
The SSL passphrase callback function was only called when loading
private keys from a dedicated file (separate from the corresponding
certificate) but not when both the certificate and the key were in the
same file.
We can now load them properly, regardless of how they are provided.
A flag had to be added in the 'passphrase_cb_data' structure because in
the 'ssl_sock_load_pem_into_ckch' function, when calling
'PEM_read_bio_PrivateKey' there might be no private key in the PEM file
which would mean that the callback never gets called (and cannot set the
'passphrase_idx' to -1).
This patch can be backported to 3.3.
Some error paths in 'ssl_sock_passwd_cb' (allocation failures) did not
set the 'passphrase_idx' to -1 which is the way for the caller to know
not to call the callback again so in some memory contention contexts we
could end up calling the callback 'infinitely' (or until memory is
finally available).
This patch must be backported to 3.3.
Certain object sizes cannot be controlled at declaration time because
the resulting object size may be slightly extended (tag, caller),
aligned and rounded up, or even doubled depending on pool settings
(e.g. if backup is used).
This patch addresses this by enlarging the type in the pool registration
to 64-bit so that no info is lost from the declaration, and extra checks
for overflows can be performed during registration after various rounding
steps. This makes it possible to catch issues such as the ones below
and report a suitable error:
global
tune.http.logurilen 2147483647
frontend
capture request header name len 2147483647
http-request capture src len 2147483647
tcp-request content capture src len 2147483647
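A hedged sketch of the idea (all identifiers illustrative, not the
exact ones from the tree):

    /* keep the declared size in 64 bits across the rounding steps,
     * then verify the result still fits the runtime size type */
    uint64_t sz = reg->size;                    /* as declared, 64-bit */
    sz += extra;                                /* tag, caller */
    sz  = (sz + align - 1) & ~(uint64_t)(align - 1);
    if (backup)
        sz *= 2;                                /* doubled by pool settings */
    if (sz != (unsigned int)sz)                 /* overflow after rounding */
        ha_alert("pool '%s': object size overflows\n", reg->name);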
Since 3.3 with commit 945aa0ea82 ("MINOR: initcalls: Add a new initcall
stage, STG_INIT_2"), stkt_late_init() calls stkt_create_stk_ctr_pool()
but doesn't check its return value, so if the pool creation fails, the
process still starts, which is not correct. This patch adds a check for
the return value to make sure we fail to start in this case. This was
not an issue before 3.3 because the function was called as a post-check
handler which did check for errors in the returned values.
A few capture pools can fail in case of too large values for example.
These include the req_uri, capture, and caphdr pools, and may be triggered
with "tune.http.logurilen 2147483647" in the global section, or one of
these in a frontend:
capture request header name len 2147483647
http-request capture src len 2147483647
tcp-request content capture src len 2147483647
These seem to be the only occurrences where create_pool()'s return value
is assigned without being checked, so let's add the proper check for
errors there. This can be backported as a hardening measure though the
risks and impacts are extremely low.
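A hedged sketch of the added check (error handling illustrative; only
create_pool() and MEM_F_SHARED come from the existing API):

    pool = create_pool("caphdr", len + 1, MEM_F_SHARED);
    if (!pool) {
        ha_alert("failed to allocate the 'caphdr' pool\n");
        return -1; /* abort instead of continuing with a NULL pool */
    }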
The condition to report support for the HAVE_TCP_MD5SIG feature was inverted. It
is only an issue for the reg-test related to this feature.
This patch must be backported to 3.3.
UNUSED blocks were not properly handled when the H1 multiplexer was
formatting the start line of a request or a response. UNUSED blocks were
ignored but not removed from the HTX message, so the mux could loop
infinitely on such a block.
It could be seen as a major issue but in fact it only happens in a very
specific case of the response processing (at least I think so): the
server must send an interim message (a 100-continue for instance) with
the final response. HAProxy must receive both at the same time and the
final response must be intercepted (via an http-response return action
for instance). In that case, the interim message is forwarded and the
server's final response is removed and replaced by a proxy error
message.
Now UNUSED HTX blocks are properly skipped and removed.
This patch must be backported as far as 3.0.
If an error occurs during the dump of a metric for a server, we must take
care to detach promex from the watcher list for this server. It must be
performed explicitly because on error, the applet state (st1) is changed, so
it is not possible to detach it during the applet release stage.
This patch must be backported with b4f64c0ab ("BUG/MEDIUM: promex: server
iteration may rely on stale server") as far as 3.0. On older versions, 2.8
and 2.6, the watcher_detach() line must be replaced with "srv_drop(ctx->p[1])".
We frequently use lua_pcall() to provide safe alternative functions
(encapsulated helpers) that prevent the process from crashing in case
of Lua error when Lua is executed from an unsafe environment.
However, some of those safe helpers don't handle errors properly. In case
of error, the Lua API will always put an error object on top of the stack
as stated in the documentation. This error object can be used to retrieve
more info about the error. But in some cases when we ignore it, we should
still consume it to prevent the stack from being altered with an extra
object when returning from the helper function.
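A minimal sketch of the pattern (standard Lua C API; the surrounding
helper shape is illustrative):

    /* lua_pcall() pushes an error object on failure; pop it even
     * when the error itself is ignored, so the stack stays balanced */
    if (lua_pcall(L, nargs, nresults, 0) != LUA_OK)
        lua_pop(L, 1);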
It should be backported to all stable versions. If the patch doesn't apply
automatically, all that's needed is to check for lua_pcall() in hlua.c
and for other cases than 'LUA_OK', make sure that the error object is popped
from the stack before the function returns.
Since commit 365ee28 ("BUG/MINOR: hlua: prevent LJMP in hlua_traceback()")
we now use lua_pcall() to protect sensitive parts of hlua_traceback()
function, and this to prevent Lua from crashing the process in case of
unexpected Lua error.
This is still relevant, but an error was made, as lua_pcall() was given
'1' as nresults argument while the _hlua_traceback() internal function
doesn't push any result on the stack. Because of this, the Lua API
still pushes a garbage object on top of the stack before returning.
This may cause functions that leverage hlua_traceback() in the middle of
stack manipulation to end up with a corrupted stack when continuing
after the hlua_traceback().
There don't seem to be many places where this could be a problem, as
this was discovered using the reproducer documented in f535d3e
("BUG/MEDIUM: debug: only dump Lua state when panicking"). Indeed, when
hlua_traceback() was used from the signal handler while the thread was
previously executing Lua, when returning to Lua after the handler the
Lua stack would be corrupted.
To fix the issue, we make it explicit that the _hlua_traceback()
function doesn't push anything on the stack and returns 0, thus
lua_pcall() is given 0 as nresults argument to prevent anything from
being pushed after the execution, preserving the original stack state.
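In other words (nargs illustrative, the rest from the commit):

    /* before: requests one result that _hlua_traceback() never pushes */
    lua_pcall(L, nargs, 1, 0);
    /* after: _hlua_traceback() returns 0, so request 0 results */
    lua_pcall(L, nargs, 0, 0);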
This should be backported to all stable versions (because 365ee28 was
backported there)
Released version 3.4-dev3 with the following main changes :
- BUILD: ssl: strchr definition changed in C23
- BUILD: tools: memchr definition changed in C23
- BUG/MINOR: cfgparse: wrong section name upon error
- MINOR: cfgparse: Refactor "userlist" parser to print it in -dKall operation
- BUILD: sockpair: fix build issue on macOS related to variable-length arrays
- BUG/MINOR: cli/stick-tables: argument to "show table" is optional
- REGTESTS: ssl: Fix reg-tests curve check
- CI: github: remove ERR=1 temporarly from the ECH job
- BUG/MINOR: ech/quic: enable ech configuration also for quic listeners
- MEDIUM: config: warn if some userlist hashes are too slow
- MINOR: cfgparse: remove duplicate "force-persist" in common kw list
- MINOR: sample: also support retrieving fc.timer.handshake without a stream
- MINOR: tcp-sample: permit retrieving tcp_info from the connection/session stage
- CLEANUP: connection: Remove outdated note about CO_FL `0x00002000` being unused
- MINOR: receiver: Dynamically alloc the "members" field of shard_info
- MINOR: stats: Increase the tgid from 8bits to 16bits
- BUG/MINOR: stats-file: Use a 16bits variable when loading tgid
- BUG/MINOR: hlua_fcn: fix broken yield for Patref:add_bulk()
- BUG/MINOR: hlua_fcn: ensure Patref:add_bulk() is given a table object before using it
- BUG/MINOR: net_helper: fix IPv6 header length processing
- MEDIUM: counters: Dynamically allocate per-thread group counters
- MEDIUM: counters: Remove some extra tests
- BUG/MEDIUM: threads: Fix binding thread on bind.
- BUG/MEDIUM: quic: fix ACK ECN frame parsing
- MEDIUM: counters: mostly revert da813ae4d7
- BUG/MINOR: http_act: fix deinit performed on uninitialized lf_expr in release_http_map()
- MINOR: queues: Turn non_empty_tgids into a long array.
- MINOR: threads: Eliminate all_tgroups_mask.
- BUG/MEDIUM: queues: Fix arithmetic when feeling non_empty_tgids
- MEDIUM: thread: Turn the group mask in thread set into a group counter
- BUG/MINOR: proxy: free persist_rules
- MEDIUM: stream: refactor switching-rules processing
- REGTESTS: add test on backend switching rules selection
- MEDIUM: proxy: do not select a backend if disabled
- MEDIUM: proxy: implement publish/unpublish backend CLI
- MINOR: stats: report BE unpublished status
- MINOR: cfgparse: adapt warnif_cond_conflicts() error output
- MEDIUM: proxy: force traffic on unpublished/disabled backends
- MINOR: ssl: Factorize AES GCM data processing
- MINOR: ssl: Add new aes_cbc_enc/_dec converters
- REGTESTS: ssl: Add tests for new aes cbc converters
- MINOR: jwe: Add new jwt_decrypt_secret converter
- MINOR: jwe: Add new jwt_decrypt_cert converter
- REGTESTS: jwe: Add jwt_decrypt_secret and jwt_decrypt_cert tests
- DOC: jwe: Add doc for jwt_decrypt converters
- MINOR: jwe: Some algorithms not supported by AWS-LC
- REGTESTS: jwe: Fix tests of algorithms not supported by AWS-LC
- BUG/MINOR: cfgparse: fix "default" prefix parsing
- REORG/MINOR: cfgparse: eliminate code duplication by lshift_args()
- MEDIUM: systemd: implement directory loading
- CI: github: switch monthly Fedora Rawhide build to OpenSSL
- SCRIPTS: build-ssl: use QUICTLS_VERSION instead of QUICTLS=yes
- CI: github: define the right quictls version in each jobs
- CI: github: fix vtest.yml with "not quictls"
- MINOR: cli: use srv_drop() when server was created using new_server()
- BUG/MINOR: server: ensure server is detached from proxy list before being freed
- BUG/MEDIUM: promex: server iteration may rely on stale server
- SCRIPTS: build-ssl: clone the quictls branch directly
- SCRIPTS: build-ssl: fix quictls build for 1.1.1 versions
- BUG/MEDIUM: log: parsing log-forward options may result in segfault
- DOC: proxy-protocol: Add SSL client certificate TLV
- DOC: fix typos in the documentation files
- DOC: fix mismatched quotes typos around words in the documentation files
- REORG: cfgparse: move peers parsing to cfgparse-peers.c
- MINOR: tools: add chunk_escape_string() helper function
- MINOR: vars: store variable names for runtime access
- MINOR: vars: implement dump_all_vars() sample fetch
- DOC: vars: document dump_all_vars() sample fetch
- BUG/MEDIUM: ssl: fix error path on generate-certificates
- BUG/MEDIUM: ssl: fix generate-certificates option when SNI greater than 64bytes
- BUG/MEDIUM: mux-quic: prevent BUG_ON() on aborted uni stream close
- REGTESTS: ssl: fix generate-certificates w/ LibreSSL
- SCRIPTS: build: enable symbols in AWS-LC builds
- BUG/MINOR: proxy: fix deinit crash on defaults with duplicate name
- BUG/MEDIUM: debug: only dump Lua state when panicking
- MINOR: proxy: remove proxy_preset_defaults()
- MINOR: proxy: refactor defaults proxies API
- MINOR: proxy: simplify defaults proxies list storage
- MEDIUM: cfgparse: do not store unnamed defaults in name tree
- MEDIUM: proxy: implement persistent named defaults
This patch changes the handling of named defaults sections. Prior to
this patch, all unreferenced defaults proxies were removed on post
parsing. Now by default, these sections are kept after postparsing and
only purged on deinit. The objective is to allow reusing them as base
configuration for dynamic backends.
To implement this, the refcount of every still-addressable named section is
incremented by one after parsing. This ensures that they won't be
removed even if referencing proxies are removed at runtime. This is done
via the new function proxy_ref_all_defaults().
To ensure defaults instances are still properly removed on deinit, the
inverse operation is performed: the refcount is decremented by one on
every defaults section via proxy_unref_all_defaults().
The original behavior can still be restored via the new global keyword
tune.defaults.purge. This is useful for users whose configurations
contain a large number of defaults sections and who are not interested
in dynamic backend creation.
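For example (keyword name from this commit; the on/off argument syntax
is an assumption):

    global
        # restore the old behavior: purge unreferenced named
        # defaults after parsing
        tune.defaults.purge on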
Defaults sections are indexed by their name in the defproxy_by_name
tree. For named sections, there are no duplicates: if two instances have
the same name, the older one is removed from the tree. However, this was
not the case for unnamed defaults, which were all stored unconditionally
in defproxy_by_name.
This commit introduces a new approach for unnamed defaults. Now, these
instances are never inserted in the defproxy_by_name tree. Indeed, this
is not needed as no tree lookup is performed with empty names. This may
slightly optimize config parsing with a huge number of named and unnamed
defaults sections, as the unnamed ones won't fill up the tree needlessly.
However, defproxy_by_name tree is also used to purge unreferenced
defaults instances, both on postparsing and deinit. Thus, a new approach
is needed for unnamed sections cleanup. Now, each time a new defaults
section is parsed, if the previous instance is unnamed, it is freed
unless it is referenced by a proxy. When config parsing ends, a similar
operation is performed to ensure the last unnamed defaults section won't
stay in memory. To implement this, the last_defproxy static variable is
now made global. Unnamed sections which cannot be removed due to
referencing proxies will still be removed when such proxies are freed
themselves, at runtime or on deinit.
Defaults proxy instances are stored in a global name tree. When there
is a name conflict and the older entry cannot be simply discarded as it
is already referenced, the older entry is instead removed from the name
tree and inserted into the orphaned list.
The purpose of the orphaned list was to guarantee that any remaining
unreferenced defaults are purged either on postparsing or deinit.
However, this is in fact completely useless. Indeed, on postparsing,
orphaned entries are always referenced. On deinit instead, defaults are
already freed along with the cleanup of all frontend/backend instances,
thanks to their refcounting.
This patch streamlines this by removing the orphaned list. Instead, a
defaults section is inserted into a new global defaults_list during
their whole lifetime. This is not strictly necessary but it ensures that
defaults instances can still be accessed easily in the future if needed
even if not present in the name tree. On deinit, a BUG_ON() is added to
ensure that defaults_list is indeed emptied.
Another benefit from this patch is to simplify the defaults deletion
procedure. The orphaned singly linked list is replaced by a proper
doubly linked list implementation, so a single LIST_DELETE() is now
enough. This will be notably useful as defaults may be removed at
runtime in the future if backend deletion at runtime is implemented.
This patch renames functions which deal with defaults sections. A common
"defaults_px_" prefix is defined. This serves as a marker to identify
functions which can only be used on proxies with the defaults
capability. New BUG_ON() checks are enforced to ensure this is valid.
Also, older proxy_unref_or_destroy_defaults() is renamed
defaults_px_detach().
Function proxy_preset_defaults() purpose has evolved over time.
Originally, it was only used to initialize defaults proxy instances.
Over time, it was extended so that all proxies use it. Its objective
is to initialize settings to common default values.
To remove the confusion, this function is now removed. Its content is
integrated directly into init_new_proxy().
For a long time, we've tried to show the Lua state and backtrace when
dumping threads so as to be able to figure out if (and which) Lua code was
misbehaving, e.g. by performing expensive library calls. Since 3.1 with
commit 365ee28510 ("BUG/MINOR: hlua: prevent LJMP in hlua_traceback()"),
it appears that the approach is more fragile (though that fix addressed
a real issue about out-of-memory), and it's possible to occasionally
observe crashes or CPU loops with "show threads" while running Lua
heavily. While users of "show threads" are rare, the watchdog warnings,
which were also enabled on 3.1, also trigger these issues, which is
even more of a concern.
This patch goes the simple way to address this for now: since the purpose
of the Lua backtrace was to help locate Lua call places upon a panic,
let's only call the backtrace on panic but not in other situations. After
a panic we obviously don't care that the Lua stack might be corrupted
since it's never going to be resumed anyway. This may be relaxed in the
future if a solution is found to reliably produce harmless Lua backtraces.
The commit above was backported to all stable branches, so this patch
will be needed everywhere. However, TAINTED_PANIC only appeared in 2.8,
and given the rarity of this bug before 3.1, it's probably not needed
to make any extra effort to go beyond 2.8.
It's easy enough to test a version for being subject to this issue,
by running the following Lua code:
local function stress(txn)
for _, backend in pairs(core.backends) do
for _, server in pairs(backend.servers) do
local stats = server:get_stats()
end
end
end
core.register_fetches("stress", stress)
in the following config file:
global
stats socket /tmp/haproxy.stat level admin mode 666
tune.lua.bool-sample-conversion normal
lua-load-per-thread "stress.lua"
listen stress
bind :8001
mode http
timeout client 5s
timeout server 5s
timeout connect 5s
http-request return status 200 content-type text/plain lf-string %[lua.stress()]
server s1 127.0.0.1:8000
and stressing port 8001 with 100+ connections requesting / in loop, then
issuing "show threads" on the CLI using socat in loops as well. Normally
it instantly segfaults (sometimes during the first "show").
A defaults proxy instance may be moved into the orphaned list when it is
replaced by a newer section with the same name. This is attached via
<next> member as a single linked list entry. However, proxy free does
not clear <next> attach point.
This causes a crash on deinit if the orphaned list is not empty. First,
all frontend/backend instances are freed. This triggers the release of
every referenced defaults instance as its refcount reaches zero, but the
orphaned list is not cleaned up. A loop is then conducted on the
orphaned list via proxy_destroy_all_unref_defaults(). This causes a
segfault due to accesses to already freed entries.
To fix this, this patch extends proxy_destroy_defaults(). If the
orphaned list is not empty, a loop is performed to remove a possible
entry for the currently released defaults instance. This ensures that
looping over the orphaned list won't access already freed entries.
This bug is pretty rare as it requires duplicate names in defaults
sections, and also the use of settings which force defaults referencing,
such as TCP/HTTP rules. This can be reproduced with the minimal config
here:
defaults def
http-request return status 200
frontend fe
bind :20080
defaults def
Note that in fact looping over the orphaned list is not strictly
necessary, as defaults instances are automatically removed via
refcounting. This will be the purpose of a future patch. However, to
limit the risk of
regression on stable releases during backport, this patch uses the more
direct approach for now.
This must be backported up to 3.1.
Since commit eb5279b15 ("BUG/MEDIUM: ssl: fix generate-certificates
option when SNI greater than 64bytes") the LibreSSL job does not seem to
work anymore.
Indeed the reg-test was modified to add an SNI longer than 64 bytes,
without any concern for the DNS standard, which allows only 63 bytes
per label.
LibreSSL is stricter than the other libraries about that, and checks
that the SNI is compliant with the DNS RFC in the
tlsext_sni_is_valid_hostname() function
https://github.com/libressl/openbsd/blob/OPENBSD_7_8/src/lib/libssl/ssl_tlsext.c#L710
This patch fixes the issue by splitting the SNI into two labels so that
it exceeds 64 bytes while remaining DNS-compliant.
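For illustration (the label lengths are what matter): a 60-byte label
followed by a 10-byte label gives a 71-byte SNI while keeping each
label within the 63-byte limit:

    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.bbbbbbbbbb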
Must be backported with eb5279b15 to all stable branches.
When a QCS instance is fully closed upon qcs_close_remote() invocation, it
is moved into purg_list for later cleanup. This reuses <el_send> list
element, so a BUG_ON() ensures that QCS is not already present in
send_list.
This code is safe for bidirectional streams, as local channel is only
closed after FIN or RESET_STREAM emission completion, so such QCS won't
be present in the send_list on full closure.
However, things are different for remote uni streams. As such streams do
not have any local channel, qcs_close_remote() will always proceed to
full closure. Most of the time this is fine, but the aformentionned
BUG_ON() could be triggered if emission is required on a remote uni
stream : this only happens after read was aborted and a STOP_SENDING
frame is prepared.
Fix this by adding an extra operation in qcs_close_remote(): on full
close, STOP_SENDING is cancelled if it was prepared and the QCS instance
is removed from send_list. This is safe as STOP_SENDING is unnecessary
after the remote channel is closed. This operation is performed before
purg_list insertion which prevents the BUG_ON() crash issue.
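A hedged sketch of the new operation (the flag name is an assumption;
the list names come from the commit):

    /* on full closure of a remote uni stream, cancel a pending
     * STOP_SENDING and leave send_list before moving to purg_list */
    if (qcs->flags & QC_SF_TO_STOP_SENDING) {   /* flag name assumed */
        qcs->flags &= ~QC_SF_TO_STOP_SENDING;
        LIST_DEL_INIT(&qcs->el_send);
    }
    LIST_APPEND(&qcc->purg_list, &qcs->el_send);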
This patch must be backported up to 3.1.
The problem is that the certificate is generated with a CN greater than
64 bytes when the SNI is too long, which is not supposed to be
supported, and will end up with a handshake failure.
The patch fixes the issue by not adding a CN when the SNI is longer than
64 bytes. Indeed this is no longer a mandatory field and was deprecated
more than 20 years ago. The SAN DNS entry is enough for this case.
Must be backported to all stable branches.
It was reported by Przemyslaw Bromber that using the "generate-certificates"
option combined with AWS-LC would crash HAProxy when a request is made with
an SNI longer than 64 bytes.
The problem is that the certificate is generated with a CN greater than 64
bytes which results in ssl_sock_do_create_cert() returning NULL. This
NULL value is then passed to SSL_set_SSL_CTX().
With OpenSSL, passing a NULL SSL_CTX does not seem to be an issue as it
would just ignore it.
With AWS_LC, passing a NULL seems to crash the function. This was
reported to upstream AWS-LC and fixed in patch 7487ad1dcd8
https://github.com/aws/aws-lc/pull/2946.
This must be backported to all branches.
Add documentation for the dump_all_vars() sample fetch function in the
configuration manual. This function was introduced in the previous commit
to dump all variables in a given scope with optional prefix filtering.
The documentation includes:
- Function signature and return type
- Description of output format
- Explanation of scope and prefix arguments
- Usage examples for common scenarios
This completes the implementation of GitHub issue #1623.
This patch implements dump_all_vars([scope],[prefix]) sample fetch
function that dumps all variables in a given scope, optionally
filtered by name prefix.
Output format: var1=value1, var2=value2, ...
- String values are quoted and escaped (\", \\, \r, \n, \b, \0)
- All sample types are supported via sample_convert()
- Scope can be: sess, txn, req, res, proc
- Prefix filtering is optional
Example usage:
http-request return string %[dump_all_vars(txn)]
http-request return string %[dump_all_vars(txn,user)]
This addresses GitHub issue #1623.
Currently, variable names are only used during parsing and are not
stored at runtime. This makes it impossible to iterate through
variables and retrieve their names.
This patch adds infrastructure to store variable names:
- Add 'name' and 'name_len' fields to var_desc structure
- Add 'name' field to var structure
- Add VDF_NAME_ALLOCATED flag to track memory ownership
- Store names in vars_fill_desc(), var_set(), vars_check_arg(),
and parse_store()
- Free names in var_clear() and release_store_rule()
- Add ARGT_VAR handling in release_sample_arg() to free the
allocated name when the flag is set
This prepares the ground for implementing dump_all_vars() in the
next commit.
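A hedged sketch of the descriptor extension (the new field names come
from the list above; the existing fields and layout are illustrative):

    struct var_desc {
        uint64_t name_hash;      /* existing lookup key (illustrative) */
        enum vars_scope scope;
        const char *name;        /* new: variable name kept at runtime */
        size_t name_len;
        unsigned int flags;      /* new: VDF_NAME_ALLOCATED => free(name) */
    };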
Tested with:
- ASAN-enabled build on Linux (TARGET=linux-glibc USE_OPENSSL=1
ARCH_FLAGS="-g -fsanitize=address")
- Regression tests: reg-tests/sample_fetches/vars.vtc
- Regression tests: reg-tests/startup/default_rules.vtc
This function takes a string and appends it to a buffer in a format
compatible with most languages (double-quoted, with special characters
escaped). It handles standard escape sequences like \n, \r, \", \\.
This generic utility is designed to be used for logging or debugging
purposes where arbitrary string data needs to be safely emitted without
breaking the output format. It will be primarily used by the upcoming
dump_all_vars() sample fetch to dump variable contents safely.
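A hedged usage sketch (the exact signature is an assumption based on
the description above; only get_trash_chunk() is an existing helper):

    struct buffer *out = get_trash_chunk();
    /* appends the value as "...", escaping \n, \r, \" and \\ */
    chunk_escape_string(out, value, value_len);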
This patch moves the peers section parsing code from src/cfgparse.c to
a dedicated src/cfgparse-peers.c file. This separation improves code
organization and prepares for further refactoring of the "peers" keyword
registration system.
No functional changes in this patch - the code is moved as-is with only
the necessary adjustments for compilation (adding the SPDX header and
updating the Makefile for the build).
This is the first patch in a series to address issue #3221, which
reports that "peers" section keywords are not displayed with -dKall.
s/"no'/"no"
s/'private"/"private"
s/"flt'/"flt"
There isn't a definite convention but people usually prefer to highlight
something important with quotation marks. For example, it's convenient
to find keywords in a text when they are quoted; mismatched quotes make
this harder.
No backport needed.
This fixes several obvious typos in the documentation:
s/elvoved/evolved
s/performend/performed
s/importnat/important
s/sharedd/shared
s/eveyone/everyone
No backport needed.