Commit Graph

81 Commits

Author SHA1 Message Date
Maxime Henrion
bc8e14ec23 CLEANUP: use the automatic alignment feature
- Use the automatic alignment feature instead of hardcoding 64 all over
  the code.
- This also converts a few bare __attribute__((aligned(X))) to using the
  ALIGNED macro.
2025-12-09 17:14:58 +01:00
Willy Tarreau
675c86c4aa DEBUG: add BUG_ON_STRESS(): a BUG_ON() implemented only when DEBUG_STRESS > 0
The purpose of this new BUG_ON is beyond BUG_ON_HOT(). While BUG_ON_HOT()
is meant to be light but placed on very hot code paths, BUG_ON_STRESS()
might be heavy and only used under stress-testing, to try to detect early
that something bad is starting to happen. This one is not even type-checked
when not defined because we don't want to risk the compiler emitting the
slightest piece of code there in production mode, so as to give enough
freedom to the developers.
2025-11-14 16:42:53 +01:00
Willy Tarreau
3d441e78e5 DEBUG: extend DEBUG_STRESS to ease testing and turn on extra checks
DEBUG_STRESS is currently used only to expose "stress-level". With this
patch, we go a bit further, by automatically forcing DEBUG_STRICT and
DEBUG_STRICT_ACTION to their highest values in order to enable all
BUG_ON levels, and make all of them result in a crash. In addition,
care is taken to always only have 0 or 1 in the macro, so that it can be
tested using "#if DEBUG_STRESS > 0" as well as "if (DEBUG_STRESS) { }"
everywhere.

The goal will be to ease insertion of extra tests for builds dedicated
to stress-testing that enable possibly expensive extra checks on certain
code paths that cannot reasonably be compiled in for production code
right now.
2025-11-14 16:38:04 +01:00
Willy Tarreau
33d72568dd MINOR: tools: also implement ha_aligned_alloc_typed()
This one is a macro and will allocate a properly aligned and sized
object. This will help make sure that the alignment promised to the
compiler is respected.

When memstats is used, the type name is passed as a string into the
.extra field so that it can be displayed in "debug dev memstats". Two
tiny mistakes related to memstats macros were also fixed (calloc
instead of malloc for zalloc), and the doc was also added to document
how to use these calls.
2025-08-13 17:37:08 +02:00
Willy Tarreau
746e77d000 MINOR: tools: implement ha_aligned_zalloc()
This one is exactly ha_aligned_alloc() followed by a memset(0), as
it will be convenient for a number of call places as a replacement
for calloc().

Note that ideally we should also have a calloc version that performs
basic multiply overflow checks, but these are essentially used with
numbers of threads times small structs so that's fine, and we already
do the same everywhere in malloc() calls.
2025-08-11 19:55:30 +02:00
Willy Tarreau
325d1bdcca MINOR: implement ha_aligned_alloc() to return aligned memory areas
We have two versions, _safe() which verifies and adjusts alignment,
and the regular one which trusts the caller. There's also a dedicated
ha_aligned_free() due to mingw.

The currently detected OSes are mingw, unixes older than POSIX 200112
which require memalign(), and those post 200112 which will use
posix_memalign(). Solaris 10 reports 200112 (probably through
_GNU_SOURCE since it does not do it by default), and Solaris 11 still
supports memalign() so for all Solaris we use memalign(). The memstats
wrappers are also implemented, and have the exported names. This was
the opportunity for providing a separate free call that lets the caller
specify the size (e.g. for use with pools).

For now this code is not used.
2025-08-06 19:19:27 +02:00
Willy Tarreau
0a91c6dcae BUILD: debug: mark ha_crash_now() as attribute(noreturn)
Building on MIPS64 with clang16 incorrectly reports some uninitialized
value warnings in stats-proxy.c due to some calls to ABORT_NOW() where
the compiler didn't know the code wouldn't return. Let's properly mark
the function as noreturn, and take this opportunity for also marking it
unused to avoid possible warnings depending on the build options (if
ABORT_NOW is not used). No backport needed though it will not harm.
2025-05-16 16:43:53 +02:00
Willy Tarreau
b708345c17 DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters
These counters can have a noticeable cost on large machines, though not
dramatic. There's no single good choice to keep them enabled or disabled.
This commit adds multiple choices:
  - DEBUG_COUNTERS set to 2 will automatically enable them by default, while
    1 will disable them by default
  - the global "debug.counters on/off" will allow to change the setting at
    boot, regardless of DEBUG_COUNTERS as long as it was at least 1.
  - the CLI "debug counters on/off" will also allow to change the value at
    run time, allowing to observe a phenomenon while it's happening, or to
    disable counters if it's suspected that their cost is too high

Finally, the "debug counters" command will append "(stopped)" at the end
of the CNT lines when these counters are stopped.

Not that the whole mechanism would easily support being extended to all
counter types by specifying the types to apply to, but it doesn't seem
useful at all and would require the user to also type "cnt" on debug
lines. This may easily be changed in the future if it's found relevant.
2025-04-14 19:02:13 +02:00
Willy Tarreau
a142adaba0 DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1
COUNT_IF() is convenient but can be heavy since some of them were found
to trigger often (roughly 1 counter per request on avg). This might even
have an impact on large setups due to the cost of a shared cache line
bouncing between multiple cores. For now there's no way to disable it,
so let's only enable it when DEBUG_COUNTERS is 1 or above. A future
change will make it configurable.
2025-04-14 19:02:13 +02:00
Willy Tarreau
61d633a3ac DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default
Till now the per-line glitches counters were only enabled with the
confusingly named DEBUG_GLITCHES (which would not turn glitches off
when disabled). Let's instead change it to DEBUG_COUNTERS and make sure
it's enabled by default (though it can still be disabled with
-DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be
expanded to cover more counters.
2025-04-14 19:02:13 +02:00
Willy Tarreau
7b6acb6a51 MINOR: bug: make BUG_ON() fall back to ASSUME
When the strict level is zero and BUG_ON() is not implemented, some
possible null-deref warnings are emitted again because some were
covering for these cases. Let's make it fall back to ASSUME() so that
the compiler continues to know that the tested expression never happens.
It also allows to further optimize certain functions by helping the
compiler eliminate certain tests for impossible values. However it
requires that the expression is really evaluated before passing the
result through ASSUME() otherwise it was shown that gcc-11 and above
will fail to evaluate its implications and will continue to emit the
null-deref warnings in case the expression is non-trivial (e.g. it
has multiple terms).

We don't do it for BUG_ON_HOT() however because the extra cost of
evaluating the condition is generally not welcome in fast paths,
particularly when that BUG_ON_HOT() was kept disabled for
performance reasons.
2024-12-17 17:39:12 +01:00
Willy Tarreau
d6dc8120c0 BUILD: debug: fix build issues in COUNT_IF() with -Wunused-value
Commit 7f64bb79fd ("BUG/MINOR: debug: COUNT_IF() should return true/false")
allowed the COUNT_IF() macro to return the evaluated value. This is handy
to place it in "if ()" conditions and count them at the same time. When
glitches are disabled, the condition is just returned as-is, but most call
places do not use the result, making some compilers complain. In addition,
while reviewing this, it was noticed that when DEBUG_STRICT=0, the macro
would still be replaced by a "do { } while (0)" statement, which not only
does not evaluate the expression, but also cannot return anything. Ditto
for COUNT_IF_HOT().

Let's make sure both are always properly evaluated now.
2024-12-09 18:04:51 +01:00
Willy Tarreau
7f64bb79fd BUG/MINOR: debug: COUNT_IF() should return true/false
The COUNT_IF() macro was initially meant to return true/false to be used
in if() conditions but had an extra do { } while(0) that prevents it from
doing so. Let's get rid of the do { } while(0) before the code generalizes
to too many places. There's no impact on existing code, but may have to be
backported if future fixes rely on it.
2024-12-06 18:45:46 +01:00
Willy Tarreau
502790ed7e MINOR: debug: add a new counter type for glitches
COUNT_GLITCH() will implement an unconditional counter on its declaration
line when DEBUG_GLITCHES is set, and do nothing otherwise. The output will
be reported as "GLT" and can be filtered as "glt" on the CLI. The purpose
is to help figure what's happening if some glitches counters start going
through the roof. The macro supports an optional string argument to
describe the cause of the glitch (e.g. "truncated header"), which is then
reported in the dump.

For now this is conditioned by DEBUG_GLITCHES but if it turns out to be
light enough, maybe we'll keep it enabled full time. In this case it
might have to be moved away from debug dev, or at least documented (or
done as debug counters maybe so that dev can remain undocumented and
updatable within a branch?).
2024-11-14 08:49:38 +01:00
Willy Tarreau
e119095290 MINOR: debug: explicitly permit the counter condition to be empty
In order to count new event types, we'll need to support empty conditions
so that we don't have to fake if (1) that would pollute the output. This
change checks if #cond is an empty string before concatenating it with
the optional var args, and avoids dumping the colon on the dump if the
whole description is empty.
2024-11-14 08:47:00 +01:00
Willy Tarreau
148eb5875f DEBUG: wdt: better detect apparently locked up threads and warn about them
In order to help users detect when threads are behaving abnormally, let's
try to emit a warning when one is no longer making any progress. This will
allow to catch faulty situations more accurately, instead of occasionally
triggering just after the long task. It will also let users know that there
is something wrong with their configuration, and inspect the call trace to
figure whether they're using excessively long rules or Lua for example (the
usual warnings about lua-load vs lua-load-per-thread are still reported).

The warning will only be emitted for threads not yet marked as stuck so
as not to interfere with panic dumps and avoid sending a warning just
before a panic. A tainted flag is set when this happens however (0x2000).
2024-11-06 18:35:42 +01:00
Willy Tarreau
da66c42f65 MINOR: debug: add a new debug macro COUNT_IF()
This macro works exactly like BUG_ON() except that it never logs anything
nor crashes, it only implements an atomic counter that is incremented on
every call. This can be used to count a number of unlikely events that are
worth checking at run time on setups showing unusual and unreproducible
behaviors.
2024-10-21 19:14:07 +02:00
Willy Tarreau
776fd03509 MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF
These macros do not always kill the process, and sometimes it would be
nice to know if some match or not, and how many times (especially for the
CHECK_IF one).

This commit adds a new section "dbg_cnt" made of structs that contain
function name, file name, line number, check type, condition and match
count. A newe macro __DBG_COUNT() adds one to the counter, and is placed
inside _BUG_ON() and _BUG_ON_ONCE(). It's worth noting that the exact
type of the check is not very precise but in practice we don't care,
as most checks will cause the process to die anyway unless they're of
type _BUG_ON_ONCE() (used by CHECK_IF by default).

All of this is limited to !defined(USE_OBSOLETE_LINKER) because we're
creating a section, thus we need a modern linker to be able to scan
this section later. Doing so adds ~50kB to the executable due to the
~1266 BUG_ON() and others placed there. That's not huge in comparison
to the visibility it can provide.
2024-10-21 19:14:07 +02:00
Willy Tarreau
8844ed2009 CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one
The BUG_ON() macros are made of two levels so as to resolve the condition
to a string. However this doesn't offer much flexibility for performing
other operations when the condition is validated, so let's adjust them so
that the condition is checked in the outer macro and the operations are
performed in the inner one.
2024-10-21 18:17:25 +02:00
Willy Tarreau
8e048603d1 MINOR: debug: make mark_tainted() return the previous value
Since mark_tainted() uses atomic ops to update the tainted status, let's
make it return the prior value, which will allow the caller to detect
if it's the first one to set it or not.
2024-10-19 15:13:47 +02:00
Frederic Lecaille
bc9821fd26 BUILD: Missing inclusion header for ssize_t type
Compilation issue detected as follows by gcc:

In file included from src/ncbuf.c:19:
src/ncbuf.c: In function 'ncb_write_off':
include/haproxy/bug.h:144:10: error: unknown type name 'ssize_t'
  144 |   extern ssize_t write(int, const void *, size_t); \
2024-06-26 10:17:09 +02:00
Willy Tarreau
2d27c80288 BUILD: debug: also declare strlen() in __ABORT_NOW()
Previous commit 8f204fa8ae ("MINOR: debug: print gdb hints when crashing")
broken on the CI where strlen() isn't known. Let's forward-declare it in
the __ABORT_NOW() functions, just like write(). No backport is needed.
2024-06-26 08:04:40 +02:00
Willy Tarreau
8f204fa8ae MINOR: debug: print gdb hints when crashing
To make bug reporting easier for users, when crashing, let's suggest
what to do. Typically when a BUG_ON() matches, only the current thread
is useful the vast majority of the time, while when the watchdog
triggers, all threads are interesting.

The messages are printed at the end after the dump. We may adjust these
with wiki links in the future is more detailed instructions are relevant.
2024-06-26 07:43:00 +02:00
Willy Tarreau
e791b243f0 BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented
Setting DEBUG_STRICT=0 only validates the defined(DEBUG_STRICT) test
regarding DEBUG_STRICT_ACTION, which is equivalent to DEBUG_STRICT>=0.
Let's make sure the test checks for >0 so that DEBUG_STRICT=0 properly
disables DEBUG_STRICT.
2024-04-11 16:41:08 +02:00
Willy Tarreau
0a0041d195 BUILD: tree-wide: fix a few missing includes in a few files
Some include files, mostly types definitions, are missing a few includes
to define the types they're using, causing include ordering dependencies
between files, which are most often not seen due to the alphabetical
order of includes. Let's just fix them.

These were spotted by building pre-compiled headers for all these files
to .h.gch.
2024-03-05 11:50:34 +01:00
Willy Tarreau
25968c186a MINOR: debug: add an optional message argument to the BUG_ON() family
This commit adds support for an optional second argument to BUG_ON(),
WARN_ON(), CHECK_IF(), that can be a constant string. When such an
argument is given, it will be printed on a second line after the
existing first message that contains the condition.

This can be used to provide more human-readable explanations about
what happened, such as "too low on memory" or "memory corruption
detected" that may help a user resolve the incident by themselves.
2024-02-05 17:09:00 +01:00
Willy Tarreau
d417863828 MINOR: debug: support passing an optional message in ABORT_NOW()
The ABORT_NOW() macro is not much used since we have BUG_ON(), but
there are situations where it makes sense, typically if the program
must always die regardless od DEBUG_STRICT, or if the condition must
always be evaluated (e.g. decompress something and check it).

It's not convenient not to have any hint about what happened there. But
providing too much info also results in wiping some registers, making
the trace less exploitable, so a compromise must be found.

What this patch does is to provide the support for an optional argument
to ABORT_NOW(). When an argument is passed (a string), then a message
will be emitted with the file name, line number, the message and a
trailing LF, before the stack dump and the crash. It should be used
reasonably, for example in functions that have multiple calls that need
to be more easily distinguished.
2024-02-05 17:09:00 +01:00
Willy Tarreau
bc70b385fd MINOR: debug: make BUG_ON() catch build errors even without DEBUG_STRICT
As seen in previous commit 59acb27001 ("BUILD: quic: Variable name typo
inside a BUG_ON()."), it can sometimes happen that with DEBUG forced
without DEBUG_STRICT, BUG_ON() statements are ignored. Sadly, it means
that typos there are not even build-tested.

This patch makes these statements reference sizeof(cond) to make sure
the condition is parsed. This doesn't result in any code being emitted,
but makes sure the expression is correct so that an issue such as the one
above will fail to build (which was verified).

This may be backported as it can help spot failed backports as well.
2024-02-05 15:09:37 +01:00
Aurelien DARRAGON
be0165b249 BUILD: debug: remove leftover parentheses in ABORT_NOW()
Since d480b7b ("MINOR: debug: make ABORT_NOW() store the caller's line
number when using abort"), building with 'DEBUG_USE_ABORT' fails with:

  |In file included from include/haproxy/api.h:35,
  |                 from include/haproxy/activity.h:26,
  |                 from src/ev_poll.c:20:
  |include/haproxy/thread.h: In function ‘ha_set_thread’:
  |include/haproxy/bug.h:107:47: error: expected ‘;’ before ‘_with_line’
  |  107 | #define ABORT_NOW() do { DUMP_TRACE(); abort()_with_line(__LINE__); } while (0)
  |      |                                               ^~~~~~~~~~
  |include/haproxy/bug.h:129:25: note: in expansion of macro ‘ABORT_NOW’
  |  129 |                         ABORT_NOW();                                    \
  |      |                         ^~~~~~~~~
  |include/haproxy/bug.h:123:9: note: in expansion of macro ‘__BUG_ON’
  |  123 |         __BUG_ON(cond, file, line, crash, pfx, sfx)
  |      |         ^~~~~~~~
  |include/haproxy/bug.h:174:30: note: in expansion of macro ‘_BUG_ON’
  |  174 | #  define BUG_ON(cond)       _BUG_ON     (cond, __FILE__, __LINE__, 3, "FATAL: bug ",     "")
  |      |                              ^~~~~~~
  |include/haproxy/thread.h:201:17: note: in expansion of macro ‘BUG_ON’
  |  201 |                 BUG_ON(!thr->ltid_bit);
  |      |                 ^~~~~~
  |compilation terminated due to -Wfatal-errors.
  |make: *** [Makefile:1006: src/ev_poll.o] Error 1

This is because of a leftover: abort()_with_line(__LINE__);
                                    ^^
Fixing it by removing the extra parentheses after 'abort' since the
abort() call is now performed under abort_with_line() helper function.

This was raised by Ilya in GH #2440.

No backport is needed, unless the above commit gets backported.
2024-02-05 14:55:04 +01:00
Willy Tarreau
d480b7be96 MINOR: debug: make ABORT_NOW() store the caller's line number when using abort
Placing DO_NOT_FOLD() before abort() only works in -O2 but not in -Os which
continues to place only 5 calls to abort() in h3.o for call places. The
approach taken here is to replace abort() with a new function that wraps
it and stores the line number in the stack. This slightly increases the
code size (+0.1%) but when unwinding a crash, the line number remains
present now. This is a very low cost, especially if we consider that
DEBUG_USE_ABORT is almost only used by code coverage tools and occasional
debugging sessions.
2024-02-02 17:12:06 +01:00
Willy Tarreau
2bb192ba91 MINOR: debug: make sure calls to ha_crash_now() are never merged
As indicated in previous commit, we don't want calls to ha_crash_now()
to be merged, since it will make gdb return a wrong line number. This
was found to happen with gcc 4.7 and 4.8 in h3.c where 26 calls end up
as only 5 to 18 "ud2" instructions depending on optimizations. By
calling DO_NOT_FOLD() just before provoking the trap, we can reliably
avoid this folding problem. Note that this does not address the case
where abort() is used instead (DEBUG_USE_ABORT).
2024-02-02 17:12:06 +01:00
Willy Tarreau
96bb99a87d DEBUG: pools: detect that malloc_trim() is in progress
Now when calling ha_panic() with a thread still under malloc_trim(),
we'll set a new tainted flag to easily report it, and the output
trace will report that this condition happened and will suggest to
use no-memory-trimming to avoid it in the future.
2023-10-25 15:48:02 +02:00
Willy Tarreau
26a6481f00 DEBUG: lua: add tainted flags for stuck Lua contexts
William suggested that since we can detect the presence of Lua in the
stack, let's combine it with stuck detection to set a new pair of flags
indicating a stuck Lua context and a stuck Lua shared context.

Now, executing an infinite loop in a Lua sample fetch function with
yield disabled crashes with tainted=0xe40 if loaded from a lua-load
statement, or tainted=0x640 from a lua-load-per-thread statement.

In addition, at the end of the panic dump, we can check if Lua was
seen stuck and emit recommendations about lua-load-per-thread and
the choice of dependencies depending on the presence of threads
and/or shared context.
2023-10-25 15:48:02 +02:00
Willy Tarreau
46bbb3a33b DEBUG: add a tainted flag when ha_panic() is called
This will make it easier to know that the panic function was called,
for the occasional case where the dump crashes and/or the stack is
corrupted and not much exploitable. Now at least it will be sufficient
to check the tainted value to know that someone called ha_panic(), and
it will also be usable to condition extra analysis.
2023-10-25 15:48:02 +02:00
Willy Tarreau
17a7baca07 BUILD: bug: make BUG_ON() void to avoid a rare warning
When building without threads, the recently introduced BUG_ON(tid != 0)
turns to a constant expression that evaluates to 0 and that is not used,
resulting in this warning:

  src/connection.c: In function 'conn_free':
  src/connection.c:584:3: warning: statement with no effect [-Wunused-value]

This is because the whole thing is declared as an expression for clarity.
Make it return void to avoid this. No backport is needed.
2023-09-04 19:38:51 +02:00
Willy Tarreau
543e2544ca DEBUG: crash using an invalid opcode on aarch64 instead of an invalid access
On aarch64 there's also a guaranted invalid instruction, called UDF, and
which even supports an optional 16-bit immediate operand:

   https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/UDF--Permanently-Undefined-?lang=en

It's conveniently encoded as 4 zeroes (when the operand is zero). It's
unclear when support for it was added into GAS, if at all; even a
not-so-old 2.27 doesn't know about it. Let's byte-encode it.

Tested on an A72 and works as expected.
2023-04-25 19:53:39 +02:00
Willy Tarreau
77787ec9bc DEBUG: crash using an invalid opcode on x86/x86_64 instead of an invalid access
BUG_ON() calls currently trigger a segfault. This is more convenient
than abort() as it doesn't rely on any function call nor signal handler
and never causes non-unwindable stacks when opening cores. But it adds
quite some confusion in bug reports which are rightfully tagged "segv"
and do not instantly allow to distinguish real segv (e.g. null derefs)
from code asserts.

Some CPU architectures offer various crashing methods. On x86 we have
INT3 (0xCC), which stops into the debugger, and UD0/UD1/UD2. INT3 looks
appealing but for whatever reason (maybe signal handling somewhere) it
loses the last call point in the stack, making backtraces unusable. UD2
has the merit of being only 2 bytes and causing an invalid instruction,
which almost never happens normally, so it's easily distinguishable.
Here it was defined as a macro so that the line number in the core
matches the one where the BUG_ON() macro is called, and the debugger
shows the last frame exactly at its calligg point.

E.g. when calling "debug dev bug":

Program terminated with signal SIGILL, Illegal instruction.
  #0  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408
  408             BUG_ON(one > zero);
  [Current thread is 1 (Thread 0x7f7a660cc1c0 (LWP 14238))]
  (gdb) bt
  #0  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408
  #1  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:402
  #2  0x000000000061a69f in cli_parse_request (appctx=appctx@entry=0x181c0160) at src/cli.c:832
  #3  0x000000000061af86 in cli_io_handler (appctx=0x181c0160) at src/cli.c:1035
  #4  0x00000000006ca2f2 in task_run_applet (t=0x181c0290, context=0x181c0160, state=<optimized out>) at src/applet.c:449
2023-04-25 18:51:10 +02:00
Willy Tarreau
7f2b3f9431 BUILD: bug.h: add a warning in the base API when unsafe functions are used
Once in a while we introduce an sprintf() or strncat() function by
accident. These ones are particularly dangerous and must never ever
be used because the only way to use them safely is at least as
complicated if not more, than their safe counterparts. By redefining
a few of these functions with an attribute_warning() we can deliver a
message to the developer who is tempted to use them. This commit does
it for strcat(), strcpy(), strncat(), sprintf(), vsprintf(). More could
come later if needed, such as strtok() and maybe a few others, but these
are less common.
2023-04-07 18:21:36 +02:00
Willy Tarreau
1751db140a MINOR: pools: report a replaced memory allocator instead of just malloc_trim()
Instead of reporting the inaccurate "malloc_trim() support" on -vv, let's
report the case where the memory allocator was actively replaced from the
one used at build time, as this is the corner case we want to be cautious
about. We also put a tainted bit when this happens so that it's possible
to detect it at run time (e.g. the user might have inherited it from an
environment variable during a reload operation).

The now unused is_trim_enabled() function was finally dropped.
2023-03-22 18:05:02 +01:00
Willy Tarreau
eedcea8b90 BUILD: debug: remove unnecessary quotes in HA_WEAK() calls
HA_WEAK() is supposed to take a symbol in argument, not a string, since
the asm statements it produces already quote the argument. Having it
quoted twice doesn't work on older compilers and was the only reason
why DEBUG_MEM_STATS didn't work on older compilers.
2022-11-14 11:12:49 +01:00
Willy Tarreau
d96d214b4c CLEANUP: debug: use struct ha_caller for memstat
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
2022-09-08 14:19:15 +02:00
Willy Tarreau
7f2f1f294c MINOR: debug: add struct ha_caller to describe a calling location
The purpose of this structure is to assemble all constant parts of a
generic calling point for a specific event. These ones are created by
the compiler as a static const element outside of the code path, so
they cost nothing in terms of CPU, and a pointer to that descriptor
can be passed to the place that needs it. This is very similar to what
is being done for the mem_stat stuff.

This will be useful to simplify and improve DEBUG_TASK.
2022-09-08 14:19:15 +02:00
Willy Tarreau
d8009a1ca6 BUILD: debug: make sure debug macros are never empty
As outlined in commit f7ebe584d7 ("BUILD: debug: Add braces to if
statement calling only CHECK_IF()"), the BUG_ON() family of macros
is incorrectly defined to be empty when debugging is disabled, and
that can lead to trouble. Make sure they always fall back to the
usual "do { } while (0)". This may be backported to 2.6 if needed,
though no such issue was met there to date.
2022-08-31 10:53:53 +02:00
Willy Tarreau
db3716b8db MINOR: debug/memstats: permit to pass the size to free()
Right now the free() call is not intercepted since all this is done
using macros and that would break a lot of stuff. Instead a __free()
macro was provided but never used. In addition it used to only report
a zero size, which is not very convenient.

With this patch comes a better solution. Instead it provides a new
will_free() macro that can be prepended before a call to free(). It
only keeps the counters up to date, and also supports being passed a
size. The pool_free_area() command now uses it, which finally allows
the stats to look correct:

  pool-os.h:38   MALLOC  size:   5802127832  calls:   3868044  size/call:   1500
  pool-os.h:47     FREE  size:   5800041576  calls:   3867444  size/call:   1499

The few other places directly calling free() could now be instrumented to
use this and to pass the correct sizeof() when known.
2022-08-09 09:11:27 +02:00
Willy Tarreau
17200dd1f3 MINOR: debug: also store the function name in struct mem_stats
The calling function name is now stored in the structure, and it's
reported when the "all" argument is passed. The first column is
significantly enlarged because some names are really wide :-(
2022-08-09 08:42:42 +02:00
Willy Tarreau
55c950baa9 MINOR: debug: store and report the pool's name in struct mem_stats
Let's add a generic "extra" pointer to the struct mem_stats to store
context-specific information. When tracing pool_alloc/pool_free, we
can now store a pointer to the pool, which allows to report the pool
name on an extra column. This significantly improves tracing
capabilities.

Example:

  proxy.c:1598      CALLOC   size:  28832  calls:  4     size/call:  7208
  dynbuf.c:55       P_FREE   size:  32768  calls:  2     size/call:  16384  buffer
  quic_tls.h:385    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:389    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:554    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:558    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:562    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:401    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  quic_tls.h:403    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  xprt_quic.c:4060  MALLOC   size:  45376  calls:  5672  size/call:  8
  quic_sock.c:328   P_ALLOC  size:  46440  calls:  215   size/call:  216    quic_dgram
2022-08-09 08:26:59 +02:00
Willy Tarreau
4bc37086b8 MINOR: debug: make the mem_stats section aligned to void*
Not specifying the alignment will let the linker choose it, and it turns
out that it will not necessarily be the same size as the one chosen for
struct mem_stats, as can be seen if any new fields are added there. Let's
enforce an alignment to void* both for the section and for the structure.
2022-08-09 08:09:24 +02:00
Willy Tarreau
481edaceb8 BUILD: debug: silence warning on gcc-5
In 2.6, 8a0fd3a36 ("BUILD: debug: work around gcc-12 excessive
-Warray-bounds warnings") disabled some warnings that were reported
around the the BUG() statement. But the -Wnull-dereference warning
isn't known from gcc-5, it only arrived in gcc-6, hence makes gcc-5
complain loudly that it doesn't know this directive. Let's just
condition this one to gcc-6.
2022-07-10 14:13:48 +02:00
Willy Tarreau
27061cd144 MEDIUM: debug: improve DEBUG_MEM_STATS to also report pool alloc/free
Sometimes using "debug dev memstats" can be frustrating because all
pool allocations are reported through pool-os.h and that's all.

But in practice there's nothing wrong with also intercepting pool_alloc,
pool_free and pool_zalloc and report their call counts and locations,
so that's what this patch does. It only uses an alternate set of macroes
for these 3 calls when DEBUG_MEM_STATS is defined. The outputs are
reported as P_ALLOC (for both pool_malloc() and pool_zalloc()) and
P_FREE (for pool_free()).
2022-06-23 11:58:01 +02:00
Willy Tarreau
177aed56dc MEDIUM: debug: detect redefinition of symbols upon dlopen()
In order to better detect the danger caused by extra shared libraries
which replace some symbols, upon dlopen() we now compare a few critical
symbols such as malloc(), free(), and some OpenSSL symbols, to see if
the loaded library comes with its own version. If this happens, a
warning is emitted and TAINTED_REDEFINITION is set. This is important
because some external libs might be linked against different libraries
than the ones haproxy was linked with, and most often this will end
very badly (e.g. an OpenSSL object is allocated by haproxy and freed
by such libs).

Since the main source of dlopen() calls is the Lua lib, via a "require"
statement, it's worth trying to show a Lua call trace when detecting a
symbol redefinition during dlopen(). As such we emit a Lua backtrace if
Lua is detected as being in use.
2022-06-19 17:58:32 +02:00