OPTIM: proxy: separate queues fields from served

There's still a lot of contention when accessing the backend's
totpend and queueslength for every request in may_dequeue_tasks(),
even when queues are not used. This only happens because it's stored
in the same cache line as ->beconn which is being written by other
threads:

  0.01 |        call   sess_change_server
  0.02 |        mov    0x188(%r15),%esi     ## s->queueslength
       |      if (may_dequeue_tasks(srv, s->be))
  0.00 |        mov    0xa8(%r12),%rax
  0.00 |        mov    -0x50(%rbp),%r11d
  0.00 |        mov    -0x60(%rbp),%r10
  0.00 |        test   %esi,%esi
       |        jne    3349
  0.01 |        mov    0xa00(%rax),%ecx     ## p->queueslength
  8.26 |        test   %ecx,%ecx
  4.08 |        je     288d

This patch moves queueslength and totpend to their own cache line,
thus adding 64 bytes to the struct proxy, but gaining 3.6% of RPS
on a 64-core EPYC thanks to the elimination of this false sharing.

process_stream() goes down from 3.88% to 3.26% in perf top, with
the next top users being inc/dec (s->served) and be->beconn.
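
The layout change can be illustrated with a minimal standalone sketch.
It is not HAProxy code: struct proxy_tail, CACHE_LINE and cache_line_of()
are hypothetical names, and C11 alignas() stands in for THREAD_ALIGN(),
assuming a 64-byte cache line as implied by the 64 bytes added above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size; matches the 64 bytes mentioned in the message. */
#define CACHE_LINE 64

/* Hypothetical, simplified stand-in for the end of struct proxy. */
struct proxy_tail {
	/* written by many threads on almost every request */
	alignas(CACHE_LINE) int served;
	unsigned int feconn, beconn;

	/* read on every request, written only when queues are in use */
	alignas(CACHE_LINE) unsigned int queueslength;
	int totpend;
};

/* Index of the cache line (relative to the struct start) holding <offset>. */
static inline size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}

/* Build-time check: the mostly-read queue fields must not share a line
 * with the frequently written counters.
 */
static_assert(offsetof(struct proxy_tail, queueslength) / CACHE_LINE !=
              offsetof(struct proxy_tail, beconn) / CACHE_LINE,
              "queue fields share a cache line with beconn");
```

With this layout, a reader testing queueslength no longer has its cache
line invalidated each time another thread updates served or beconn. The
cost is padding: the sketch occupies two full lines instead of one.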
2026-02-26 08:24:53 +02:00 · 2026-01-28 10:57:25 +00:00
parent 3ca2a83fc0
commit a9df6947b4
1 changed file with 8 additions and 2 deletions
--- a/include/haproxy/proxy-t.h
+++ b/include/haproxy/proxy-t.h
@@ -508,10 +508,16 @@ struct proxy {
 	EXTRA_COUNTERS(extra_counters_be);

 	THREAD_ALIGN();
-	unsigned int queueslength;		/* Sum of the length of each queue */
+	/* these ones change all the time */
 	int served;				/* # of active sessions currently being served */
-	int totpend;				/* total number of pending connections on this instance (for stats) */
 	unsigned int feconn, beconn;		/* # of active frontend and backends streams */
+
+	THREAD_ALIGN();
+	/* these ones are only changed when queues are involved, but checked
+	 * all the time.
+	 */
+	unsigned int queueslength;		/* Sum of the length of each queue */
+	int totpend;				/* total number of pending connections on this instance (for stats) */
 };

 struct switching_rule {