HAProxy - Load balancer
Find a file
Christopher Faulet 7262433183 BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set
epoll_wait() may return EPOLLUP and/or EPOLLRDHUP after an asynchronous
connect(), to indicate that the peer accepted the connection then
immediately closed before epoll_wait() returned. When this happens,
sock_conn_check() is called to check whether or not the connection correctly
established, and after that the receive channel of the socket is assumed to
already be closed. This lets haproxy send the request at best (if RDHUP and
not HUP) then immediately close.

Over the last two years, there were a few reports about this spuriously
happening on connections where network captures proved that the server had
not closed at all (and sometimes even received the request and responded to
it after haproxy had closed). The logs show that a successful connection is
immediately reported on error after the request was sent. After
investigations, it appeared that a EPOLLUP, or eventually a EPOLLRDHUP, can
be reported by epool_wait() during the connect() but in sock_conn_check(),
the connect() reports a success. So the connection is validated but the HUP
is handled on the first receive and an error is reported.

The same behavior could be observed on health-checks, leading HAProxy to
consider the server as DOWN while it is not.

The only explanation at this point is that it is a kernel bug, notably
because it does not even match the documentation for connect() nor epoll. In
addition for now it was only observed with Ubuntu kernels 5.4 and 5.15 and
was never reproduced on any other one.

We have no reproducer but here is the typical strace observed:

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 114
fcntl(114, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(114, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(19, EPOLL_CTL_ADD, 114, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=114, u64=114}}) = 0
epoll_wait(19, [{events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=151, u64=151}}, {events=EPOLLIN, data={u32=59, u64=59}}, {events=EPOLLIN|EPOLLRDHUP, data={u32=114, u64=114}}], 200, 0) = 4
epoll_ctl(19, EPOLL_CTL_MOD, 114, {events=EPOLLOUT, data={u32=114, u64=114}}) = 0
epoll_wait(19, [{events=EPOLLOUT, data={u32=114, u64=114}}, {events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=10, u64=10}}, {events=EPOLLIN, data={u32=165, u64=165}}], 200, 0) = 4
connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = 0
sendto(114, "POST "..., 1009, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 1009
close(114)                              = 0

Some ressources about this issue:
  - https://www.spinics.net/lists/netdev/msg876470.html
  - https://github.com/haproxy/haproxy/issues/1863
  - https://github.com/haproxy/haproxy/issues/2368

So, to workaround the issue, we have decided to remove FD_POLL_HUP flag on
the FD during the connection establishement if FD_POLL_ERR is not reported
too in sock_conn_check(). This way, the call to connect() is able to
validate or reject the connection. At the end, if the HUP or RDHUP flags
were valid, either connect() would report the error itself, or the next
recv() would return 0 confirming the closure that the poller tried to
report. EPOLL_RDHUP is only an optimization to save a syscall anyway, and
this pattern is so rare that nobody will ever notice the extra call to
recv().

Please note that at least one reporter confirmed that using poll() instead
of epoll() also addressed the problem, so that can also be a temporary
workaround for those discovering the problem without the ability to
immediately upgrade.

The event is accounted via a COUNT_IF(), to be able to spot it in future
issue. Just in case.

This patch should fix the issue #1863 and #2368. It may be related
to #2751. It should be backported as far as 2.4. In 3.0 and below, the
COUNT_IF() must be removed.
2024-11-27 12:16:25 +01:00
.github CI: github: add 'workflow_dispatch' on remaining build jobs 2024-11-25 14:03:13 +01:00
addons DOC: ot: mention planned deprecation of the OT filter 2024-11-22 16:11:51 +01:00
admin CLEANUP: assorted typo fixes in the code and comments 2024-09-03 17:49:21 +02:00
dev DEV: patchbot: prepare for new version 3.2-dev 2024-11-26 17:24:21 +01:00
doc [RELEASE] Released version 3.2-dev0 2024-11-26 15:33:57 +01:00
examples EXAMPLES: add "traces.cfg" with traces examples 2024-11-06 17:32:32 +01:00
include MINOR: version: this is development again (3.2) 2024-11-26 17:21:16 +01:00
reg-tests REGTESTS: don't rely on the base64 utility when openssl base64 is already used 2024-11-21 21:10:09 +01:00
scripts CI: vtest: temporarily build from the sd-notify PR 2024-11-20 12:07:38 +01:00
src BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set 2024-11-27 12:16:25 +01:00
tests MAJOR: import: update mt_list to support exponential back-off (try #2) 2024-07-09 16:46:38 +02:00
.cirrus.yml CI: cirrus-ci: bump FreeBSD image to 14-1 2024-10-14 14:28:26 +02:00
.gitattributes MINOR: Configure the cpp userdiff driver for *.[ch] in .gitattributes 2021-02-22 18:17:57 +01:00
.gitignore CONTRIB: Add vi file extensions to .gitignore 2023-06-02 18:14:34 +02:00
.mailmap DOC: update Tim's address in .mailmap 2021-09-16 09:14:14 +02:00
.travis.yml MEDIUM: mworker: remove USE_SYSTEMD requirement for -Ws 2024-11-20 12:07:38 +01:00
BRANCHES DOC: fix some spelling issues over multiple files 2021-01-08 14:53:47 +01:00
BSDmakefile BUILD: makefile: commit the tiny FreeBSD makefile stub 2023-05-24 17:17:36 +02:00
CHANGELOG [RELEASE] Released version 3.2-dev0 2024-11-26 15:33:57 +01:00
CONTRIBUTING CLEANUP: assorted typo fixes in the code and comments 2021-08-16 12:37:59 +02:00
INSTALL MINOR: version: this is development again (3.2) 2024-11-26 17:21:16 +01:00
LICENSE LICENSE: add licence exception for OpenSSL 2012-09-07 13:52:26 +02:00
MAINTAINERS MAJOR: spoe: Let the SPOE back into the game 2024-05-22 09:04:38 +02:00
Makefile BUILD: makefile: reorder object files by build time 2024-11-20 18:49:56 +01:00
README.md DOC: change the link to the FreeBSD CI in README.md 2024-06-03 15:21:29 +02:00
SUBVERS BUILD: use format tags in VERDATE and SUBVERS files 2013-12-10 11:22:49 +01:00
VERDATE [RELEASE] Released version 3.1.0 2024-11-26 15:24:10 +01:00
VERSION [RELEASE] Released version 3.2-dev0 2024-11-26 15:33:57 +01:00

HAProxy

alpine/musl AWS-LC openssl no-deprecated Illumos NetBSD FreeBSD VTest

HAProxy logo

HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications.

Installation

The INSTALL file describes how to build HAProxy. A list of packages is also available on the wiki.

Getting help

The discourse and the mailing-list are available for questions or configuration assistance. You can also use the slack or IRC channel. Please don't use the issue tracker for these.

The issue tracker is only for bug reports or feature requests.

Documentation

The HAProxy documentation has been split into a number of different files for ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture guide.

Please refer to the following files depending on what you're looking for:

  • INSTALL for instructions on how to build and install HAProxy
  • BRANCHES to understand the project's life cycle and what version to use
  • LICENSE for the project's license
  • CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located into the doc/ directory:

License

HAProxy is licensed under GPL 2 or any later version, the headers under LGPL 2.1. See the LICENSE file for a more detailed explanation.