HAProxy - Load balancer
Willy Tarreau 1c75995611 BUG/MAJOR: dns: add minimalist error processing on the Rx path
It was reported in bug #399 that the DNS sometimes enters endless loops
after hours of working fine. The issue is caused by a lack of error
processing in the DNS's recv() path combined with an exclusive recv OR
send in the UDP layer, resulting in some errors causing CPU loops that
never stop until the process is restarted.

The basic cause is that the FD_POLL_ERR and FD_POLL_HUP flags are sticky
on the FD, and unlike with a stream socket, receiving an error on a
datagram socket doesn't indicate that this socket cannot be used anymore.
Thus the Rx code must at least handle this situation and flush the error,
otherwise it will constantly be reported. In theory this should not be a
big issue, but in practice it is, due to another bug in the UDP datagram
handler which prevents the send() callback from being called when Rx
readiness was reported, so the situation cannot go away. It happens way
more easily with threads enabled, because there is no dead time between
the moment the FD is disabled and another recv() is called, such as in
the example below where the request was sent to a closed port on the
loopback, provoking an ICMP unreachable to be sent back:

  [pid 20888] 18:26:57.826408 sendto(29, ";\340\1\0\0\1\0\0\0\0\0\1\0031wt\2eu\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 35, 0, NULL, >
  [pid 20893] 18:26:57.826566 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 ECONNREFUSED (Connection refused)
  [pid 20889] 18:26:57.826601 recvfrom(29, 0x7f97c76182f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20892] 18:26:57.826630 recvfrom(29, 0x7f97c5cf02f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20891] 18:26:57.826684 recvfrom(29, 0x7f97c66162f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20895] 18:26:57.826716 recvfrom(29, 0x7f97bffda2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20894] 18:26:57.826747 recvfrom(29, 0x7f97c4cee2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20888] 18:26:58.419838 recvfrom(29, 0x7ffcc8712c20, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 20893] 18:26:58.419900 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  (... hundreds before next sendto() ...)

This situation was handled by clearing the HUP and ERR flags when recv()
returns <0.
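
As an illustration of the fix described above, here is a minimal sketch of
clearing sticky error flags after a failed recv() on a datagram socket. It
is not the code from the patch: the struct sk_fd type, the SK_POLL_* flag
names and values, and sk_dgram_after_recv() are all hypothetical, loosely
modelled on the FD_POLL_ERR and FD_POLL_HUP flags named in this message,
and C11 atomics stand in for the atomic AND mentioned further down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flag names and values, modelled on FD_POLL_ERR/FD_POLL_HUP. */
#define SK_POLL_IN  0x01u
#define SK_POLL_ERR 0x08u
#define SK_POLL_HUP 0x10u

/* Hypothetical per-FD state: event flags stay set until someone clears them. */
struct sk_fd {
	atomic_uint ev;
};

/* To be called with the return value of recv()/recvfrom() on a datagram
 * socket. On a negative return, the sticky ERR and HUP flags are dropped
 * with an atomic AND so that they are not reported over and over. The
 * socket itself stays usable. Returns 1 if the flags were cleared, else 0.
 */
static int sk_dgram_after_recv(struct sk_fd *fd, ssize_t ret)
{
	assert(fd != NULL);

	if (ret >= 0)
		return 0;

	atomic_fetch_and(&fd->ev, ~(SK_POLL_ERR | SK_POLL_HUP));
	return 1;
}
```

On a build without threads, the atomic_fetch_and() call in this sketch could
be replaced by a plain "ev &= ~(...)" on a non-atomic field.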

A second case was handled: there was a check for a missing dgram
handler, but it did nothing, causing the FD to ring again if this
situation ever happens. After looking at the rest of the code, it
doesn't seem possible to face such a situation because these handlers
are registered during startup, but at least we need to handle it
properly.
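
One way to handle a missing handler properly is sketched below, under
stated assumptions: the struct sk_dgram type, its fields, and both functions
are hypothetical and only model the idea that an FD with nobody to consume
its events should stop being polled for reads, instead of being silently
ignored and reported again.

```c
#include <stddef.h>

/* Hypothetical datagram context; recv_cb may be NULL if never registered. */
struct sk_dgram {
	void (*recv_cb)(struct sk_dgram *dg);
	int want_recv;  /* 1 while the poller should report read events */
	int calls;      /* number of times recv_cb was invoked */
};

/* Example callback used for illustration only: counts invocations. */
static void sk_count_cb(struct sk_dgram *dg)
{
	dg->calls++;
}

/* Called when the poller reports the FD as readable. If no handler is
 * registered, polling for reads is turned off so the event cannot ring
 * forever. Returns 0 if the handler ran, -1 otherwise.
 */
static int sk_dgram_on_readable(struct sk_dgram *dg)
{
	if (dg == NULL)
		return -1;

	if (dg->recv_cb == NULL) {
		dg->want_recv = 0;
		return -1;
	}

	dg->recv_cb(dg);
	return 0;
}
```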

A third case was handled, which is mainly a small optimization. With
threads and massive responses, due to the large lock around the loop,
it's likely that some threads will have seen fd_recv_ready() and will
wait at the lock(). But if they wait here, chances are that other
threads will have eliminated pending data and issued fd_cant_recv().
In this case, it is better to re-check fd_recv_ready() before performing
the recv() call, to avoid the huge number of syscalls that happen on
massively threaded setups.
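
The re-check described above can be sketched as follows. This is a
simplified model, not the patched code: struct sk_rx and the sk_rx_*()
functions are hypothetical, an atomic_flag spinlock stands in for the large
lock, an atomic boolean stands in for fd_recv_ready()/fd_cant_recv(), and a
counter stands in for the recv() syscalls.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical receive state shared between threads. */
struct sk_rx {
	atomic_flag lock;   /* stands in for the large lock around the loop */
	atomic_bool ready;  /* stands in for the FD's "recv ready" state */
	int pending;        /* queued datagrams, protected by lock */
	int recv_calls;     /* simulated recv() calls, protected by lock */
};

static void sk_rx_init(struct sk_rx *rx, int pending)
{
	atomic_flag_clear(&rx->lock);
	atomic_init(&rx->ready, true);
	rx->pending = pending;
	rx->recv_calls = 0;
}

/* Must be called with the lock held. Readiness is re-checked before each
 * simulated recv(), so a thread that waited for the lock while another one
 * drained the socket performs no call at all. Returns datagrams consumed.
 */
static int sk_rx_drain_locked(struct sk_rx *rx)
{
	int got = 0;

	while (atomic_load(&rx->ready)) {
		rx->recv_calls++;                    /* simulated recv() */
		if (rx->pending == 0) {
			atomic_store(&rx->ready, false); /* EAGAIN: mark not ready */
			break;
		}
		rx->pending--;
		got++;
	}
	return got;
}

/* Full path: cheap check, take the lock, then drain with the re-check. */
static int sk_rx_drain(struct sk_rx *rx)
{
	int got;

	if (!atomic_load(&rx->ready))
		return 0;

	while (atomic_flag_test_and_set(&rx->lock))
		;                                    /* spin until the lock is free */

	got = sk_rx_drain_locked(rx);
	atomic_flag_clear(&rx->lock);
	return got;
}
```

In this model, a thread that saw the FD ready, then waited at the lock while
another thread drained it, ends up in sk_rx_drain_locked() with the ready
state already cleared and returns without issuing any simulated recv().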

This patch must be backported as far as 1.6 (the atomic AND just
needs to be turned to a regular AND).
2019-12-10 19:09:15 +01:00
.github/ISSUE_TEMPLATE DOC: Add GitHub issue config.yml 2019-11-03 15:36:06 +01:00
contrib BUG/MINOR: contrib/prometheus-exporter: decode parameter and value only 2019-11-27 11:51:35 +01:00
doc DOC: document the listener state transitions 2019-12-10 16:06:53 +01:00
ebtree BUILD: ebtree: make eb_is_empty() and eb_is_dup() take a const 2019-10-02 15:24:19 +02:00
examples CLEANUP: removed obsolete examples an move a few to better places 2019-06-15 21:25:06 +02:00
include REORG: listener: move the global listener queue code to listener.c 2019-12-10 14:16:03 +01:00
reg-tests MINOR: backend: Add srv_name sample fetche 2019-11-01 05:40:24 +01:00
scripts SCRIPTS: update create-release to fix the changelog on new branches 2019-11-25 20:40:52 +01:00
src BUG/MAJOR: dns: add minimalist error processing on the Rx path 2019-12-10 19:09:15 +01:00
tests TESTS: Add a stress-test for mt_lists. 2019-09-23 18:16:08 +02:00
.cirrus.yml BUILD: CI: comment out cygwin build, upgrade various ssl libraries 2019-10-29 06:27:50 +01:00
.gitignore DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
.travis.yml BUILD: CI: comment out cygwin build, upgrade various ssl libraries 2019-10-29 06:27:50 +01:00
BRANCHES DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
CHANGELOG [RELEASE] Released version 2.2-dev0 2019-11-25 20:36:16 +01:00
CONTRIBUTING DOC: improve the wording in CONTRIBUTING about how to document a bug fix 2019-07-26 15:46:21 +02:00
INSTALL DOC: this is development again 2019-11-25 20:37:49 +01:00
LICENSE LICENSE: add licence exception for OpenSSL 2012-09-07 13:52:26 +02:00
MAINTAINERS DOC: wurfl: added point of contact in MAINTAINERS file 2019-04-23 11:00:23 +02:00
Makefile BUILD: reorder the objects in the makefile 2019-11-25 19:47:23 +01:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
ROADMAP DOC: update the outdated ROADMAP file 2019-06-15 21:59:54 +02:00
SUBVERS BUILD: use format tags in VERDATE and SUBVERS files 2013-12-10 11:22:49 +01:00
VERDATE [RELEASE] Released version 2.1.0 2019-11-25 19:47:40 +01:00
VERSION [RELEASE] Released version 2.2-dev0 2019-11-25 20:36:16 +01:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for:

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located in the doc/ directory:

  - doc/intro.txt for a quick introduction to HAProxy
  - doc/configuration.txt for the configuration reference manual
  - doc/lua.txt for the Lua reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)