haproxy/src
Willy Tarreau 2b57cb8f30 MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks have occasionally ended with a TCP reset. For example, here is
an HTTP health check sent to a local server:

  19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
  19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
  19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257

After some discussion with Chris Huang-Leaver, it became clear that we
only want to send an RST when we have no other choice, that is, when
the server has not closed. So we still keep SYN/SYN-ACK/RST for pure
TCP checks, but we don't want to see an RST emitted as above when the
server has already sent its FIN.
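The RST mentioned above comes from disabling lingering before close(). A minimal sketch of that decision follows, assuming the caller already knows whether the server closed; disable_lingering() and close_check_socket() are hypothetical names, not HAProxy functions. Setting SO_LINGER with a zero timeout is the standard way to make close() abort the connection with an RST, which also avoids leaving a TIME_WAIT on the checking side.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not an HAProxy function: with SO_LINGER enabled
 * and a zero timeout, close() aborts the connection with an RST instead
 * of a FIN, and no TIME_WAIT is left behind on our side. */
static int disable_lingering(int fd)
{
    struct linger nolinger;

    nolinger.l_onoff = 1;
    nolinger.l_linger = 0;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
}

/* Hypothetical close path for a check socket. "server_closed" is assumed
 * to be known by the caller (e.g. a previous recv() returned 0). */
static int close_check_socket(int fd, int server_closed)
{
    if (!server_closed)
        (void)disable_lingering(fd); /* best effort: close() runs anyway */

    /* RST if lingering was disabled above, regular FIN otherwise */
    return close(fd);
}
```

Only the "server has not closed" branch pays the cost of an RST; when the FIN was already seen, the ordinary close sequence is kept.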

The solution is to implement a "drain" function at the protocol layer
which, when defined, flushes as much as possible of the input socket
buffer so that recv() returns zero, proving that the server's FIN was
received and ACKed. On Linux, we can use MSG_TRUNC on TCP sockets,
which has the benefit of draining everything at once without even
copying the data. On other platforms, we read up to one buffer of data
before the close. If recv() manages to get the final zero, we don't
disable lingering, and the same goes for hard errors. Otherwise we do,
so that close() emits the RST.

In practice, with HTTP health checks we generally find that the close
is already pending and is returned by the first recv() call. The
network trace becomes cleaner:

  19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
  19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
  19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
  19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257

This change should be backported to 1.4, which is where Chris
encountered this issue. The code there is different, so the tcp_drain()
function will probably have to be put in the checks only.
2013-06-10 20:33:23 +02:00
acl.c BUG: regex: fix pcre compile error when using JIT 2013-04-11 08:17:37 +02:00
appsession.c MEDIUM: make the trash be a chunk instead of a char * 2012-10-29 16:57:30 +01:00
arg.c MAJOR: sample: maintain a per-proxy list of the fetch args to resolve 2013-04-03 02:13:02 +02:00
auth.c CLEANUP: auth: make the code build again with DEBUG_AUTH 2012-05-10 23:25:35 +02:00
backend.c REORG: tproxy: prepare the transparent proxy defines for accepting other OSes 2013-05-11 08:03:37 +02:00
base64.c [MINOR] add encode/decode function for 30-bit integers from/to base64 2010-10-30 19:04:33 +02:00
buffer.c CLEANUP: buffer: use buffer_empty() instead of buffer_len()==0 2012-12-17 01:14:49 +01:00
cfgparse.c MEDIUM: counters: add support for tracking a third counter 2013-05-29 00:37:16 +02:00
channel.c OPTIM: channel: inline channel_forward's fast path 2012-10-26 01:08:01 +02:00
checks.c MEDIUM: protocol: implement a "drain" function in protocol layers 2013-06-10 20:33:23 +02:00
chunk.c MINOR: chunks: centralize the trash chunk allocation 2012-12-23 21:46:07 +01:00
compression.c BUG/MEDIUM: compression: the deflate algorithm must use global settings as well 2013-04-28 09:01:11 +02:00
connection.c BUG/MEDIUM: connection: always update connection flags prior to computing polling 2012-12-17 01:14:25 +01:00
cttproxy.c CLEANUP: cttproxy: remove a warning on undeclared close() 2012-10-05 22:18:07 +02:00
dumpstats.c BUG/MEDIUM: stats: allocate the stats frontend also on "stats bind-process" 2013-04-20 09:48:50 +02:00
ev_epoll.c BUG/MINOR: epoll: use a fix maxevents argument in epoll_wait() 2013-01-18 15:31:03 +01:00
ev_kqueue.c BUG/MINOR: poll: the I/O handler was called twice for polled I/Os 2012-12-14 00:17:03 +01:00
ev_poll.c MEDIUM: poll: do not use FD_* macros anymore 2013-03-31 15:01:01 +02:00
ev_select.c BUG/MAJOR: ev_select: disable the select() poller if maxsock > FD_SETSIZE 2013-03-31 15:01:05 +02:00
fd.c BUG: polling: don't skip polled events in the spec list 2012-11-12 01:57:14 +01:00
freq_ctr.c BUG/MINOR: time: frequency counters are not totally accurate 2012-12-29 21:50:07 +01:00
frontend.c CLEANUP: acl: remove unused references to ACL_USE_* 2013-04-03 02:13:00 +02:00
haproxy-systemd-wrapper.c BUILD: stdbool is not portable (again) 2013-05-01 10:09:30 +02:00
haproxy.c CLEANUP: fix minor typo in error message. 2013-05-14 20:56:28 +02:00
hdr_idx.c OPTIM/MINOR: move the hdr_idx pools out of the proxy struct 2011-10-24 18:15:04 +02:00
i386-linux-vsys.c MEDIUM: listener: add support for linux's accept4() syscall 2012-10-08 20:11:03 +02:00
lb_chash.c BUG/MAJOR: backend: consistent hash can loop forever in certain circumstances 2013-04-12 14:46:51 +02:00
lb_fas.c CLEANUP: lb_first: add reference to a paper describing the original idea 2012-04-07 09:08:45 +02:00
lb_fwlc.c [MEDIUM] build: switch ebtree users to use new ebtree version 2009-10-26 21:10:04 +01:00
lb_fwrr.c [MEDIUM] build: switch ebtree users to use new ebtree version 2009-10-26 21:10:04 +01:00
lb_map.c [BUG] url_param hash may return a down server 2010-03-12 06:22:16 +01:00
listener.c CLEANUP: acl: remove unused references to ACL_USE_* 2013-04-03 02:13:00 +02:00
log.c MINOR: log: add a new flag 'L' for locally processed requests 2013-06-10 16:42:09 +02:00
memory.c MEDIUM: memory: add the ability to poison memory at run time 2012-05-08 21:28:16 +02:00
payload.c CLEANUP: acl: remove unused references to ACL_USE_* 2013-04-03 02:13:00 +02:00
peers.c MINOR: counters: make it easier to extend the amount of tracked counters 2013-05-28 17:43:40 +02:00
pipe.c BUILD/MINOR: silent a build warning in src/pipe.c (fcntl) 2011-10-24 17:09:22 +02:00
proto_http.c MINOR: http: add full-length header fetch methods 2013-06-10 18:39:42 +02:00
proto_tcp.c MEDIUM: protocol: implement a "drain" function in protocol layers 2013-06-10 20:33:23 +02:00
proto_uxst.c MAJOR: listener: support inheriting a listening fd from the parent 2013-03-11 01:30:01 +01:00
protocol.c REORG: split "protocols" files into protocol and listener 2012-09-15 22:29:32 +02:00
proxy.c MEDIUM: log: report file name, line number, and directive name with log-format errors 2013-04-12 18:36:00 +02:00
queue.c MAJOR: connection: replace struct target with a pointer to an enum 2012-11-12 00:42:33 +01:00
raw_sock.c BUG/MEDIUM: splicing is broken since 1.5-dev12 2013-04-06 11:46:27 +02:00
rbtree.c [MINOR] imported the rbtree function from Linux kernel 2007-01-07 02:12:57 +01:00
regex.c [MINOR] prepare req_*/rsp_* to receive a condition 2010-01-28 18:10:50 +01:00
sample.c MAJOR: sample: maintain a per-proxy list of the fetch args to resolve 2013-04-03 02:13:02 +02:00
server.c MEDIUM: server: Tighten up parsing of weight string 2013-02-13 10:59:50 +01:00
session.c MEDIUM: protocol: implement a "drain" function in protocol layers 2013-06-10 20:33:23 +02:00
sessionhash.c [PATCH] appsessions: cleanup DEBUG_HASH and initialize request_counter 2008-08-13 23:43:26 +02:00
shctx.c BUG/MEDIUM: shctx: makes the code independent on SSL runtime version. 2013-04-26 19:15:52 +02:00
signal.c BUG/MEDIUM: signal: signal handler does not properly check for signal bounds 2013-01-24 16:19:19 +01:00
ssl_sock.c MEDIUM: protocol: implement a "drain" function in protocol layers 2013-06-10 20:33:23 +02:00
standard.c MEDIUM: stats: add proxy name filtering on the statistic page 2013-04-15 22:50:33 +02:00
stick_table.c MEDIUM: counters: add a new "gpc0_rate" counter in stick-tables 2013-05-29 15:54:14 +02:00
stream_interface.c BUG/MEDIUM: stream_interface: don't close outgoing connections on shutw() 2012-12-30 01:39:37 +01:00
task.c [OPTIM] task: don't scan the run queue if we know it's empty 2011-09-10 20:08:49 +02:00
time.c BUG/MINOR: time: frequency counters are not totally accurate 2012-12-29 21:50:07 +01:00
trace.c MINOR: add a new function call tracer for debugging purposes 2012-05-26 00:12:37 +02:00
uri_auth.c BUG/MEDIUM: uri_auth: missing NULL check and memory leak on memory shortage 2013-01-24 16:19:19 +01:00