2006-06-25 20:48:02 -04:00
/*
2021-05-09 00:14:25 -04:00
 * HAProxy : High Availability-enabled HTTP/TCP proxy
2024-01-06 08:09:35 -05:00
 * Copyright 2000-2024 Willy Tarreau <willy@haproxy.org>.
2006-06-25 20:48:02 -04:00
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */
2015-12-08 16:43:09 -05:00
# define _GNU_SOURCE
2006-06-25 20:48:02 -04:00
# include <stdio.h>
# include <stdlib.h>
# include <unistd.h>
# include <string.h>
# include <ctype.h>
2016-05-13 17:52:56 -04:00
# include <dirent.h>
# include <sys/stat.h>
2006-06-25 20:48:02 -04:00
# include <sys/time.h>
# include <sys/types.h>
# include <sys/socket.h>
# include <netinet/tcp.h>
# include <netinet/in.h>
# include <arpa/inet.h>
# include <netdb.h>
# include <fcntl.h>
# include <errno.h>
# include <signal.h>
# include <stdarg.h>
# include <sys/resource.h>
2020-04-18 10:02:47 -04:00
# include <sys/utsname.h>
2013-02-12 04:53:52 -05:00
# include <sys/wait.h>
2006-06-25 20:48:02 -04:00
# include <time.h>
# include <syslog.h>
BUG/MEDIUM: remove supplementary groups when changing gid
Without it, haproxy will retain the group membership of root, which may
give more access than intended to the process. For example, haproxy would
still be in the wheel group on Fedora 18, as seen with :
# haproxy -f /etc/haproxy/haproxy.cfg
# ps a -o pid,user,group,command | grep hapr
3545 haproxy haproxy haproxy -f /etc/haproxy/haproxy.cfg
4356 root root grep --color=auto hapr
# grep Group /proc/3545/status
Groups: 0 1 2 3 4 6 10
# getent group wheel
wheel:x:10:root,misc
[WT: The issue was investigated by an independent security research team,
which concluded that it does not by itself allow security exploitation.
Additionally, unprivileged users are not allowed to drop groups,
though this mode of deployment is quite common. Thus a warning is
emitted in this case to inform the user. The fix could be backported
into all supported versions as the issue has always been there. ]
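A minimal sketch of the ordering described above, assuming a system that provides `setgroups()` (Linux and the BSDs do; it is not part of strict POSIX). The helpers `drop_supplementary_groups()` and `drop_privileges()` are hypothetical names used for illustration and are not HAProxy's own functions.

```c
#define _DEFAULT_SOURCE   /* exposes setgroups() on glibc */
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: clear the supplementary group list.
 * Returns 0 when the groups were dropped, 1 when the caller lacks the
 * privilege to do so (a warning is printed), and -1 on any other error.
 */
int drop_supplementary_groups(void)
{
	if (setgroups(0, NULL) == 0)
		return 0;
	if (errno == EPERM) {
		fprintf(stderr, "warning: cannot drop supplementary groups\n");
		return 1;
	}
	return -1;
}

/* Hypothetical helper: drop the groups first, then the gid, then the uid.
 * Once the uid has changed, the process may no longer be allowed to change
 * its groups, hence the order.
 */
int drop_privileges(uid_t uid, gid_t gid)
{
	if (drop_supplementary_groups() < 0)
		return -1;
	if (setgid(gid) != 0)
		return -1;
	if (setuid(uid) != 0)
		return -1;
	return 0;
}
```

Started as root, `drop_privileges()` leaves the process with exactly one group; started unprivileged, it only warns, which matches the behavior the note above describes.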
2013-01-12 12:35:19 -05:00
# include <grp.h>
2021-10-06 16:22:40 -04:00
2021-10-06 16:53:51 -04:00
# ifdef USE_THREAD
# include <pthread.h>
# endif
2012-11-16 10:12:27 -05:00
# ifdef USE_CPU_AFFINITY
# include <sched.h>
2018-11-12 11:22:19 -05:00
# if defined(__FreeBSD__) || defined(__DragonFly__)
2015-09-17 15:26:40 -04:00
# include <sys/param.h>
2018-11-12 11:22:19 -05:00
# ifdef __FreeBSD__
2015-09-17 15:26:40 -04:00
# include <sys/cpuset.h>
2018-11-12 11:22:19 -05:00
# endif
2019-09-13 00:12:58 -04:00
# endif
2012-11-16 10:12:27 -05:00
# endif
2006-06-25 20:48:02 -04:00
2019-04-15 13:38:50 -04:00
# if defined(USE_PRCTL)
# include <sys/prctl.h>
# endif
2021-08-21 04:13:10 -04:00
# if defined(USE_PROCCTL)
# include <sys/procctl.h>
# endif
2006-06-25 20:48:02 -04:00
# ifdef DEBUG_FULL
# include <assert.h>
# endif
2017-11-20 09:58:35 -05:00
# if defined(USE_SYSTEMD)
2024-04-03 11:32:20 -04:00
# include <haproxy/systemd.h>
2017-11-20 09:58:35 -05:00
# endif
2006-06-25 20:48:02 -04:00
BUG/MEDIUM: random: initialize the random pool a bit better
Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they manage to get exact same UUIDs
on different instances started exactly at the same moment. It turns
out that the randoms were only initialized to spread the health checks
originally, not to provide "clean" randoms.
This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160 bits random
seed available, 32 of which are passed to srandom().
It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.
At least this allows to have up to 4 billion different sequences
from the boot, which is not that bad.
Note that the thread safety was still not addressed, which is another
issue for another patch.
This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.
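The seeding idea above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: FNV-1a is swapped in for SHA1 to keep the example self-contained (it is not cryptographically strong), OpenSSL's `RAND_bytes()` is left out, and `fnv1a_update()` and `make_boot_seed()` are hypothetical names.

```c
#define _DEFAULT_SOURCE   /* exposes gettimeofday() under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* FNV-1a over <len> bytes, continuing from state <h>. Stand-in for the SHA1
 * step; it only mixes bits and offers no cryptographic guarantee.
 */
uint32_t fnv1a_update(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	assert(data != NULL || len == 0);
	while (len--) {
		h ^= *p++;
		h *= 16777619u;   /* FNV prime */
	}
	return h;
}

/* Hypothetical helper: fold several entropy sources into the 32 bits that
 * srandom() accepts.
 */
uint32_t make_boot_seed(void)
{
	uint32_t h = 2166136261u;   /* FNV offset basis */
	unsigned char buf[16];
	struct timeval tv;
	pid_t pid = getpid();
	FILE *f;

	gettimeofday(&tv, NULL);
	h = fnv1a_update(h, &tv.tv_sec, sizeof(tv.tv_sec));
	h = fnv1a_update(h, &tv.tv_usec, sizeof(tv.tv_usec));
	h = fnv1a_update(h, &pid, sizeof(pid));

	/* /dev/urandom is used only when available */
	f = fopen("/dev/urandom", "rb");
	if (f) {
		size_t n = fread(buf, 1, sizeof(buf), f);

		h = fnv1a_update(h, buf, n);
		fclose(f);
	}
	return h;
}
```

A caller would then pass the result to `srandom(make_boot_seed())`. Two instances started at the same instant still differ as long as `/dev/urandom` or the pid differs.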
2020-03-06 12:57:15 -05:00
# include <import/sha1.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/acl.h>
2021-03-25 12:15:52 -04:00
# include <haproxy/action.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/activity.h>
# include <haproxy/api.h>
# include <haproxy/arg.h>
# include <haproxy/auth.h>
2020-05-27 10:10:29 -04:00
# include <haproxy/base64.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/capture-t.h>
2021-07-16 09:39:28 -04:00
# include <haproxy/cfgcond.h>
2021-03-30 11:34:24 -04:00
# include <haproxy/cfgdiag.h>
2020-06-04 18:00:29 -04:00
# include <haproxy/cfgparse.h>
2020-06-02 04:22:45 -04:00
# include <haproxy/chunk.h>
2020-06-04 14:19:54 -04:00
# include <haproxy/cli.h>
2021-10-08 03:33:24 -04:00
# include <haproxy/clock.h>
2020-06-04 12:02:10 -04:00
# include <haproxy/connection.h>
2021-04-23 10:58:08 -04:00
# ifdef USE_CPU_AFFINITY
2021-04-21 12:39:58 -04:00
# include <haproxy/cpuset.h>
2021-04-23 10:58:08 -04:00
# endif
2023-11-23 05:32:24 -05:00
# include <haproxy/debug.h>
2020-06-04 04:53:16 -04:00
# include <haproxy/dns.h>
2020-06-02 05:28:02 -04:00
# include <haproxy/dynbuf.h>
2020-05-27 10:10:29 -04:00
# include <haproxy/errors.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/fd.h>
2020-06-04 15:29:29 -04:00
# include <haproxy/filters.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/global.h>
2020-06-04 03:20:54 -04:00
# include <haproxy/hlua.h>
2020-06-04 05:40:28 -04:00
# include <haproxy/http_rules.h>
2024-07-10 06:15:45 -04:00
# include <haproxy/limits.h>
2023-08-29 04:24:26 -04:00
# if defined(USE_LINUX_CAP)
# include <haproxy/linuxcap.h>
# endif
2020-05-27 12:01:47 -04:00
# include <haproxy/list.h>
2020-06-04 08:58:24 -04:00
# include <haproxy/listener.h>
2020-06-04 16:01:04 -04:00
# include <haproxy/log.h>
2020-06-04 08:07:37 -04:00
# include <haproxy/mworker.h>
2020-06-02 11:02:59 -04:00
# include <haproxy/namespace.h>
2020-06-02 10:48:09 -04:00
# include <haproxy/net_helper.h>
2020-05-27 10:26:00 -04:00
# include <haproxy/openssl-compat.h>
2023-03-08 04:37:45 -05:00
# include <haproxy/quic_conn.h>
2022-05-23 10:38:14 -04:00
# include <haproxy/quic_tp-t.h>
2020-06-04 09:06:28 -04:00
# include <haproxy/pattern.h>
2020-06-04 12:38:21 -04:00
# include <haproxy/peers.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/pool.h>
# include <haproxy/protocol.h>
2020-08-26 04:23:40 -04:00
# include <haproxy/proto_tcp.h>
2020-06-04 16:29:18 -04:00
# include <haproxy/proxy.h>
2020-06-02 11:32:26 -04:00
# include <haproxy/regex.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/sample.h>
2020-06-04 17:20:13 -04:00
# include <haproxy/server.h>
2020-06-04 12:58:52 -04:00
# include <haproxy/session.h>
2020-06-04 11:37:26 -04:00
# include <haproxy/signal.h>
2020-08-28 10:29:53 -04:00
# include <haproxy/sock.h>
2020-08-28 09:40:33 -04:00
# include <haproxy/sock_inet.h>
2020-06-04 14:30:20 -04:00
# include <haproxy/ssl_sock.h>
2024-04-24 05:09:06 -04:00
# include <haproxy/stats-file.h>
2020-10-05 05:49:42 -04:00
# include <haproxy/stats-t.h>
2020-06-04 17:46:14 -04:00
# include <haproxy/stream.h>
2020-06-04 11:25:40 -04:00
# include <haproxy/task.h>
2020-05-28 09:29:19 -04:00
# include <haproxy/thread.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/time.h>
# include <haproxy/tools.h>
2023-11-22 08:58:59 -05:00
# include <haproxy/trace.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/uri_auth-t.h>
2020-06-04 10:25:31 -04:00
# include <haproxy/vars.h>
2020-06-09 03:07:15 -04:00
# include <haproxy/version.h>
2006-06-25 20:48:02 -04:00
2019-03-29 16:30:17 -04:00
/* array of init calls for older platforms */
DECLARE_INIT_STAGES ;
2021-04-10 10:53:05 -04:00
/* create a read_mostly section to hold variables which are accessed a lot
 * but which almost never change. The purpose is to isolate them in their
 * own cache lines where they don't risk being perturbed by write accesses
 * to neighbor variables. We need to create an empty aligned variable for
 * this. The fact that the variable is of size zero means that it will be
 * eliminated at link time if no other variable uses it, but alignment will
 * be respected.
 */
empty_t __read_mostly_align HA_SECTION("read_mostly") ALIGNED(64);
2021-05-06 10:30:32 -04:00
# ifdef BUILD_FEATURES
2023-10-06 04:45:16 -04:00
char * build_features = BUILD_FEATURES ;
2021-05-06 10:30:32 -04:00
# else
2023-10-06 04:45:16 -04:00
char * build_features = " " ;
2021-05-06 10:30:32 -04:00
# endif
2010-01-03 15:12:30 -05:00
/* list of config files */
static struct list cfg_cfgfiles = LIST_HEAD_INIT ( cfg_cfgfiles ) ;
2006-06-25 20:48:02 -04:00
int pid ; /* current process id */
2022-06-28 13:29:29 -04:00
static unsigned long stopping_tgroup_mask ; /* Thread groups acknowledging stopping */
2020-03-12 12:24:53 -04:00
2006-06-25 20:48:02 -04:00
/* global options */
struct global global = {
2017-03-23 17:44:13 -04:00
. hard_stop_after = TICK_ETERNITY ,
MEDIUM: global: Add a "close-spread-time" option to spread soft-stop on time window
The new 'close-spread-time' global option can be used to spread idle and
active HTTP connection closing after a SIGUSR1 signal is received. This
allows to limit bursts of reconnections when too many idle connections
are closed at once. Indeed, without this new mechanism, in case of
soft-stop, all the idle connections would be closed at once (after the
grace period is over), and all active HTTP connections would be closed
by appending a "Connection: close" header to the next response that goes
over it (or via a GOAWAY frame in case of HTTP2).
This patch adds the support of this new option for HTTP as well as HTTP2
connections. It works differently on active and idle connections.
On active connections, instead of sending systematically the GOAWAY
frame or adding the 'Connection: close' header like before once the
soft-stop has started, a random based on the remainder of the close
window is calculated, and depending on its result we could decide to
keep the connection alive. The random will be recalculated for any
subsequent request/response on this connection so the GOAWAY will still
end up being sent, but we might wait a few more round trips. This will
ensure that goaways are distributed along a longer time window than
before.
On idle connections, a random factor is used when determining the expire
field of the connection's task, which should naturally spread connection
closings on the time window (see h2c_update_timeout).
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on "BUG/MEDIUM:
mux-h2: make use of http-request and keep-alive timeouts" which
refactored the timeout management of HTTP2 connections.
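The decision taken on active connections can be illustrated with a small sketch. The exact formula used by the patch is not shown in this message, so the policy below is an assumption: the chance of closing grows linearly as the window runs out. `should_close_now()` and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical policy: <remaining_ms> is the time left before the close
 * window ends, <window_ms> its total length, and <rnd> any random number.
 * Returns 1 when the connection should be closed now, 0 when it may live
 * for another request/response. The close probability is roughly
 * 1 - remaining_ms / window_ms, so it reaches 1 once the window is over.
 */
int should_close_now(unsigned int remaining_ms, unsigned int window_ms,
                     unsigned int rnd)
{
	assert(remaining_ms <= window_ms);
	if (window_ms == 0 || remaining_ms == 0)
		return 1;
	return (rnd % window_ms) >= remaining_ms;
}
```

Because the draw is repeated on every request/response, a connection that survives one round still ends up closed later, which is how the closings get spread over the window.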
2022-04-08 12:04:18 -04:00
. close_spread_time = TICK_ETERNITY ,
. close_spread_end = TICK_ETERNITY ,
2021-03-26 13:50:33 -04:00
. numa_cpu_mapping = 1 ,
2019-01-26 08:27:06 -05:00
. nbthread = 0 ,
2012-04-05 12:02:55 -04:00
. req_count = 0 ,
MEDIUM: tree-wide: logsrv struct becomes logger
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.
But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.
Indeed, a 'log' directive does no longer reference the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.
So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.
So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).
This is internal rewording, so no functional change should be expected
on user-side.
2023-09-11 09:06:53 -04:00
. loggers = LIST_HEAD_INIT ( global . loggers ) ,
2022-04-25 13:29:10 -04:00
. maxzlibmem = DEFAULT_MAXZLIBMEM * 1024U * 1024U ,
2012-11-09 11:05:39 -05:00
. comp_rate_lim = 0 ,
2014-01-29 06:24:34 -05:00
. ssl_server_verify = SSL_SERVER_VERIFY_REQUIRED ,
2010-10-22 11:59:25 -04:00
. unix_bind = {
. ux = {
. uid = - 1 ,
. gid = - 1 ,
. mode = 0 ,
}
} ,
2009-08-17 01:23:33 -04:00
. tune = {
2023-04-20 09:40:38 -04:00
. options = GTUNE_LISTENER_MQ_OPT ,
2018-12-12 00:19:42 -05:00
. bufsize = ( BUFSIZE + 2 * sizeof ( void * ) - 1 ) & - ( 2 * sizeof ( void * ) ) ,
2020-01-22 08:31:21 -05:00
. maxrewrite = MAXREWRITE ,
MAJOR: session: only wake up as many sessions as available buffers permit
We've already experimented with three wake up algorithms when releasing
buffers : the first naive one used to wake up far too many sessions,
causing many of them not to get any buffer. The second approach which
was still in use prior to this patch consisted in waking up either 1
or 2 sessions depending on the number of FDs we had released. And this
was still inaccurate. The third one tried to cover the accuracy issues
of the second and took into consideration the number of FDs the sessions
would be willing to use, but most of the time we ended up waking up too
many of them for nothing, or deadlocking by lack of buffers.
This patch completely removes the need to allocate two buffers at once.
Instead it splits allocations into critical and non-critical ones and
implements a reserve in the pool for this. The deadlock situation happens
when all buffers are allocated for requests pending in a maxconn-limited
server queue, because then there's no more way to allocate buffers for
responses, and these responses are critical to release the server's
connection in order to release the pending requests. In fact maxconn on
a server creates a dependence between sessions and particularly between
oldest session's responses and latest session's requests. Thus, it is
mandatory to get a free buffer for a response in order to release a
server connection which will permit to release a request buffer.
Since we definitely have non-symmetrical buffers, we need to implement
this logic in the buffer allocation mechanism. What this commit does is
implement a reserve of buffers which can only be allocated for responses
and that will never be allocated for requests. This is made possible by
the requester indicating how much margin it wants to leave after the
allocation succeeds. Thus it is a cooperative allocation mechanism : the
requester (process_session() in general) prefers not to get a buffer in
order to respect other's need for response buffers. The session management
code always knows if a buffer will be used for requests or responses, so
that is not difficult :
- either there's an applet on the initiator side and we really need
the request buffer (since currently the applet is called in the
context of the session)
- or we have a connection and we really need the response buffer (in
order to support building and sending an error message back)
This reserve ensures that we don't take all allocatable buffers for
requests waiting in a queue. The downside is that all the extra buffers
are really allocated to ensure they can be allocated. But with small
values it is not an issue.
With this change, we don't observe any more deadlocks even when running
with maxconn 1 on a server under severely constrained memory conditions.
The code becomes a bit tricky, it relies on the scheduler's run queue to
estimate how many sessions are already expected to run so that it doesn't
wake up everyone with too few resources. A better solution would probably
consist in having two queues, one for urgent requests and one for normal
requests. A failed allocation for a session dealing with an error, a
connection event, or the need for a response (or request when there's an
applet on the left) would go to the urgent request queue, while other
requests would go to the other queue. Urgent requests would be served
from 1 entry in the pool, while the regular ones would be served only
according to the reserve. Despite not yet having this, it works
remarkably well.
This mechanism is quite efficient, we don't perform too many wake up calls
anymore. For 1 million sessions elapsed during massive memory contention,
we observe about 4.5M calls to process_session() compared to 4.0M without
memory constraints. Previously we used to observe up to 16M calls, which
roughly means 12M failures.
During a test run under high memory constraints (limit enforced to 27 MB
instead of the 58 MB normally needed), performance used to drop by 53% prior
to this patch. Now with this patch instead it *increases* by about 1.5%.
The best effect of this change is that by limiting the memory usage to about
2/3 to 3/4 of what is needed by default, it's possible to increase performance
by up to about 18% mainly due to the fact that pools are reused more often
and remain hot in the CPU cache (observed on regular HTTP traffic with 20k
objects, buffers.limit = maxconn/10, buffers.reserve = limit/2).
Below is an example of scenario which used to cause a deadlock previously :
- connection is received
- two buffers are allocated in process_session() then released
- one is allocated when receiving an HTTP request
- the second buffer is allocated then released in process_session()
for request parsing then connection establishment.
- poll() says we can send, so the request buffer is sent and released
- process session gets notified that the connection is now established
and allocates two buffers then releases them
- all other sessions do the same till one cannot get the request buffer
without hitting the margin
- and now the server responds. stream_interface allocates the response
buffer and manages to get it since it's higher priority being for a
response.
- but process_session() cannot allocate the request buffer anymore
=> We could end up with all buffers used by responses so that none may
be allocated for a request in process_session().
When the applet processing leaves the session context, the test will have
to be changed so that we always allocate a response buffer regardless of
the left side (eg: H2->H1 gateway). A final improvement would consist in
being able to only retry the failed I/O operation without waking up a
task, but to date all experiments to achieve this have proven not to be
reliable enough.
2014-11-26 19:11:56 -05:00
. reserved_bufs = RESERVED_BUFS ,
2015-04-29 10:24:50 -04:00
. pattern_cache = DEFAULT_PAT_LRU_SIZE ,
MEDIUM: connections: Add a way to control the number of idling connections.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case, where clients don't do keepalive, but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourselves having lots of idling connections, unusable for new sessions,
while we won't have any file descriptors available to create new connections.
To fix this, add 2 new global settings, "pool-low-fd-ratio" and "pool-high-fd-ratio".
pool-low-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
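A minimal sketch of the two thresholds, assuming usage is computed as an integer percentage and that reaching a threshold counts as crossing it (the message does not specify the boundary behavior). `fd_usage_pct()`, `may_keep_idle()` and `must_kill_idle()` are hypothetical names.

```c
#include <assert.h>

/* Percentage of file descriptors in use; 0 when <max_fds> is 0. */
unsigned int fd_usage_pct(unsigned int used_fds, unsigned int max_fds)
{
	if (!max_fds)
		return 0;
	return (unsigned int)((unsigned long long)used_fds * 100 / max_fds);
}

/* Hypothetical check: may a released connection join the idle pool?
 * Below the low ratio it is kept, otherwise it is destroyed.
 */
int may_keep_idle(unsigned int used_fds, unsigned int max_fds,
                  unsigned int low_ratio)
{
	assert(low_ratio <= 100);
	return fd_usage_pct(used_fds, max_fds) < low_ratio;
}

/* Hypothetical check: when a new outgoing connection is needed and no reuse
 * is possible, must an idle connection be killed first?
 */
int must_kill_idle(unsigned int used_fds, unsigned int max_fds,
                   unsigned int high_ratio)
{
	assert(high_ratio <= 100);
	return fd_usage_pct(used_fds, max_fds) >= high_ratio;
}
```

With the defaults of 20 and 25 and 1000 available fds, connections stop entering the idle pool at 200 fds in use, and idle connections start being killed at 250.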
2019-04-16 13:07:22 -04:00
. pool_low_ratio = 20 ,
. pool_high_ratio = 25 ,
2019-07-19 03:36:45 -04:00
. max_http_hdr = MAX_HTTP_HDR ,
2012-09-03 06:10:29 -04:00
# ifdef USE_OPENSSL
2012-11-14 05:32:56 -05:00
. sslcachesize = SSLCACHESIZE ,
2012-11-07 10:54:34 -05:00
# endif
2012-11-09 06:33:10 -05:00
. comp_maxlevel = 1 ,
2014-02-12 10:35:14 -05:00
# ifdef DEFAULT_IDLE_TIMER
. idle_timer = DEFAULT_IDLE_TIMER ,
# else
. idle_timer = 1000 , /* 1 second */
# endif
2023-01-06 10:09:58 -05:00
. nb_stk_ctr = MAX_SESS_STKCTR ,
2023-04-22 18:51:59 -04:00
. default_shards = - 2 , /* by-group */
2022-04-19 12:26:55 -04:00
# ifdef USE_QUIC
2022-05-23 12:29:39 -04:00
. quic_backend_max_idle_timeout = QUIC_TP_DFLT_BACK_MAX_IDLE_TIMEOUT ,
. quic_frontend_max_idle_timeout = QUIC_TP_DFLT_FRONT_MAX_IDLE_TIMEOUT ,
. quic_frontend_max_streams_bidi = QUIC_TP_DFLT_FRONT_MAX_STREAMS_BIDI ,
2024-02-13 13:38:46 -05:00
. quic_reorder_ratio = QUIC_DFLT_REORDER_RATIO ,
2022-05-20 10:29:10 -04:00
. quic_retry_threshold = QUIC_DFLT_RETRY_THRESHOLD ,
2023-01-31 05:44:50 -05:00
. quic_max_frame_loss = QUIC_DFLT_MAX_FRAME_LOSS ,
2022-04-19 12:26:55 -04:00
. quic_streams_buf = 30 ,
# endif /* USE_QUIC */
2009-08-17 01:23:33 -04:00
} ,
2012-10-05 09:47:31 -04:00
# ifdef USE_OPENSSL
# ifdef DEFAULT_MAXSSLCONN
2012-09-06 05:58:37 -04:00
. maxsslconn = DEFAULT_MAXSSLCONN ,
2012-10-05 09:47:31 -04:00
# endif
2012-09-06 05:58:37 -04:00
# endif
2024-05-23 10:07:16 -04:00
/* by default allow clients which use a privileged port for TCP only */
. clt_privileged_ports = HA_PROTO_TCP ,
2006-06-25 20:48:02 -04:00
/* others NULL OK */
} ;
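The `.bufsize` initializer in the structure above uses a classic round-up idiom: add the alignment minus one, then mask off the low bits. A small sketch of the same arithmetic follows; `round_up_pow2()` is a hypothetical helper and the values in the usage note are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Round <size> up to the next multiple of <align>, which must be a power of
 * two. For an unsigned value, -align has the same bit pattern as
 * ~(align - 1), i.e. a mask that clears the low bits.
 */
size_t round_up_pow2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & -align;
}
```

With `align` set to `2 * sizeof(void *)` (16 on a typical 64-bit platform), a size of 16385 becomes 16400 while 16384 is left unchanged.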
/*********************************************************************/
int stopping ; /* non zero means stopping in progress */
2017-03-23 17:44:13 -04:00
int killed ; /* non zero means a hard-stop is triggered */
2010-08-31 09:39:26 -04:00
int jobs = 0 ; /* number of active jobs (conns, listeners, active tasks, ...) */
2018-11-16 10:57:20 -05:00
int unstoppable_jobs = 0 ; /* number of active jobs that can't be stopped during a soft stop */
2018-11-05 10:31:22 -05:00
int active_peers = 0 ; /* number of active peers (connection attempts and connected) */
2018-11-05 11:12:27 -05:00
int connected_peers = 0 ; /* number of connected peers (verified ones) */
2022-02-17 12:10:36 -05:00
int arg_mode = 0 ; /* MODE_DEBUG etc as passed on command line ... */
char * change_dir = NULL ; /* set when -C is passed */
char * check_condition = NULL ; /* check condition passed to -cc */
2006-06-25 20:48:02 -04:00
2020-07-05 07:36:08 -04:00
/* Here we store information about the pids of the processes we may pause
2006-06-25 20:48:02 -04:00
 * or kill. We will send them a signal every 10 ms until we can bind to all
 * our ports. With 200 retries, that's about 2 seconds.
 */
# define MAX_START_RETRIES 200
static int * oldpids = NULL ;
static int oldpids_sig ; /* use USR1 or TERM */
2017-04-05 16:33:04 -04:00
/* Path to the unix socket we use to retrieve listener sockets from the old process */
static const char * old_unixsocket ;
2017-06-01 11:38:52 -04:00
int atexit_flag = 0 ;
2010-08-25 06:58:59 -04:00
int nb_oldpids = 0 ;
2006-06-25 20:48:02 -04:00
const int zero = 0 ;
const int one = 1 ;
2007-10-11 14:48:58 -04:00
const struct linger nolinger = { . l_onoff = 1 , . l_linger = 0 } ;
2006-06-25 20:48:02 -04:00
2010-03-12 15:58:54 -05:00
char hostname [ MAX_HOSTNAME_LEN ] ;
2020-06-18 10:56:47 -04:00
char * localpeer = NULL ;
MINOR: management: add some basic keyword dump infrastructure
It's difficult from outside haproxy to detect the supported keywords
and syntax. Interestingly, many of our modern keywords are enumerated
since they're registered from constructors, so it's not very hard to
enumerate most of them.
This patch creates some basic infrastructure to support dumping existing
keywords from different classes on stdout. The format will differ depending
on the classes, but the idea is that the output could easily be passed to
a script that generates some simple syntax highlighting rules, completion
rules for editors, syntax checkers or config parsers.
The principle chosen here is that if "-dK" is passed on the command-line,
at the end of the parsing the registered keywords will be dumped for the
requested classes passed after "-dK". Special name "help" will show known
classes, while "all" will execute all of them. The reason for doing that
after the end of the config processor is that it will also enumerate
internally-generated keywords, Lua or even those loaded from external
code (e.g. if an add-on is loaded using LD_PRELOAD). A typical way to
call this with a valid config would be:
./haproxy -dKall -q -c -f /path/to/config
If there's no config available, feeding /dev/null will also do the job,
though it will not be able to detect dynamically created keywords, of
course.
This patch also updates the management doc.
For now nothing but the help is listed, various subsystems will follow
in subsequent patches.
2022-03-08 10:01:40 -05:00
static char * kwd_dump = NULL ; // list of keyword dumps to produce
2006-06-25 20:48:02 -04:00
2020-06-05 08:08:41 -04:00
static char * * old_argv = NULL ; /* previous argv but cleaned up */
2017-06-01 11:38:51 -04:00
2018-09-11 04:06:26 -04:00
struct list proc_list = LIST_HEAD_INIT ( proc_list ) ;
int master = 0 ; /* 1 if in master, 0 if in child */
2020-03-06 12:57:15 -05:00
/* per-boot randomness */
unsigned char boot_seed [ 20 ] ; /* per-boot random seed (160 bits initially) */
2021-09-28 03:43:11 -04:00
/* takes the thread config in argument or NULL for any thread */
2018-09-11 04:06:18 -04:00
static void * run_thread_poll_loop ( void * data ) ;
2014-04-28 16:27:06 -04:00
/* bitfield of a few warnings to emit just once (WARN_*) */
unsigned int warned = 0 ;
2021-05-05 10:18:45 -04:00
/* set if experimental features have been used for the current process */
2022-02-25 04:10:00 -05:00
unsigned int tainted = 0 ;
2021-05-05 10:18:45 -04:00
2021-05-06 10:21:39 -04:00
unsigned int experimental_directives_allowed = 0 ;
2024-03-15 04:01:11 -04:00
unsigned int deprecated_directives_allowed = 0 ;
2021-05-06 10:21:39 -04:00
int check_kw_experimental(struct cfg_keyword *kw, const char *file, int linenum,
                          char **errmsg)
{
	if (kw->flags & KWF_EXPERIMENTAL) {
		if (!experimental_directives_allowed) {
2021-05-07 09:07:21 -04:00
			memprintf(errmsg, "parsing [%s:%d] : '%s' directive is experimental, must be allowed via a global 'expose-experimental-directives'",
2021-05-06 10:21:39 -04:00
			          file, linenum, kw->kw);
			return 1;
		}
		mark_tainted(TAINTED_CONFIG_EXP_KW_DECLARED);
	}
	return 0;
}
2016-12-21 12:43:10 -05:00
/* These are strings to be reported in the output of "haproxy -vv". They may
 * either be constants (in which case must_free must be zero) or dynamically
 * allocated strings to pass to free() on exit, and in this case must_free
 * must be non-zero.
 */
struct list build_opts_list = LIST_HEAD_INIT(build_opts_list);
struct build_opts_str {
	struct list list;
	const char *str;
	int must_free;
};
2006-06-25 20:48:02 -04:00
/*********************************************************************/
/* general purpose functions ***************************************/
/*********************************************************************/
2016-12-21 12:43:10 -05:00
/* used to register some build option strings at boot. Set must_free to
 * non-zero if the string must be freed upon exit.
 */
void hap_register_build_opts(const char *str, int must_free)
{
	struct build_opts_str *b;

	b = calloc(1, sizeof(*b));
	if (!b) {
		fprintf(stderr, "out of memory\n");
		exit(1);
	}
	b->str = str;
	b->must_free = must_free;
2021-04-21 01:32:39 -04:00
	LIST_APPEND(&build_opts_list, &b->list);
2016-12-21 12:43:10 -05:00
}
2024-01-02 04:56:05 -05:00
/* returns the first build option when <curr> is NULL, or the next one when
 * <curr> is passed the last returned value. Returns NULL when there are no
 * more entries in the list. Otherwise the returned pointer is &opt->str so
 * the caller can print it as *ret.
 */
const char **hap_get_next_build_opt(const char **curr)
{
	struct build_opts_str *head, *start;

	head = container_of(&build_opts_list, struct build_opts_str, list);
	if (curr)
		start = container_of(curr, struct build_opts_str, str);
	else
		start = head;

	start = container_of(start->list.n, struct build_opts_str, list);
	if (start == head)
		return NULL;

	return &start->str;
}
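The iteration contract of the function above (pass NULL to start, get NULL at the end) can be tried out with a self-contained sketch. Everything prefixed with `demo_` is hypothetical: it is a reduced circular list written for this example, not HAProxy's `list.h`, and it compares list nodes directly instead of applying `container_of` to the list head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly-linked list node. */
struct demo_list {
	struct demo_list *n;   /* next */
	struct demo_list *p;   /* previous */
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_opt {
	struct demo_list list;
	const char *str;
};

/* An empty list points to itself in both directions. */
static struct demo_list demo_opts = { &demo_opts, &demo_opts };

/* Append <str> at the tail. Returns 0 on success, -1 if out of memory. */
int demo_register_opt(const char *str)
{
	struct demo_opt *o;

	assert(str != NULL);
	o = calloc(1, sizeof(*o));
	if (!o)
		return -1;
	o->str = str;
	o->list.n = &demo_opts;
	o->list.p = demo_opts.p;
	demo_opts.p->n = &o->list;
	demo_opts.p = &o->list;
	return 0;
}

/* NULL starts the walk; NULL is returned after the last entry. */
const char **demo_next_opt(const char **curr)
{
	struct demo_list *pos;

	if (curr)
		pos = demo_container_of(curr, struct demo_opt, str)->list.n;
	else
		pos = demo_opts.n;

	if (pos == &demo_opts)
		return NULL;
	return &demo_container_of(pos, struct demo_opt, list)->str;
}
```

A caller walks the list with `for (opt = NULL; (opt = demo_next_opt(opt)) != NULL; ) puts(*opt);`, which is the same loop shape the real iterator is designed for.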
2023-10-06 04:45:16 -04:00
/* used to make a new feature appear in the build_features list at boot time.
 * The feature must be in the format "XXX" without the leading "+" which will
 * be automatically prepended.
 */
void hap_register_feature(const char *name)
{
	static int must_free = 0;
	int new_len = strlen(build_features) + 2 + strlen(name);
	char *new_features;

	new_features = malloc(new_len + 1);
	if (!new_features)
		return;

	strlcpy2(new_features, build_features, new_len);
	snprintf(new_features, new_len + 1, "%s +%s", build_features, name);
	if (must_free)
		ha_free(&build_features);

	build_features = new_features;
	must_free = 1;
}
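The length computation in the function above can be checked in isolation. The sketch below reproduces only the string building; `demo_append_feature()` is a hypothetical helper that returns a fresh string instead of updating a global.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<features> +<name>" string,
 * or NULL on allocation failure. The caller frees the result.
 */
char *demo_append_feature(const char *features, const char *name)
{
	size_t len;
	char *out;

	assert(features != NULL && name != NULL);
	len = strlen(features) + 2 + strlen(name);   /* 2 accounts for " +" */
	out = malloc(len + 1);                       /* +1 for the trailing NUL */
	if (!out)
		return NULL;
	snprintf(out, len + 1, "%s +%s", features, name);
	return out;
}
```

Appending `QUIC` to `+EPOLL -PCRE` yields `+EPOLL -PCRE +QUIC`; note that starting from an empty list yields a leading space, as the format string implies.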
2021-05-06 01:43:35 -04:00
# define VERSION_MAX_ELTS 7
/* This function splits an haproxy version string into an array of integers.
 * The syntax of the supported version string is the following:
 *
 *    <a>[.<b>[.<c>[.<d>]]][-{dev,pre,rc}<f>][-*][-<g>]
 *
 * This validates for example:
 *   1.2.1-pre2, 1.2.1, 1.2.10.1, 1.3.16-rc1, 1.4-dev3, 1.5-dev18, 1.5-dev18-43
 *   2.4-dev18-f6818d-20
 *
 * The result is set in an array of <VERSION_MAX_ELTS> elements. Each letter has
 * one fixed place in the array. The tags take a numeric value called <e> which
 * defaults to 3. "dev" is 1, "rc" and "pre" are 2. Numbers not encountered are
 * considered as zero (hence 1.5 and 1.5.0 are the same).
 *
 * The resulting values are:
 *   1.2.1-pre2            1, 2,  1, 0, 2,  2,  0
 *   1.2.1                 1, 2,  1, 0, 3,  0,  0
 *   1.2.10.1              1, 2, 10, 1, 3,  0,  0
 *   1.3.16-rc1            1, 3, 16, 0, 2,  1,  0
 *   1.4-dev3              1, 4,  0, 0, 1,  3,  0
 *   1.5-dev18             1, 5,  0, 0, 1, 18,  0
 *   1.5-dev18-43          1, 5,  0, 0, 1, 18, 43
 *   2.4-dev18-f6818d-20   2, 4,  0, 0, 1, 18, 20
 *
 * The function returns non-zero if the conversion succeeded, or zero if it
 * failed.
 */
int split_version ( const char * version , unsigned int * value )
{
const char * p , * s ;
char * error ;
int nelts ;
/* Initialize array with zeroes */
for ( nelts = 0 ; nelts < VERSION_MAX_ELTS ; nelts + + )
value [ nelts ] = 0 ;
value [ 4 ] = 3 ;
p = version ;
/* If the version number is empty, return false */
if ( * p = = ' \0 ' )
return 0 ;
/* Convert first number <a> */
value [ 0 ] = strtol ( p , & error , 10 ) ;
p = error + 1 ;
if ( * error = = ' \0 ' )
return 1 ;
if ( * error = = ' - ' )
goto split_version_tag ;
if ( * error ! = ' . ' )
return 0 ;
/* Convert second number <b> */
value [ 1 ] = strtol ( p , & error , 10 ) ;
p = error + 1 ;
if ( * error = = ' \0 ' )
return 1 ;
if ( * error = = ' - ' )
goto split_version_tag ;
if ( * error ! = ' . ' )
return 0 ;
/* Convert third number <c> */
value [ 2 ] = strtol ( p , & error , 10 ) ;
p = error + 1 ;
if ( * error = = ' \0 ' )
return 1 ;
if ( * error = = ' - ' )
goto split_version_tag ;
if ( * error ! = ' . ' )
return 0 ;
/* Convert fourth number <d> */
value [ 3 ] = strtol ( p , & error , 10 ) ;
p = error + 1 ;
if ( * error = = ' \0 ' )
return 1 ;
if ( * error ! = ' - ' )
return 0 ;
split_version_tag :
/* Check for commit number */
if ( * p > = ' 0 ' & & * p < = ' 9 ' )
goto split_version_commit ;
/* Read tag */
if ( strncmp ( p , " dev " , 3 ) = = 0 ) { value [ 4 ] = 1 ; p + = 3 ; }
else if ( strncmp ( p , " rc " , 2 ) = = 0 ) { value [ 4 ] = 2 ; p + = 2 ; }
else if ( strncmp ( p , " pre " , 3 ) = = 0 ) { value [ 4 ] = 2 ; p + = 3 ; }
else
goto split_version_commit ;
/* Convert tag number */
value [ 5 ] = strtol ( p , & error , 10 ) ;
p = error + 1 ;
if ( * error = = ' \0 ' )
return 1 ;
if ( * error ! = ' - ' )
return 0 ;
split_version_commit :
/* Search the last "-" */
s = strrchr ( p , ' - ' ) ;
if ( s ) {
s + + ;
if ( * s = = ' \0 ' )
return 0 ;
value [ 6 ] = strtol ( s , & error , 10 ) ;
if ( * error ! = ' \0 ' )
value [ 6 ] = 0 ;
return 1 ;
}
/* convert the version */
value [ 6 ] = strtol ( p , & error , 10 ) ;
if ( * error ! = ' \0 ' )
value [ 6 ] = 0 ;
return 1 ;
}
/* This function compares the current haproxy version with an arbitrary version
 * string. It returns:
 *   -1 : the version in argument is older than the current haproxy version
 *    0 : the version in argument is the same as the current haproxy version
 *    1 : the version in argument is newer than the current haproxy version
 *
 * Or some errors:
 *   -2 : the current haproxy version is not parsable
 *   -3 : the version in argument is not parsable
 */
int compare_current_version ( const char * version )
{
unsigned int loc [ VERSION_MAX_ELTS ] ;
unsigned int mod [ VERSION_MAX_ELTS ] ;
int i ;
/* split versions */
if ( ! split_version ( haproxy_version , loc ) )
return - 2 ;
if ( ! split_version ( version , mod ) )
return - 3 ;
/* compare versions */
for ( i = 0 ; i < VERSION_MAX_ELTS ; i + + ) {
if ( mod [ i ] < loc [ i ] )
return - 1 ;
else if ( mod [ i ] > loc [ i ] )
return 1 ;
}
return 0 ;
}
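The comparison above is a plain lexicographic walk over the two split arrays. As a minimal sketch, assuming both arrays were already filled by split_version(), the same ordering can be reproduced with the helper below. The name cmp_version_elts() is hypothetical and not part of HAProxy. It illustrates why a release sorts after its own development snapshots: the tag slot <e> is 3 for a release and 1 for "dev".

```c
#include <assert.h>
#include <stddef.h>

#define VERSION_MAX_ELTS 7

/* Hypothetical helper (not part of the source): lexicographic comparison of
 * two already-split version arrays. Returns -1 if <a> is older than <b>,
 * 1 if <a> is newer than <b>, and 0 if both are identical.
 */
static int cmp_version_elts(const unsigned int *a, const unsigned int *b)
{
	size_t i;

	assert(a != NULL && b != NULL);

	/* the first differing slot decides; later slots are ignored */
	for (i = 0; i < VERSION_MAX_ELTS; i++) {
		if (a[i] < b[i])
			return -1;
		if (a[i] > b[i])
			return 1;
	}
	return 0;
}
```

With the values documented for split_version(), 1.5-dev18 (1, 5, 0, 0, 1, 18, 0) compares as older than 1.5 (1, 5, 0, 0, 3, 0, 0) because the first differing slot is the tag.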
2023-09-05 09:24:39 -04:00
void display_version ( )
2006-06-25 20:48:02 -04:00
{
2020-04-18 10:02:47 -04:00
struct utsname utsname ;
2021-05-09 00:14:25 -04:00
printf ( " HAProxy version %s %s - https://haproxy.org/ \n "
2019-11-21 12:07:30 -05:00
PRODUCT_STATUS " \n " , haproxy_version , haproxy_date ) ;
2019-11-21 12:48:20 -05:00
if ( strlen ( PRODUCT_URL_BUGS ) > 0 ) {
char base_version [ 20 ] ;
int dots = 0 ;
char * del ;
/* only retrieve the base version without distro-specific extensions */
for ( del = haproxy_version ; * del ; del + + ) {
if ( * del = = ' . ' )
dots + + ;
else if ( * del < ' 0 ' | | * del > ' 9 ' )
break ;
}
strlcpy2 ( base_version , haproxy_version , del - haproxy_version + 1 ) ;
if ( dots < 2 )
printf ( " Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open \n " ) ;
else
printf ( " Known bugs: " PRODUCT_URL_BUGS " \n " , base_version ) ;
}
2020-04-18 10:02:47 -04:00
if ( uname ( & utsname ) = = 0 ) {
printf ( " Running on: %s %s %s %s \n " , utsname . sysname , utsname . release , utsname . version , utsname . machine ) ;
}
2006-06-25 20:48:02 -04:00
}
2016-12-21 12:19:57 -05:00
static void display_build_opts ( )
2007-12-02 05:28:59 -05:00
{
2024-01-02 04:56:05 -05:00
const char * * opt ;
2016-12-21 12:43:10 -05:00
2007-12-02 05:28:59 -05:00
printf ( " Build options : "
# ifdef BUILD_TARGET
2008-01-02 14:48:34 -05:00
" \n TARGET = " BUILD_TARGET
2007-12-02 05:28:59 -05:00
# endif
# ifdef BUILD_CC
2008-01-02 14:48:34 -05:00
" \n CC = " BUILD_CC
# endif
# ifdef BUILD_CFLAGS
" \n CFLAGS = " BUILD_CFLAGS
2007-12-02 05:28:59 -05:00
# endif
2008-01-02 14:48:34 -05:00
# ifdef BUILD_OPTIONS
" \n OPTIONS = " BUILD_OPTIONS
2019-03-27 08:20:08 -04:00
# endif
2020-11-21 12:07:59 -05:00
# ifdef BUILD_DEBUG
" \n DEBUG = " BUILD_DEBUG
# endif
2023-10-06 04:45:16 -04:00
" \n \n Feature list : %s "
2009-08-17 01:23:33 -04:00
" \n \n Default settings : "
2019-03-13 05:03:07 -04:00
" \n bufsize = %d, maxrewrite = %d, maxpollevents = %d "
2009-08-17 01:23:33 -04:00
" \n \n " ,
2023-10-06 04:45:16 -04:00
build_features , BUFSIZE , MAXREWRITE , MAX_POLL_EVENTS ) ;
2009-10-03 12:57:08 -04:00
2024-01-02 04:56:05 -05:00
for ( opt = NULL ; ( opt = hap_get_next_build_opt ( opt ) ) ; puts ( * opt ) )
;
2016-12-21 12:43:10 -05:00
2010-01-29 11:50:44 -05:00
putchar ( ' \n ' ) ;
2009-10-03 12:57:08 -04:00
list_pollers ( stdout ) ;
putchar ( ' \n ' ) ;
2018-04-10 08:37:32 -04:00
list_mux_proto ( stdout ) ;
putchar ( ' \n ' ) ;
2019-03-19 03:08:10 -04:00
list_services ( stdout ) ;
putchar ( ' \n ' ) ;
2016-03-07 06:46:38 -05:00
list_filters ( stdout ) ;
putchar ( ' \n ' ) ;
2007-12-02 05:28:59 -05:00
}
2006-06-25 20:48:02 -04:00
/*
* This function prints the command line usage and exits
*/
2016-12-21 12:19:57 -05:00
static void usage ( char * name )
2006-06-25 20:48:02 -04:00
{
display_version ( ) ;
fprintf ( stderr ,
2016-05-13 17:52:56 -04:00
" Usage : %s [-f <cfgfile|cfgdir>]* [ -vdV "
2006-06-25 20:48:02 -04:00
" D ] [ -n <maxconn> ] [ -N <maxpconn> ] \n "
2015-10-08 05:58:48 -04:00
" [ -p <pidfile> ] [ -m <max megs> ] [ -C <dir> ] [-- <cfgfile>*] \n "
2007-12-02 05:28:59 -05:00
" -v displays version ; -vv shows known build options. \n "
2006-06-25 20:48:02 -04:00
" -d enters debug mode ; -db only disables background mode. \n "
2022-02-23 09:20:53 -05:00
" -dM[<byte>,help,...] debug memory (default: poison with <byte>/0x50) \n "
2023-11-22 08:58:59 -05:00
" -dt activate traces on stderr \n "
2006-06-25 20:48:02 -04:00
" -V enters verbose mode (disables quiet mode) \n "
2011-09-10 13:26:56 -04:00
" -D goes daemon ; -C changes to <dir> before loading files. \n "
2017-06-01 11:38:50 -04:00
" -W master-worker mode. \n "
2017-11-20 09:58:35 -05:00
# if defined(USE_SYSTEMD)
" -Ws master-worker mode with systemd notify support. \n "
# endif
2006-06-25 20:48:02 -04:00
" -q quiet mode : don't display messages \n "
2009-06-22 10:02:30 -04:00
" -c check mode : only check config files and exit \n "
2021-06-05 18:50:22 -04:00
" -cc check condition : evaluate a condition and exit \n "
2019-03-13 05:03:07 -04:00
" -n sets the maximum total # of connections (uses ulimit -n) \n "
2006-06-25 20:48:02 -04:00
" -m limits the usable amount of memory (in MB) \n "
" -N sets the default, per-proxy maximum # of connections (%d) \n "
2010-09-23 12:30:22 -04:00
" -L set local peer name (default to hostname) \n "
2006-06-25 20:48:02 -04:00
" -p writes pids of all children to this file \n "
2022-09-29 04:34:04 -04:00
" -dC[[key],line] display the configuration file, if there is a key, the file will be anonymised \n "
2019-05-22 13:24:06 -04:00
# if defined(USE_EPOLL)
2006-06-25 20:48:02 -04:00
" -de disables epoll() usage even when available \n "
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_KQUEUE)
2007-04-09 06:03:06 -04:00
" -dk disables kqueue() usage even when available \n "
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_EVPORTS)
2019-04-08 12:53:32 -04:00
" -dv disables event ports usage even when available \n "
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_POLL)
2006-06-25 20:48:02 -04:00
" -dp disables poll() usage even when available \n "
2009-01-25 10:03:28 -05:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_LINUX_SPLICE)
2009-01-25 10:03:28 -05:00
" -dS disables splice usage (broken on old kernels) \n "
2014-04-14 09:56:58 -04:00
# endif
# if defined(USE_GETADDRINFO)
" -dG disables getaddrinfo() usage \n "
2016-09-12 17:42:20 -04:00
# endif
# if defined(SO_REUSEPORT)
" -dR disables SO_REUSEPORT usage \n "
2021-12-28 09:43:11 -05:00
# endif
# if defined(HA_HAVE_DUMP_LIBS)
" -dL dumps loaded object files after config checks \n "
2006-06-25 20:48:02 -04:00
# endif
MINOR: management: add some basic keyword dump infrastructure
It's difficult from outside haproxy to detect the supported keywords
and syntax. Interestingly, many of our modern keywords are enumerated
since they're registered from constructors, so it's not very hard to
enumerate most of them.
This patch creates some basic infrastructure to support dumping existing
keywords from different classes on stdout. The format will differ depending
on the classes, but the idea is that the output could easily be passed to
a script that generates some simple syntax highlighting rules, completion
rules for editors, syntax checkers or config parsers.
The principle chosen here is that if "-dK" is passed on the command-line,
at the end of the parsing the registered keywords will be dumped for the
requested classes passed after "-dK". Special name "help" will show known
classes, while "all" will execute all of them. The reason for doing that
after the end of the config processor is that it will also enumerate
internally-generated keywords, Lua or even those loaded from external
code (e.g. if an add-on is loaded using LD_PRELOAD). A typical way to
call this with a valid config would be:
./haproxy -dKall -q -c -f /path/to/config
If there's no config available, feeding /dev/null will also do the job,
though it will not be able to detect dynamically created keywords, of
course.
This patch also updates the management doc.
For now nothing but the help is listed, various subsystems will follow
in subsequent patches.
2022-03-08 10:01:40 -05:00
" -dK{class[,...]} dump registered keywords (use 'help' for list) \n "
2016-11-07 15:03:16 -05:00
" -dr ignores server address resolution failures \n "
2014-01-29 06:24:34 -05:00
" -dV disables SSL verify on servers side \n "
2020-04-15 10:42:39 -04:00
" -dW fails if any warning is emitted \n "
2021-03-29 04:29:07 -04:00
" -dD diagnostic mode : warn about suspicious configuration statements \n "
2023-02-14 10:12:54 -05:00
" -dF disable fast-forward \n "
2024-03-13 06:08:50 -04:00
" -dI enable insecure fork \n "
2023-10-16 12:28:59 -04:00
" -dZ disable zero-copy forwarding \n "
2015-10-08 05:32:32 -04:00
" -sf/-st [pid ]* finishes/terminates old pids. \n "
2017-04-05 16:33:04 -04:00
" -x <unix_socket> get listening sockets from a unix socket \n "
2019-06-13 11:03:37 -04:00
" -S <bind>[,<bind options>...] new master CLI \n "
2006-06-25 20:48:02 -04:00
" \n " ,
2019-03-13 05:03:07 -04:00
name , cfg_maxpconn ) ;
2006-06-25 20:48:02 -04:00
exit ( 1 ) ;
}
/*********************************************************************/
/* more specific functions ***************************************/
/*********************************************************************/
2017-06-01 11:38:51 -04:00
/* sends the signal <sig> to all pids found in <oldpids>. Returns the number of
* pids the signal was correctly delivered to .
*/
2019-04-01 05:29:56 -04:00
int tell_old_pids ( int sig )
2017-06-01 11:38:51 -04:00
{
int p ;
int ret = 0 ;
for ( p = 0 ; p < nb_oldpids ; p + + )
if ( kill ( oldpids [ p ] , sig ) = = 0 )
ret + + ;
return ret ;
}
/*
 * remove a pid from the oldpids array and decrease nb_oldpids
 * return 1 if the pid was found, otherwise return 0
*/
int delete_oldpid ( int pid )
{
int i ;
for ( i = 0 ; i < nb_oldpids ; i + + ) {
if ( oldpids [ i ] = = pid ) {
oldpids [ i ] = oldpids [ nb_oldpids - 1 ] ;
oldpids [ nb_oldpids - 1 ] = 0 ;
nb_oldpids - - ;
return 1 ;
}
}
return 0 ;
}
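delete_oldpid() removes an entry by overwriting it with the last element and shrinking the count, which is constant-time once the entry is found but does not preserve ordering. The sketch below isolates that technique on a caller-supplied array. The name remove_pid() and its parameters are assumptions made for illustration, since the real function works on the global oldpids and nb_oldpids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone variant of the swap-with-last removal: looks for
 * <pid> in the first <*nb> slots of <pids>. If found, the slot is overwritten
 * with the last entry, the last slot is cleared, <*nb> is decremented and 1 is
 * returned. Returns 0 if <pid> is not present.
 */
static int remove_pid(int *pids, int *nb, int pid)
{
	int i;

	assert(pids != NULL && nb != NULL);

	for (i = 0; i < *nb; i++) {
		if (pids[i] == pid) {
			pids[i] = pids[*nb - 1];
			pids[*nb - 1] = 0;
			(*nb)--;
			return 1;
		}
	}
	return 0;
}
```

Because the order of the remaining entries changes, this approach only suits callers that treat the array as an unordered set, which is how the PID list is used for signalling.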
2017-06-01 11:38:53 -04:00
2017-06-01 11:38:51 -04:00
/*
 * When called, this function re-executes haproxy with -sf followed by the current
2018-11-15 13:41:50 -05:00
* children PIDs and possibly old children PIDs if they didn ' t leave yet .
2017-06-01 11:38:51 -04:00
*/
2023-11-24 15:20:32 -05:00
static void mworker_reexec ( int hardreload )
2017-06-01 11:38:51 -04:00
{
2020-06-05 08:08:41 -04:00
char * * next_argv = NULL ;
int old_argc = 0 ; /* previous number of argument */
2017-06-01 11:38:51 -04:00
int next_argc = 0 ;
2020-06-05 08:08:41 -04:00
int i = 0 ;
2017-06-01 11:38:51 -04:00
char * msg = NULL ;
2019-03-01 04:21:55 -05:00
struct rlimit limit ;
2021-11-24 12:45:37 -05:00
struct mworker_proc * current_child = NULL ;
2024-04-26 09:08:31 -04:00
int x_off = 0 ; /* disable -x by putting -x /dev/null */
2017-06-01 11:38:51 -04:00
mworker_block_signals ( ) ;
setenv ( " HAPROXY_MWORKER_REEXEC " , " 1 " , 1 ) ;
2022-01-28 15:17:30 -05:00
mworker_cleanup_proc ( ) ;
2018-09-11 04:06:26 -04:00
mworker_proc_list_to_env ( ) ; /* put the children description in the env */
2021-11-18 04:51:30 -05:00
/* ensure that every listener is correctly closed before re-executing */
mworker_cleanlisteners ( ) ;
2018-11-26 05:53:40 -05:00
/* during the reload we must ensure that every FD that can't be
 * reused (i.e. those that are not referenced in the proc_list)
 * is closed, or it will leak. */
/* close the listeners FD */
mworker_cli_proxy_stop ( ) ;
2019-06-24 11:40:48 -04:00
2021-11-25 04:03:44 -05:00
if ( fdtab )
deinit_pollers ( ) ;
2021-11-26 08:43:57 -05:00
2021-02-19 13:42:53 -05:00
# ifdef HAVE_SSL_RAND_KEEP_RANDOM_DEVICES_OPEN
2019-10-15 08:04:08 -04:00
/* close random device FDs */
RAND_keep_random_devices_open ( 0 ) ;
2019-05-03 04:11:32 -04:00
# endif
2018-11-26 05:53:40 -05:00
2019-03-01 04:21:55 -05:00
/* restore the initial FD limits */
limit . rlim_cur = rlim_fd_cur_at_boot ;
limit . rlim_max = rlim_fd_max_at_boot ;
2022-09-22 10:12:08 -04:00
if ( raise_rlim_nofile ( & limit , & limit ) ! = 0 ) {
2019-03-01 04:21:55 -05:00
ha_warning ( " Failed to restore initial FD limits (cur=%u max=%u), using cur=%u max=%u \n " ,
rlim_fd_cur_at_boot , rlim_fd_max_at_boot ,
( unsigned int ) limit . rlim_cur , ( unsigned int ) limit . rlim_max ) ;
}
2017-06-01 11:38:51 -04:00
/* compute length */
2020-06-05 08:08:41 -04:00
while ( old_argv [ old_argc ] )
old_argc + + ;
2017-06-01 11:38:51 -04:00
2017-06-01 11:38:53 -04:00
/* 1 for haproxy -sf, 2 for -x /socket */
2021-04-21 10:55:34 -04:00
next_argv = calloc ( old_argc + 1 + 2 + mworker_child_nb ( ) + 1 ,
2020-09-12 14:26:43 -04:00
sizeof ( * next_argv ) ) ;
2017-06-01 11:38:51 -04:00
if ( next_argv = = NULL )
goto alloc_error ;
2020-06-05 08:08:41 -04:00
/* copy the program name */
next_argv [ next_argc + + ] = old_argv [ 0 ] ;
2024-04-30 10:11:27 -04:00
/* we need to reintroduce /dev/null every time */
2024-04-26 09:08:31 -04:00
if ( old_unixsocket & & strcmp ( old_unixsocket , " /dev/null " ) = = 0 )
x_off = 1 ;
2020-06-05 08:08:41 -04:00
/* insert the new options just after argv[0] in case we have a -- */
2021-11-24 12:45:37 -05:00
if ( getenv ( " HAPROXY_MWORKER_WAIT_ONLY " ) = = NULL ) {
2021-11-24 18:49:19 -05:00
/* add -sf <PID>* to argv */
if ( mworker_child_nb ( ) > 0 ) {
struct mworker_proc * child ;
2023-11-24 15:20:32 -05:00
if ( hardreload )
next_argv [ next_argc + + ] = " -st " ;
else
next_argv [ next_argc + + ] = " -sf " ;
2021-11-24 18:49:19 -05:00
list_for_each_entry ( child , & proc_list , list ) {
if ( ! ( child - > options & PROC_O_LEAVING ) & & ( child - > options & PROC_O_TYPE_WORKER ) )
current_child = child ;
if ( ! ( child - > options & ( PROC_O_TYPE_WORKER | PROC_O_TYPE_PROG ) ) | | child - > pid < = - 1 )
continue ;
if ( ( next_argv [ next_argc + + ] = memprintf ( & msg , " %d " , child - > pid ) ) = = NULL )
goto alloc_error ;
msg = NULL ;
}
}
2024-04-26 09:08:31 -04:00
if ( ! x_off & & current_child ) {
2021-11-24 12:45:37 -05:00
/* add the -x option with the socketpair of the current worker */
next_argv [ next_argc + + ] = " -x " ;
if ( ( next_argv [ next_argc + + ] = memprintf ( & msg , " sockpair@%d " , current_child - > ipc_fd [ 0 ] ) ) = = NULL )
goto alloc_error ;
msg = NULL ;
}
2017-06-01 11:38:53 -04:00
}
2024-04-26 09:08:31 -04:00
if ( x_off ) {
/* if the cmdline contained a -x /dev/null, continue to use it */
next_argv [ next_argc + + ] = " -x " ;
next_argv [ next_argc + + ] = " /dev/null " ;
}
2020-06-05 08:08:41 -04:00
/* copy the previous options */
for ( i = 1 ; i < old_argc ; i + + )
next_argv [ next_argc + + ] = old_argv [ i ] ;
2019-08-26 04:37:39 -04:00
signal ( SIGPROF , SIG_IGN ) ;
2017-11-12 11:39:18 -05:00
execvp ( next_argv [ 0 ] , next_argv ) ;
2017-11-24 10:50:31 -05:00
ha_warning ( " Failed to reexecute the master process [%d]: %s \n " , pid , strerror ( errno ) ) ;
2021-02-20 04:46:51 -05:00
ha_free ( & next_argv ) ;
2017-11-15 13:02:55 -05:00
return ;
2017-06-01 11:38:51 -04:00
alloc_error :
2021-02-20 04:46:51 -05:00
ha_free ( & next_argv ) ;
2018-11-15 13:43:05 -05:00
ha_warning ( " Failed to reexecute the master process [%d]: Cannot allocate memory \n " , pid ) ;
2017-06-01 11:38:51 -04:00
return ;
}
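The argument vector built by mworker_reexec() places the new options right after argv[0], so that they still take effect when the original command line contains a "--". The following is a simplified sketch of that layout only. The helper name build_reexec_argv() and its parameters are hypothetical, the PIDs are passed in as a plain array instead of being read from proc_list, and the -x handling is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a NULL-terminated argv made of the program
 * name, then "-sf" (or "-st" when <hardreload> is set) followed by one string
 * per PID, then the previous options. PID strings are heap-allocated, the
 * other entries are borrowed from <old_argv>. Returns NULL on allocation
 * failure.
 */
static char **build_reexec_argv(char **old_argv, const int *pids, int nb_pids,
                                int hardreload)
{
	char **next_argv;
	int old_argc = 0;
	int next_argc = 0;
	int i;

	assert(old_argv != NULL && old_argv[0] != NULL);

	while (old_argv[old_argc])
		old_argc++;

	/* old args + 1 for -sf/-st + one per PID + 1 for the final NULL */
	next_argv = calloc(old_argc + 1 + nb_pids + 1, sizeof(*next_argv));
	if (!next_argv)
		return NULL;

	next_argv[next_argc++] = old_argv[0];

	if (nb_pids > 0) {
		next_argv[next_argc++] = hardreload ? "-st" : "-sf";
		for (i = 0; i < nb_pids; i++) {
			char *s = malloc(16); /* large enough for any 32-bit int */

			if (!s) {
				/* entries from index 2 onwards are heap-allocated */
				while (next_argc > 2)
					free(next_argv[--next_argc]);
				free(next_argv);
				return NULL;
			}
			snprintf(s, 16, "%d", pids[i]);
			next_argv[next_argc++] = s;
		}
	}

	/* copy the previous options after the inserted ones */
	for (i = 1; i < old_argc; i++)
		next_argv[next_argc++] = old_argv[i];

	return next_argv;
}
```

For an original command line of "haproxy -W -f /etc/haproxy/haproxy.cfg" and two worker PIDs, the resulting vector starts with "haproxy -sf <pid1> <pid2>" and continues with the original options unchanged.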
2021-11-09 12:01:22 -05:00
/* reexec haproxy in waitmode */
static void mworker_reexec_waitmode ( )
{
setenv ( " HAPROXY_MWORKER_WAIT_ONLY " , " 1 " , 1 ) ;
2023-11-24 15:20:32 -05:00
mworker_reexec ( 0 ) ;
2021-11-09 12:01:22 -05:00
}
/* reload haproxy and emit a warning */
2023-11-24 15:20:32 -05:00
void mworker_reload ( int hardreload )
2021-11-09 12:01:22 -05:00
{
2021-11-09 12:43:59 -05:00
struct mworker_proc * child ;
2021-11-26 08:43:57 -05:00
struct per_thread_deinit_fct * ptdf ;
2021-11-09 12:43:59 -05:00
2023-11-24 15:20:32 -05:00
ha_notice ( " Reloading HAProxy%s \n " , hardreload ? " (hard-reload) " : " " ) ;
2021-11-09 12:43:59 -05:00
2021-11-26 08:43:57 -05:00
/* close the poller FD and the thread waker pipe FD */
list_for_each_entry ( ptdf , & per_thread_deinit_list , list )
ptdf - > fct ( ) ;
2021-11-09 12:43:59 -05:00
/* increment the number of reloads */
list_for_each_entry ( child , & proc_list , list ) {
child - > reloads + + ;
}
2022-07-07 08:00:36 -04:00
# if defined(USE_SYSTEMD)
2024-04-03 16:39:16 -04:00
if ( global . tune . options & GTUNE_USE_SYSTEMD ) {
struct timespec ts ;
( void ) clock_gettime ( CLOCK_MONOTONIC , & ts ) ;
sd_notifyf ( 0 ,
" RELOADING=1 \n "
" STATUS=Reloading Configuration. \n "
" MONOTONIC_USEC=% " PRIu64 " \n " ,
( ts . tv_sec * 1000000ULL + ts . tv_nsec / 1000ULL ) ) ;
}
2022-07-07 08:00:36 -04:00
# endif
2023-11-24 15:20:32 -05:00
mworker_reexec ( hardreload ) ;
2021-11-09 12:01:22 -05:00
}
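The systemd reload notification above carries a MONOTONIC_USEC field derived from a CLOCK_MONOTONIC reading. As a small sketch of that conversion, the hypothetical helper timespec_to_usec() below (not a function from the source) turns a struct timespec into a microsecond count the same way the expression in mworker_reload() does: whole seconds scaled up, nanoseconds truncated down.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: converts <ts> to microseconds. Sub-microsecond
 * precision is dropped by the integer division.
 */
static uint64_t timespec_to_usec(const struct timespec *ts)
{
	assert(ts != NULL);

	return (uint64_t)ts->tv_sec * 1000000ULL +
	       (uint64_t)ts->tv_nsec / 1000ULL;
}
```

For instance, 2 seconds and 500000000 nanoseconds yield 2500000 microseconds.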
2018-09-11 04:06:18 -04:00
static void mworker_loop ( )
{
2019-04-18 05:31:36 -04:00
/* Busy polling makes no sense in the master :-) */
global . tune . options & = ~ GTUNE_BUSY_POLLING ;
2018-09-11 04:06:18 -04:00
2018-09-11 04:06:26 -04:00
2019-12-11 08:24:07 -05:00
signal_unregister ( SIGTTIN ) ;
signal_unregister ( SIGTTOU ) ;
2018-11-20 11:36:53 -05:00
signal_unregister ( SIGUSR1 ) ;
signal_unregister ( SIGHUP ) ;
signal_unregister ( SIGQUIT ) ;
2018-09-11 04:06:18 -04:00
signal_register_fct ( SIGTERM , mworker_catch_sigterm , SIGTERM ) ;
signal_register_fct ( SIGUSR1 , mworker_catch_sigterm , SIGUSR1 ) ;
2019-12-11 08:24:07 -05:00
signal_register_fct ( SIGTTIN , mworker_broadcast_signal , SIGTTIN ) ;
signal_register_fct ( SIGTTOU , mworker_broadcast_signal , SIGTTOU ) ;
2018-09-11 04:06:18 -04:00
signal_register_fct ( SIGINT , mworker_catch_sigterm , SIGINT ) ;
signal_register_fct ( SIGHUP , mworker_catch_sighup , SIGHUP ) ;
signal_register_fct ( SIGUSR2 , mworker_catch_sighup , SIGUSR2 ) ;
signal_register_fct ( SIGCHLD , mworker_catch_sigchld , SIGCHLD ) ;
mworker_unblock_signals ( ) ;
2018-12-06 08:05:20 -05:00
mworker_cleantasks ( ) ;
2018-09-11 04:06:18 -04:00
2018-09-11 04:06:26 -04:00
mworker_catch_sigchld ( NULL ) ; /* ensure we clean the children in case
some SIGCHLD were lost */
2018-09-11 04:06:18 -04:00
jobs + + ; /* this is the "master" job: we want to take care of the
signals even if there is no listener, so that the poll loop
doesn't exit */
fork_poller ( ) ;
2021-09-28 03:43:11 -04:00
run_thread_poll_loop ( NULL ) ;
2018-09-11 04:06:18 -04:00
}
2017-06-01 11:38:51 -04:00
2017-06-01 11:38:52 -04:00
/*
* Reexec the process in failure mode , instead of exiting
*/
void reexec_on_failure ( )
{
2021-11-10 04:49:06 -05:00
struct mworker_proc * child ;
2017-06-01 11:38:52 -04:00
if ( ! atexit_flag )
return ;
2021-11-10 04:49:06 -05:00
/* get the info of the children in the env */
if ( mworker_env_to_proc_list ( ) < 0 ) {
exit ( EXIT_FAILURE ) ;
}
/* increment the number of failed reloads */
list_for_each_entry ( child , & proc_list , list ) {
child - > failedreloads + + ;
}
BUG/MEDIUM: mworker: close unused transferred FDs on load failure
When the master process is reloaded on a new config, it will try to
connect to the previous process' socket to retrieve all known
listening FDs to be reused by the new listeners. If listeners were
removed, their unused FDs are simply closed.
However there's a catch. In case a socket fails to bind, the master
will cancel its startup and switch to wait mode for a new operation
to happen. In this case it didn't close the possibly remaining FDs
that were left unused.
It is very hard to hit this case, but it can happen during a
troubleshooting session with fat fingers. For example, let's say
a config runs like this:
frontend ftp
bind 1.2.3.4:20000-29999
The admin wants to extend the port range down to 10000-29999 and
by mistake ends up with:
frontend ftp
bind 1.2.3.41:20000-29999
Upon restart the bind will fail if the address is not present, and the
master will then switch to wait mode without releasing the previous FDs
for 1.2.3.4:20000-29999 since they're now apparently unused. Then once
the admin fixes the config and does:
frontend ftp
bind 1.2.3.4:10000-29999
The service will start, but will bind new sockets, half of them
overlapping with the previous ones that were not properly closed. This
may result in a startup error (if SO_REUSEPORT is not enabled or not
available), in a FD number exhaustion (if the error is repeated many
times), or in connections being randomly accepted by the process if
they sometimes land on the old FD that nobody listens on.
This patch will need to be backported as far as 1.8, and depends on
previous patch:
MINOR: sock: move the unused socket cleaning code into its own function
Note that before 2.3 most of the code was located inside haproxy.c, so
the patch above should probably relocate the function there instead of
sock.c.
2022-01-28 12:40:06 -05:00
/* do not keep unused FDs retrieved from the previous process */
sock_drop_unused_old_sockets ( ) ;
2021-11-09 12:01:22 -05:00
usermsgs_clr ( NULL ) ;
2022-09-24 09:44:42 -04:00
setenv ( " HAPROXY_LOAD_SUCCESS " , " 0 " , 1 ) ;
2021-11-09 12:16:47 -05:00
ha_warning ( " Loading failure! \n " ) ;
2022-07-07 08:00:36 -04:00
# if defined(USE_SYSTEMD)
/* the sd_notify API is not able to send a reload failure signal, so
 * the READY=1 signal still needs to be sent */
if ( global . tune . options & GTUNE_USE_SYSTEMD )
sd_notify ( 0 , " READY=1 \n STATUS=Reload failed! \n " ) ;
# endif
2021-11-09 12:01:22 -05:00
mworker_reexec_waitmode ( ) ;
2017-06-01 11:38:52 -04:00
}
2022-12-07 09:03:55 -05:00
/*
* Exit with an error message upon a wait - mode failure .
*/
void exit_on_waitmode_failure ( )
{
if ( ! atexit_flag )
return ;
ha_alert ( " Non-recoverable mworker wait-mode error, exiting. \n " ) ;
}
2017-06-01 11:38:51 -04:00
2006-06-25 20:48:02 -04:00
/*
2010-08-27 12:26:11 -04:00
* upon SIGUSR1 , let ' s have a soft stop . Note that soft_stop ( ) broadcasts
* a signal zero to all subscribers . This means that it ' s as easy as
* subscribing to signal 0 to get informed about an imminent shutdown .
2006-06-25 20:48:02 -04:00
*/
2016-12-21 12:19:57 -05:00
static void sig_soft_stop ( struct sig_handler * sh )
2006-06-25 20:48:02 -04:00
{
soft_stop ( ) ;
2010-08-27 11:56:48 -04:00
signal_unregister_handler ( sh ) ;
2017-11-24 11:34:44 -05:00
pool_gc ( NULL ) ;
2006-06-25 20:48:02 -04:00
}
/*
* upon SIGTTOU , we pause everything
*/
2016-12-21 12:19:57 -05:00
static void sig_pause ( struct sig_handler * sh )
2006-06-25 20:48:02 -04:00
{
2020-09-24 10:36:26 -04:00
if ( protocol_pause_all ( ) & ERR_FATAL ) {
const char * msg = " Some proxies refused to pause, performing soft stop now. \n " ;
2020-10-09 13:26:27 -04:00
ha_warning ( " %s " , msg ) ;
send_log ( NULL , LOG_WARNING , " %s " , msg ) ;
2020-09-24 10:36:26 -04:00
soft_stop ( ) ;
}
2017-11-24 11:34:44 -05:00
pool_gc ( NULL ) ;
2006-06-25 20:48:02 -04:00
}
/*
 * upon SIGTTIN, resume the listeners that were previously paused.
*/
2016-12-21 12:19:57 -05:00
static void sig_listen ( struct sig_handler * sh )
2006-06-25 20:48:02 -04:00
{
2020-09-24 10:36:26 -04:00
if ( protocol_resume_all ( ) & ERR_FATAL ) {
const char * msg = " Some proxies refused to resume, probably due to a conflict on a listening port. You may want to try again after the conflicting application is stopped, otherwise a restart might be needed to resume safe operations. \n " ;
2020-10-09 13:26:27 -04:00
ha_warning ( " %s " , msg ) ;
send_log ( NULL , LOG_WARNING , " %s " , msg ) ;
2020-09-24 10:36:26 -04:00
}
2006-06-25 20:48:02 -04:00
}
/*
* this function dumps every server ' s state when the process receives SIGHUP .
*/
2016-12-21 12:19:57 -05:00
static void sig_dump_state ( struct sig_handler * sh )
2006-06-25 20:48:02 -04:00
{
2017-11-24 10:54:05 -05:00
struct proxy * p = proxies_list ;
2006-06-25 20:48:02 -04:00
2017-11-24 10:50:31 -05:00
ha_warning ( " SIGHUP received, dumping servers states. \n " ) ;
2006-06-25 20:48:02 -04:00
while ( p ) {
struct server * s = p - > srv ;
send_log ( p , LOG_NOTICE , " SIGHUP received, dumping servers states for proxy %s. \n " , p - > id ) ;
while ( s ) {
2012-10-29 11:51:55 -04:00
chunk_printf ( & trash ,
" SIGHUP: Server %s/%s is %s. Conn: %d act, %d pend, %lld tot. " ,
p - > id , s - > id ,
2017-08-31 08:41:55 -04:00
( s - > cur_state ! = SRV_ST_STOPPED ) ? " UP " : " DOWN " ,
2021-06-18 03:30:30 -04:00
s - > cur_sess , s - > queue . length , s - > counters . cum_sess ) ;
2018-07-13 04:54:26 -04:00
ha_warning ( " %s \n " , trash . area ) ;
send_log ( p , LOG_NOTICE , " %s \n " , trash . area ) ;
2006-06-25 20:48:02 -04:00
s = s - > next ;
}
2007-09-17 05:27:09 -04:00
/* FIXME: this info is a bit outdated. We should be able to distinguish between FE and BE. */
if ( ! p - > srv ) {
2012-10-29 11:51:55 -04:00
chunk_printf ( & trash ,
" SIGHUP: Proxy %s has no servers. Conn: act(FE+BE): %d+%d, %d pend (%d unass), tot(FE+BE): %lld+%lld. " ,
p - > id ,
2024-04-04 12:08:46 -04:00
p - > feconn , p - > beconn , p - > totpend , p - > queue . length , p - > fe_counters . cum_conn , p - > be_counters . cum_sess ) ;
2007-09-17 05:27:09 -04:00
} else if ( p - > srv_act = = 0 ) {
2012-10-29 11:51:55 -04:00
chunk_printf ( & trash ,
" SIGHUP: Proxy %s %s ! Conn: act(FE+BE): %d+%d, %d pend (%d unass), tot(FE+BE): %lld+%lld. " ,
p - > id ,
( p - > srv_bck ) ? " is running on backup servers " : " has no server available " ,
2024-04-04 12:08:46 -04:00
p - > feconn , p - > beconn , p - > totpend , p - > queue . length , p - > fe_counters . cum_conn , p - > be_counters . cum_sess ) ;
2006-06-25 20:48:02 -04:00
} else {
2012-10-29 11:51:55 -04:00
chunk_printf ( & trash ,
" SIGHUP: Proxy %s has %d active servers and %d backup servers available. "
" Conn: act(FE+BE): %d+%d, %d pend (%d unass), tot(FE+BE): %lld+%lld. " ,
p - > id , p - > srv_act , p - > srv_bck ,
2024-04-04 12:08:46 -04:00
p - > feconn , p - > beconn , p - > totpend , p - > queue . length , p - > fe_counters . cum_conn , p - > be_counters . cum_sess ) ;
2006-06-25 20:48:02 -04:00
}
2018-07-13 04:54:26 -04:00
ha_warning ( " %s \n " , trash . area ) ;
send_log ( p , LOG_NOTICE , " %s \n " , trash . area ) ;
2006-06-25 20:48:02 -04:00
p = p - > next ;
}
}
2016-12-21 12:19:57 -05:00
static void dump ( struct sig_handler * sh )
2006-06-25 20:48:02 -04:00
{
2007-05-13 13:43:47 -04:00
/* dump memory usage then free everything possible */
dump_pools ( ) ;
2017-11-24 11:34:44 -05:00
pool_gc ( NULL ) ;
2006-06-25 20:48:02 -04:00
}
2017-12-28 10:09:36 -05:00
/*
 * This function replaces the stdio FDs (0, 1, 2) with <fd> using dup2(), then
 * closes <fd>. If <fd> < 0, it opens /dev/null and uses it instead.
 *
 * In the case of chrooting, you have to open /dev/null before the chroot, and
 * pass the <fd> to this function
*/
static void stdio_quiet ( int fd )
{
if ( fd < 0 )
fd = open ( " /dev/null " , O_RDWR , 0 ) ;
if ( fd > - 1 ) {
fclose ( stdin ) ;
fclose ( stdout ) ;
fclose ( stderr ) ;
dup2 ( fd , 0 ) ;
dup2 ( fd , 1 ) ;
dup2 ( fd , 2 ) ;
if ( fd > 2 )
close ( fd ) ;
return ;
}
ha_alert ( " Cannot open /dev/null \n " ) ;
exit ( EXIT_FAILURE ) ;
}
2018-11-15 13:41:50 -05:00
/* This function checks if cfg_cfgfiles contains directories.
* If it finds one , it adds all the files ( and only files ) it contains
* in cfg_cfgfiles in place of the directory ( and removes the directory ) .
* It adds the files in lexical order .
* It adds only files with . cfg extension .
2016-05-13 17:52:56 -04:00
* It doesn ' t add files with name starting with ' . '
*/
2016-12-21 12:19:57 -05:00
static void cfgfiles_expand_directories ( void )
2016-05-13 17:52:56 -04:00
{
2024-08-07 12:20:43 -04:00
struct cfgfile * cfg , * cfg_tmp ;
2016-05-13 17:52:56 -04:00
char * err = NULL ;
2024-08-07 12:20:43 -04:00
list_for_each_entry_safe ( cfg , cfg_tmp , & cfg_cfgfiles , list ) {
2016-05-13 17:52:56 -04:00
struct stat file_stat ;
struct dirent * * dir_entries = NULL ;
int dir_entries_nb ;
int dir_entries_it ;
2024-08-07 12:20:43 -04:00
if ( stat ( cfg - > filename , & file_stat ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot open configuration file/directory %s : %s \n " ,
2024-08-07 12:20:43 -04:00
cfg - > filename ,
2017-11-24 10:50:31 -05:00
strerror ( errno ) ) ;
2016-05-13 17:52:56 -04:00
exit ( 1 ) ;
}
if ( ! S_ISDIR ( file_stat . st_mode ) )
continue ;
2024-08-07 12:20:43 -04:00
/* from this point cfg->filename is a directory */
2016-05-13 17:52:56 -04:00
2024-08-07 12:20:43 -04:00
dir_entries_nb = scandir ( cfg - > filename , & dir_entries , NULL , alphasort ) ;
2016-05-13 17:52:56 -04:00
if ( dir_entries_nb < 0 ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot open configuration directory %s : %s \n " ,
2024-08-07 12:20:43 -04:00
cfg - > filename ,
2017-11-24 10:50:31 -05:00
strerror ( errno ) ) ;
2016-05-13 17:52:56 -04:00
exit ( 1 ) ;
}
2024-08-07 12:20:43 -04:00
/* for each element in the directory cfg->filename */
2016-05-13 17:52:56 -04:00
for ( dir_entries_it = 0 ; dir_entries_it < dir_entries_nb ; dir_entries_it + + ) {
struct dirent * dir_entry = dir_entries [ dir_entries_it ] ;
char * filename = NULL ;
char * d_name_cfgext = strstr ( dir_entry - > d_name , " .cfg " ) ;
/* don't add filename that begin with .
2018-11-15 13:41:50 -05:00
* only add filename with . cfg extension
2016-05-13 17:52:56 -04:00
*/
if ( dir_entry - > d_name [ 0 ] = = ' . ' | |
! ( d_name_cfgext & & d_name_cfgext [ 4 ] = = ' \0 ' ) )
goto next_dir_entry ;
2024-08-07 12:20:43 -04:00
if ( ! memprintf ( & filename , " %s/%s " , cfg - > filename , dir_entry - > d_name ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot load configuration files %s/%s : out of memory. \n " ,
cfg - > filename , dir_entry - > d_name ) ; /* <filename> is NULL here, so don't print it */
2016-05-13 17:52:56 -04:00
exit ( 1 ) ;
}
if ( stat ( filename , & file_stat ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot open configuration file %s : %s \n " ,
2024-08-07 12:20:43 -04:00
filename , /* report the entry that failed, not its parent directory */
2017-11-24 10:50:31 -05:00
strerror ( errno ) ) ;
2016-05-13 17:52:56 -04:00
exit ( 1 ) ;
}
/* don't add anything else than regular file in cfg_cfgfiles
* this way we avoid loops
*/
if ( ! S_ISREG ( file_stat . st_mode ) )
goto next_dir_entry ;
2024-08-07 12:20:43 -04:00
if ( ! list_append_cfgfile ( & cfg - > list , filename , & err ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot load configuration files %s : %s \n " ,
filename ,
err ) ;
2016-05-13 17:52:56 -04:00
exit ( 1 ) ;
}
next_dir_entry :
free ( filename ) ;
free ( dir_entry ) ;
}
free ( dir_entries ) ;
2024-08-07 12:20:43 -04:00
		/* remove the current directory (cfg) from cfgfiles */
		free(cfg->filename);
		LIST_DELETE(&cfg->list);
		free(cfg);
2016-05-13 17:52:56 -04:00
	}
	free(err);
}
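The directory scan above keeps an entry only if its name does not start with a dot and ends in ".cfg". Below is a minimal sketch of that filter as a standalone predicate. The helper name `is_cfg_candidate` is hypothetical and not part of HAProxy. It also differs from the code above in one respect, which is an assumption about the desired behaviour: it compares the last four characters instead of using strstr(), so a name such as "a.cfg.cfg" is accepted, whereas the strstr()-based test stops at the first ".cfg" and rejects it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of HAProxy: returns 1 if <name> would be
 * picked up as a configuration file, 0 otherwise.
 */
static int is_cfg_candidate(const char *name)
{
	size_t len;

	assert(name != NULL);

	/* hidden files (and "." / "..") are never loaded */
	if (name[0] == '.')
		return 0;

	len = strlen(name);
	if (len < 4)
		return 0;

	/* only names whose last four characters are ".cfg" qualify */
	return strcmp(name + len - 4, ".cfg") == 0;
}
```

A suffix comparison makes the "extension" rule explicit. Whether names containing ".cfg" twice should be loaded is a policy choice that the original code answers differently.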
MEDIUM: startup: make read_cfg() return immediately on ENOMEM
This commit prepares read_cfg() to call load_cfg_in_mem() helper in order to
load configuration files in memory. Before, read_cfg() calls the parser for all
files from cfg_cfgfiles list and cumulates parser's errors and memprintf's
errors in for_each loop. memprintf's errors did not stop this loop and were
accounted just after.
Now, as we plan to load configuration files in memory, we stop the loop, if
memprintf() fails, and we show appropriate error message with ha_alert. Then
process terminates. So not all cumulated syntax-related errors will be shown
before exit in this case and we have to stop, because we run out of memory.
If we can't open the current file or we fail to allocate a memory to store
some configuration line, the previous behaviour is kept, process emits
appropriate alert message and exits.
If parser returns some syntax-related error on the current file, the previous
behaviour is kept as well. We cumulate such errors for all parsed files and we
check them just after the loop. All syntax-related errors for all files are
shown then as before in ha_alert messages line by line during the startup.
Then process will exit with 1.
As now cfg_cfgfiles list contains many pointers to some memory areas with
configuration files content and this content could be big, it's better to
free the list explicitly, when parsing was finished. So, let's change
read_cfg() to return some integer value to its caller init(), and let's perform
the free routine at a caller level, as cfg_cfgfiles list was initialized and
initially filled at this level.
2024-08-05 04:03:52 -04:00
/* Reads config files. Returns -1 if we run out of memory,
2024-06-26 11:03:04 -04:00
 * couldn't open provided file(s) or parser has detected some fatal error.
 * Otherwise, returns an err_code, which may contain 0 (OK) or ERR_WARN,
2024-08-05 04:03:52 -04:00
 * ERR_ALERT. It is used in further initialization stages.
2024-06-26 11:03:04 -04:00
 */
static int read_cfg(char *progname)
{
	char *env_cfgfiles = NULL;
2024-08-07 10:53:50 -04:00
	struct cfgfile *cfg, *cfg_tmp;
2024-06-26 11:03:04 -04:00
	int err_code = 0;
	/* handle cfgfiles that are actually directories */
	cfgfiles_expand_directories();
	if (LIST_ISEMPTY(&cfg_cfgfiles))
		usage(progname);
	/* temporarily create environment variables with default
	 * values to ease user configuration. Do not forget to
	 * unset them after the list_for_each_entry loop.
	 */
	setenv("HAPROXY_HTTP_LOG_FMT", default_http_log_format, 1);
	setenv("HAPROXY_HTTPS_LOG_FMT", default_https_log_format, 1);
	setenv("HAPROXY_TCP_LOG_FMT", default_tcp_log_format, 1);
	setenv("HAPROXY_BRANCH", PRODUCT_BRANCH, 1);
2024-08-07 10:53:50 -04:00
	list_for_each_entry_safe(cfg, cfg_tmp, &cfg_cfgfiles, list) {
2024-06-26 11:03:04 -04:00
		int ret;
2024-08-07 10:53:50 -04:00
		cfg->size = load_cfg_in_mem(cfg->filename, &cfg->content);
		if (cfg->size < 0)
			goto err;
2024-08-05 04:03:52 -04:00
		if (!memprintf(&env_cfgfiles, "%s%s%s",
			       (env_cfgfiles ? env_cfgfiles : ""),
			       (env_cfgfiles ? ";" : ""), cfg->filename)) {
			/* free what we've already allocated and free cfglist */
			ha_alert("Could not allocate memory for HAPROXY_CFGFILES env variable\n");
			goto err;
2024-06-26 11:03:04 -04:00
		}
2024-08-05 04:04:03 -04:00
		ret = parse_cfg(cfg);
2024-08-05 04:03:52 -04:00
		if (ret == -1)
			goto err;
2024-06-26 11:03:04 -04:00
		if (ret & (ERR_ABORT|ERR_FATAL))
2024-08-07 12:20:43 -04:00
			ha_alert("Error(s) found in configuration file : %s\n", cfg->filename);
2024-06-26 11:03:04 -04:00
		err_code |= ret;
2024-08-05 04:03:52 -04:00
		if (err_code & ERR_ABORT)
			goto err;
2024-06-26 11:03:04 -04:00
	}
	/* remove temporary environment variables. */
	unsetenv("HAPROXY_BRANCH");
	unsetenv("HAPROXY_HTTP_LOG_FMT");
	unsetenv("HAPROXY_HTTPS_LOG_FMT");
	unsetenv("HAPROXY_TCP_LOG_FMT");
	/* do not try to resolve arguments nor to spot inconsistencies when
2024-08-05 04:03:52 -04:00
	 * the configuration contains fatal errors.
2024-06-26 11:03:04 -04:00
	 */
	if (err_code & (ERR_ABORT|ERR_FATAL)) {
		ha_alert("Fatal errors found in configuration.\n");
2024-08-05 04:03:52 -04:00
		goto err;
2024-06-26 11:03:04 -04:00
	}
2024-08-05 04:03:52 -04:00
2024-06-26 11:03:04 -04:00
	setenv("HAPROXY_CFGFILES", env_cfgfiles, 1);
	free(env_cfgfiles);
	return err_code;
2024-08-05 04:03:52 -04:00
err:
	free(env_cfgfiles);
	return -1;
2024-06-26 11:03:04 -04:00
}
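read_cfg() builds the value of HAPROXY_CFGFILES by appending each file name to a growing string, separated by ';', and bails out as soon as an allocation fails. The sketch below shows the same accumulation with plain realloc() instead of HAProxy's memprintf(). The helper name `cfgfiles_env_append` is hypothetical and only illustrates the pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not HAProxy's memprintf(): appends <name> to the
 * heap-allocated list <*list>, inserting ';' between entries. Returns 1 on
 * success. On allocation failure it frees <*list>, sets it to NULL and
 * returns 0, so the caller can stop immediately.
 */
static int cfgfiles_env_append(char **list, const char *name)
{
	size_t old_len, name_len;
	char *tmp;

	assert(list != NULL && name != NULL);

	old_len = *list ? strlen(*list) : 0;
	name_len = strlen(name);

	/* room for old content, optional ';', new name and trailing NUL */
	tmp = realloc(*list, old_len + (old_len ? 1 : 0) + name_len + 1);
	if (!tmp) {
		free(*list);
		*list = NULL;
		return 0;
	}

	if (old_len)
		tmp[old_len++] = ';';
	memcpy(tmp + old_len, name, name_len + 1);
	*list = tmp;
	return 1;
}
```

Clearing the list pointer on failure mirrors the reason the loop has to stop: once the buffer is gone there is nothing left to append to, so continuing to parse would only produce an incomplete variable.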
2017-06-01 11:38:51 -04:00
/*
 * copy and cleanup the current argv
BUG/MEDIUM: mworker: fix the copy of options in copy_argv()
The copy_argv() function, which is used to copy and remove some of the
arguments of the command line in order to re-exec() the master process,
is poorly implemented.
The function tries to remove the -x and the -sf/-st options but without
taking into account that some of the options could take a parameter
starting with a dash.
In issue #644, haproxy starts with "-L -xfoo" which is perfectly
correct. However, the re-exec is done without "-xfoo" because the master
tries to remove the "-x" option. Indeed, the copy_argv() function does
not know how many arguments an option can have, and just assumes that
everything starting with a dash is an option. So haproxy is exec() with
"-L" but without a parameter, which is wrong and leads to the exit of
the master, with usage().
To fix this issue, copy_argv() must know how many parameters an option
takes, and copy or skip the parameters correctly.
This fix is a first step but it should evolve to a cleaner way of
declaring the options to avoid duplication of the parsing code, so we
avoid new bugs.
Should be backported with care as far as 1.8, by removing the options
that do not exist in the previous versions.
2020-06-04 11:40:23 -04:00
 * Remove the -sf / -st / -x parameters
2017-06-01 11:38:51 -04:00
 * Return an allocated copy of argv
 */
static char **copy_argv(int argc, char **argv)
{
2020-06-04 11:40:23 -04:00
	char **newargv, **retargv;
2017-06-01 11:38:51 -04:00
2020-09-12 14:26:43 -04:00
	newargv = calloc(argc + 2, sizeof(*newargv));
2017-06-01 11:38:51 -04:00
	if (newargv == NULL) {
2017-11-24 10:50:31 -05:00
		ha_warning("Cannot allocate memory\n");
2017-06-01 11:38:51 -04:00
		return NULL;
	}
2020-06-04 11:40:23 -04:00
	retargv = newargv;
2017-06-01 11:38:51 -04:00
2020-06-04 11:40:23 -04:00
	/* first copy argv[0] */
	*newargv++ = *argv++;
	argc--;
	while (argc > 0) {
		if (**argv != '-') {
			/* non options are copied but will fail in the argument parser */
			*newargv++ = *argv++;
			argc--;
		} else {
			char *flag;
			flag = *argv + 1;
			if (flag[0] == '-' && flag[1] == 0) {
				/* "--\0": copy every argument until the end of argv */
				*newargv++ = *argv++;
				argc--;
				while (argc > 0) {
					*newargv++ = *argv++;
					argc--;
				}
			} else {
				switch (*flag) {
					case 's':
						/* -sf / -st and their parameters are ignored */
						if (flag[1] == 'f' || flag[1] == 't') {
							argc--;
							argv++;
							/* The list can't contain a negative value since the only
							   way to know the end of this list is by looking for the
							   next option or the end of the options */
							while (argc > 0 && argv[0][0] != '-') {
								argc--;
								argv++;
							}
2020-09-02 10:12:23 -04:00
						} else {
							argc--;
							argv++;
2020-06-04 11:40:23 -04:00
						}
						break;
					case 'x':
						/* this option and its parameter are ignored */
						argc--;
						argv++;
						if (argc > 0) {
							argc--;
							argv++;
						}
						break;
					case 'C':
					case 'n':
					case 'm':
					case 'N':
					case 'L':
					case 'f':
					case 'p':
					case 'S':
						/* these options have only 1 parameter which must be copied and can start with a '-' */
						*newargv++ = *argv++;
						argc--;
						if (argc == 0)
							goto error;
						*newargv++ = *argv++;
						argc--;
						break;
					default:
						/* for other options just copy them without parameters, this is also done
						 * for options like "--foo", but this will fail in the argument parser.
						 * */
						*newargv++ = *argv++;
						argc--;
						break;
				}
2017-06-01 11:38:51 -04:00
			}
		}
	}
2017-06-20 05:20:23 -04:00
2020-06-04 11:40:23 -04:00
	return retargv;
error:
	free(retargv);
	return NULL;
2017-06-01 11:38:51 -04:00
}
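The case from issue #644 ("-L -xfoo") can be exercised with a reduced version of the same idea: an option that takes a parameter must carry that parameter along even when it starts with a dash, while -sf/-st and -x are dropped together with their arguments. The sketch below is a simplified illustration, not the HAProxy function. The name `filter_argv` is hypothetical, and it matches -sf, -st and -x by exact string comparison rather than by first character as copy_argv() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified sketch, not the HAProxy function: returns a NULL-terminated,
 * heap-allocated copy of <argv> without "-sf"/"-st" (and their PID lists)
 * and without "-x <socket>". The strings are shared with <argv>; only the
 * array itself is allocated. Returns NULL on error.
 */
static char **filter_argv(int argc, char **argv)
{
	static const char with_arg[] = "CnmNLfpS"; /* one-parameter options */
	char **out, **p;
	int i = 0;

	assert(argc > 0 && argv != NULL);

	out = calloc((size_t)argc + 1, sizeof(*out));
	if (!out)
		return NULL;
	p = out;

	*p++ = argv[i++]; /* program name */
	while (i < argc) {
		const char *a = argv[i];

		if (a[0] != '-') {
			*p++ = argv[i++];
		} else if (strcmp(a, "--") == 0) {
			/* everything from "--" onwards is copied verbatim */
			while (i < argc)
				*p++ = argv[i++];
		} else if (strcmp(a, "-sf") == 0 || strcmp(a, "-st") == 0) {
			/* drop the option and the PIDs that follow it */
			i++;
			while (i < argc && argv[i][0] != '-')
				i++;
		} else if (strcmp(a, "-x") == 0) {
			/* drop the option and its single parameter, if any */
			i += (i + 1 < argc) ? 2 : 1;
		} else if (a[1] != '\0' && strchr(with_arg, a[1])) {
			/* keep the option and its parameter together */
			if (i + 1 >= argc) {
				free(out);
				return NULL;
			}
			*p++ = argv[i++];
			*p++ = argv[i++];
		} else {
			*p++ = argv[i++];
		}
	}
	return out;
}
```

With the command line `haproxy -L -xfoo -sf 123 456 -x /run/sock -f a.cfg`, this sketch keeps `-L -xfoo` and `-f a.cfg` and drops the rest, because "-xfoo" is consumed as the parameter of -L before it can be mistaken for an option.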
BUG/MEDIUM: random: initialize the random pool a bit better
Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they manage to get exact same UUIDs
on different instances started exactly at the same moment. It turns
out that the randoms were only initialized to spread the health checks
originally, not to provide "clean" randoms.
This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160 bits random
seed available, 32 of which are passed to srandom().
It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.
At least this allows to have up to 4 billion different sequences
from the boot, which is not that bad.
Note that the thread safety was still not addressed, which is another
issue for another patch.
This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.
2020-03-06 12:57:15 -05:00
/* Performs basic random seed initialization. The main issue with this is that
 * srandom_r() only takes 32 bits and purposely provides a reproducible sequence,
 * which means that there will only be 4 billion possible random sequences once
 * srandom() is called, regardless of the internal state. Not calling it is
 * even worse as we'll always produce the same random sequences. What we do
 * here is to create an initial sequence from various entropy sources, hash it
 * using SHA1 and keep the resulting 160 bits available globally.
 *
 * We initialize the current process with the first 32 bits before starting the
 * polling loop, where all this will be changed to have process specific and
 * thread specific sequences.
BUG/MEDIUM: random: implement a thread-safe and process-safe PRNG
This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initially tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").
This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:
http://prng.di.unimi.it/
While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps allowing to cut the period into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.
The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.
Before this patch, the following config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
Would produce this output:
a4d0ad64-2645-4b74-b894-48acce0669af 12987
a4d0ad64-2645-4b74-b894-48acce0669af 12992
a4d0ad64-2645-4b74-b894-48acce0669af 12986
a4d0ad64-2645-4b74-b894-48acce0669af 12988
a4d0ad64-2645-4b74-b894-48acce0669af 12991
a4d0ad64-2645-4b74-b894-48acce0669af 12989
a4d0ad64-2645-4b74-b894-48acce0669af 12990
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
(...)
And now produces:
f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
47470c02-4862-4c33-80e7-a952899570e5 13014
86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
8f9efa99-3143-47b2-83cf-d618c8dea711 13012
3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
3ec64915-8f95-4374-9e66-e777dc8791e0 13009
0f9bf894-dcde-408c-b094-6e0bb3255452 13011
49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
e23f6f2e-35c5-4433-a294-b790ab902653 13012
There are multiple benefits to using this method. First, it no longer
depends on a non-portable API. Second, it's thread safe. Third, it
is fast and more proven than any hack we could attempt in order to work
around the deficiencies of the various existing implementations.
This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", both of which will need to be backported at least as far as
version 2.0. It no longer requires backporting the build fixes for the
circular include file dependency.
2020-03-07 18:42:37 -05:00
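For illustration, the core step of xoroshiro128** as published by its authors can be sketched as below. This is a single-threaded sketch under stated assumptions, not the code added by the commit: the names xoro_state, rotl64 and xoro_next are hypothetical, and the double-CAS or lock protection described above is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical 128-bit generator state; it must never be all zeroes. */
struct xoro_state {
	uint64_t s[2];
};

/* Rotate a 64-bit value left by k bits (0 < k < 64). */
static inline uint64_t rotl64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* Return the next 64-bit output and advance the state (xoroshiro128** 1.0). */
static uint64_t xoro_next(struct xoro_state *st)
{
	const uint64_t s0 = st->s[0];
	uint64_t s1 = st->s[1];
	const uint64_t result = rotl64(s0 * 5, 7) * 9;

	s1 ^= s0;
	st->s[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
	st->s[1] = rotl64(s1, 37);
	return result;
}
```

A multi-threaded caller would have to update both 64-bit words atomically, or hold a lock around xoro_next(), so that every thread draws from the same sequence.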
*
 * Before starting threads, it's still possible to call random() as srandom()
 * is initialized from this, but after threads and/or processes are started,
 * only ha_random() is expected to be used to guarantee distinct sequences.
BUG/MEDIUM: random: initialize the random pool a bit better
Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they got the exact same UUIDs
on different instances started at exactly the same moment. It turns
out that the randoms were originally only initialized to spread the
health checks, not to provide "clean" randoms.
This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160-bit random
seed available, 32 bits of which are passed to srandom().
It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.
At least this allows up to 4 billion different sequences
from the boot, which is not that bad.
Note that the thread safety was still not addressed, which is another
issue for another patch.
This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.
2020-03-06 12:57:15 -05:00
*/
static void ha_random_boot(char *const *argv)
{
	unsigned char message[256];
	unsigned char *m = message;
	struct timeval tv;
	blk_SHA_CTX ctx;
	unsigned long l;
	int fd;
	int i;

	/* start with current time as pseudo-random seed */
	gettimeofday(&tv, NULL);
	write_u32(m, tv.tv_sec);  m += 4;
	write_u32(m, tv.tv_usec); m += 4;

	/* PID and PPID add some OS-based randomness */
	write_u16(m, getpid());   m += 2;
	write_u16(m, getppid());  m += 2;

	/* take up to 160 bits from /dev/urandom if available (non-blocking) */
	fd = open("/dev/urandom", O_RDONLY);
	if (fd >= 0) {
		i = read(fd, m, 20);
		if (i > 0)
			m += i;
		close(fd);
	}

	/* take up to 160 bits from openssl (non-blocking) */
#ifdef USE_OPENSSL
	if (RAND_bytes(m, 20) == 1)
		m += 20;
#endif

	/* take 160 bits from existing random in case it was already initialized */
	for (i = 0; i < 5; i++) {
		write_u32(m, random());
		m += 4;
	}

	/* stack address (benefit from operating system's ASLR) */
	l = (unsigned long)&m;
	memcpy(m, &l, sizeof(l)); m += sizeof(l);

	/* argv address (benefit from operating system's ASLR) */
	l = (unsigned long)&argv;
	memcpy(m, &l, sizeof(l)); m += sizeof(l);

	/* use tv_usec again after all the operations above */
	gettimeofday(&tv, NULL);
	write_u32(m, tv.tv_usec); m += 4;

	/*
	 * At this point, ~84-92 bytes have been used
	 */

	/* finish with the hostname */
	strncpy((char *)m, hostname, message + sizeof(message) - m);
	m += strlen(hostname);

	/* total message length */
	l = m - message;

	memset(&ctx, 0, sizeof(ctx));
	blk_SHA1_Init(&ctx);
	blk_SHA1_Update(&ctx, message, l);
	blk_SHA1_Final(boot_seed, &ctx);

	srandom(read_u32(boot_seed));
2020-03-07 18:42:37 -05:00
	ha_random_seed(boot_seed, sizeof(boot_seed));
2020-03-06 12:57:15 -05:00
}
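ha_random_boot() leans on small helpers to append fixed-width integers to its byte buffer. Their definitions are not part of this section, so the sketch below only shows one portable way such helpers could be written, using memcpy() to avoid unaligned accesses. The names put_u16, put_u32 and get_u32 are hypothetical stand-ins and are not HAProxy's own write_u16(), write_u32() and read_u32().

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value at <p> in native byte order; <p> may be unaligned. */
static inline void put_u16(void *p, uint16_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Store a 32-bit value at <p> in native byte order; <p> may be unaligned. */
static inline void put_u32(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Load a 32-bit value from <p> in native byte order; <p> may be unaligned. */
static inline uint32_t get_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Because the values are stored in native byte order, the resulting seed material differs between little-endian and big-endian hosts, which is harmless when the buffer is only fed to a hash.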
MEDIUM: init: always try to push the FD limit when maxconn is set from -m
When a maximum memory setting is passed to haproxy and maxconn is not set
and ulimit-n is not set, it is expected that maxconn will be set to the
highest value permitted by this memory setting, possibly affecting the
FD limit.
When maxconn was changed to be deduced from the current process's FD limit,
the automatic setting above was partially lost because it now remains
limited to the current FD limit in addition to being limited to the
memory usage. For unprivileged processes it does not change anything,
but for privileged processes the difference is important. Indeed, the
previous behavior ensured that the new FD limit could be enforced on
the process as long as the user had the privilege to do so. Now this
does not happen anymore, and some people rely on this for automatic
sizing in VM environments.
This patch implements the ability to verify if the setting will be
enforceable on the process or not. First it computes maxconn based on
the memory limits alone, then checks if the process is willing to accept
them, otherwise tries again by respecting the process' hard limit.
Thanks to this we now have the best of the pre-2.0 behavior and the
current one, in that privileged users will be able to get as high a
maxconn as they need just based on the memory limit, while unprivileged
users will still get as high a setting as permitted by the intersection
of the memory limit and the process' FD limit.
Ideally, after some observation period, this patch along with the
previous one "MINOR: init: move the maxsock calculation code to
compute_ideal_maxsock()" should be backported to 2.1 and 2.0.
Thanks to Baptiste for raising the issue.
2020-03-10 12:54:54 -04:00
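The "compute from memory first, then fall back to the hard limit" logic described above can be sketched with getrlimit() and setrlimit(). This is a simplified illustration under stated assumptions, not the code added by the patch: pick_maxsock() is a hypothetical name, and the probe restores the previous limits instead of keeping the raised ones.

```c
#include <sys/resource.h>

/* Return <wanted> if the process may be given that many FDs, otherwise
 * the current hard limit.
 */
static rlim_t pick_maxsock(rlim_t wanted)
{
	struct rlimit cur, probe;

	if (getrlimit(RLIMIT_NOFILE, &cur) != 0)
		return wanted;          /* limit unknown: keep the memory-based value */

	if (cur.rlim_max == RLIM_INFINITY || wanted <= cur.rlim_max)
		return wanted;          /* already permitted by the hard limit */

	/* Raising the hard limit only succeeds for privileged processes. */
	probe.rlim_cur = probe.rlim_max = wanted;
	if (setrlimit(RLIMIT_NOFILE, &probe) == 0) {
		setrlimit(RLIMIT_NOFILE, &cur); /* probe only: restore old limits */
		return wanted;
	}
	return cur.rlim_max;        /* unprivileged: respect the hard limit */
}
```

A privileged process thus gets the full memory-based value, while an unprivileged one is capped at the intersection of the memory limit and its FD hard limit.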
2024-06-26 12:39:45 -04:00
/* Evaluates a condition provided within a conditional block of the
 * configuration. Makes the process exit with 0 if the condition is true, with
 * 1 if the condition is false, or with 2 if parse_line encounters an error.
 */
static void do_check_condition(char *progname)
{
	int result;
	uint32_t err;
	const char *errptr;
	char *errmsg = NULL;
	char *args[MAX_LINE_ARGS + 1];
	int arg = sizeof(args) / sizeof(*args);
	size_t outlen;
	char *w;

	if (!check_condition)
		usage(progname);

	outlen = strlen(check_condition) + 1;
	err = parse_line(check_condition, check_condition, &outlen, args, &arg,
	                 PARSE_OPT_ENV | PARSE_OPT_WORD_EXPAND | PARSE_OPT_DQUOTE | PARSE_OPT_SQUOTE | PARSE_OPT_BKSLASH,
	                 &errptr);

	if (err & PARSE_ERR_QUOTE) {
		ha_alert("Syntax Error in condition: Unmatched quote.\n");
		exit(2);
	}

	if (err & PARSE_ERR_HEX) {
		ha_alert("Syntax Error in condition: Truncated or invalid hexadecimal sequence.\n");
		exit(2);
	}

	if (err & (PARSE_ERR_TOOLARGE | PARSE_ERR_OVERLAP)) {
		ha_alert("Error in condition: Line too long.\n");
		exit(2);
	}

	if (err & PARSE_ERR_TOOMANY) {
		ha_alert("Error in condition: Too many words.\n");
		exit(2);
	}

	if (err) {
		ha_alert("Unhandled error in condition, please report this to the developers.\n");
		exit(2);
	}

	/* remerge all words into a single expression */
	for (w = *args; (w += strlen(w)) < check_condition + outlen - 1; *w = ' ')
		;

	result = cfg_eval_condition(args, &errmsg, &errptr);

	if (result < 0) {
		if (errmsg)
			ha_alert("Failed to evaluate condition: %s\n", errmsg);
		exit(2);
	}

	exit(result ? 0 : 1);
}
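The "remerge" loop in do_check_condition() is terse. The line parser splits the condition in place, leaving NUL-terminated words back to back in the same buffer, and the loop then overwrites each inner NUL with a space. A standalone sketch, using the hypothetical name remerge_words() and assuming the words are contiguous and that outlen counts the final NUL:

```c
#include <stddef.h>
#include <string.h>

/* Turn "word\0word\0word\0" (outlen bytes, final NUL included) back into
 * a single space-separated string, in place.
 */
static void remerge_words(char *buf, size_t outlen)
{
	char *w;

	/* Jump to the end of the current word; unless it is the last byte
	 * of the buffer, replace its terminator with a space and go on.
	 */
	for (w = buf; (w += strlen(w)) < buf + outlen - 1; *w = ' ')
		;
}
```

For instance, a 7-byte buffer holding the words "a", "bc" and "d" ends up holding the single string "a bc d".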
2022-02-17 11:45:58 -05:00
/* This performs the very basic early initialization at the end of the PREPARE
 * init stage. It may only assume that list heads are initialized, but not that
 * anything else is correct. It will initialize a number of variables that
 * depend on command line and will pre-parse the command line. If it fails, it
 * directly exits.
2006-06-25 20:48:02 -04:00
*/
2022-02-17 11:45:58 -05:00
static void init_early ( int argc , char * * argv )
2006-06-25 20:48:02 -04:00
{
2010-12-22 11:08:21 -05:00
char * progname ;
2022-02-17 11:45:58 -05:00
char * tmp ;
int len ;
2006-06-25 20:48:02 -04:00
2023-02-21 08:07:05 -05:00
setenv ( " HAPROXY_STARTUP_VERSION " , HAPROXY_VERSION , 0 ) ;
2022-02-17 11:45:58 -05:00
/* First, let's initialize most global variables */
totalconn = actconn = listeners = stopping = 0 ;
killed = pid = 0 ;
2024-05-29 05:27:21 -04:00
/* cast to one byte in order to better fill a 3-byte hole in the global struct;
 * we will hopefully never start with more than 255 args
 */
global . argc = ( unsigned char ) argc ;
global . argv = argv ;
2022-02-17 11:45:58 -05:00
global . maxsock = 10 ; /* reserve 10 fds ; will be incremented by socket eaters */
global . rlimit_memmax_all = HAPROXY_MEMMAX ;
2017-10-24 07:53:54 -04:00
global . mode = MODE_STARTING ;
2017-06-01 11:38:51 -04:00
2022-02-17 11:45:58 -05:00
/* if we were in mworker mode, we should restart in mworker mode */
if ( getenv ( " HAPROXY_MWORKER_REEXEC " ) ! = NULL )
global . mode | = MODE_MWORKER ;
2012-05-16 08:16:48 -04:00
2022-02-17 11:45:58 -05:00
/* initialize date, time, and pid */
tzset ( ) ;
clock_init_process_date ( ) ;
2023-02-07 09:52:14 -05:00
start_date = date ;
2023-04-28 08:50:29 -04:00
start_time_ns = now_ns ;
2022-02-17 11:45:58 -05:00
pid = getpid ( ) ;
/* Set local host name and adjust some environment variables.
 * NB: POSIX does not make it mandatory for gethostname() to
 * NULL-terminate the string in case of truncation, and at least
 * FreeBSD appears not to do it.
2010-09-23 12:30:22 -04:00
*/
memset ( hostname , 0 , sizeof ( hostname ) ) ;
gethostname ( hostname , sizeof ( hostname ) - 1 ) ;
2020-06-18 10:56:47 -04:00
2022-02-17 11:45:58 -05:00
/* preset some environment variables */
localpeer = strdup ( hostname ) ;
if ( ! localpeer | | setenv ( " HAPROXY_LOCALPEER " , localpeer , 1 ) < 0 ) {
2020-06-18 10:56:47 -04:00
ha_alert ( " Cannot allocate memory for local peer. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
2010-09-23 12:30:22 -04:00
2022-02-17 11:45:58 -05:00
/* extract the program name from argv[0], it will be used for the logs
 * and error messages.
*/
progname = * argv ;
while ( ( tmp = strchr ( progname , ' / ' ) ) ! = NULL )
progname = tmp + 1 ;
2006-06-25 20:48:02 -04:00
2022-02-17 11:45:58 -05:00
len = strlen ( progname ) ;
progname = strdup ( progname ) ;
if ( ! progname ) {
ha_alert ( " Cannot allocate memory for log_tag. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
2014-02-14 05:59:04 -05:00
2022-02-17 11:45:58 -05:00
chunk_initlen ( & global . log_tag , progname , len , len ) ;
}
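The program-name extraction in init_early() repeatedly skips past the next '/' until none is left, which leaves a pointer to the last path component. A standalone sketch of that loop, under the hypothetical name base_name():

```c
#include <string.h>

/* Return a pointer to the last path component of <path>. The result
 * points inside <path>; nothing is allocated or modified.
 */
static const char *base_name(const char *path)
{
	const char *tmp;

	while ((tmp = strchr(path, '/')) != NULL)
		path = tmp + 1;
	return path;
}
```

A path ending in '/' yields an empty string, which is why the caller duplicates the result and records its length instead of assuming it is non-empty.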
2018-11-26 10:31:20 -05:00
2022-02-17 12:10:36 -05:00
/* handles program arguments. Very minimal parsing is performed, variables are
 * fed with some values, and lists are completed with other ones. In case of
 * error, it will exit.
2022-02-17 11:45:58 -05:00
*/
2022-02-17 12:10:36 -05:00
static void init_args ( int argc , char * * argv )
2022-02-17 11:45:58 -05:00
{
char * progname = global . log_tag . area ;
2022-02-17 12:10:36 -05:00
char * err_msg = NULL ;
2015-01-23 08:06:13 -05:00
2022-02-17 11:45:58 -05:00
/* pre-fill in the global tuning options before we let the cmdline
 * change them.
*/
2009-01-25 09:42:27 -05:00
global . tune . options | = GTUNE_USE_SELECT ; /* select() is always available */
2019-05-22 13:24:06 -04:00
# if defined(USE_POLL)
2009-01-25 09:42:27 -05:00
global . tune . options | = GTUNE_USE_POLL ;
2006-06-25 20:48:02 -04:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_EPOLL)
2009-01-25 09:42:27 -05:00
global . tune . options | = GTUNE_USE_EPOLL ;
2006-06-25 20:48:02 -04:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_KQUEUE)
2009-01-25 09:42:27 -05:00
global . tune . options | = GTUNE_USE_KQUEUE ;
2007-04-09 06:03:06 -04:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_EVPORTS)
2019-04-08 12:53:32 -04:00
global . tune . options | = GTUNE_USE_EVPORTS ;
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_LINUX_SPLICE)
2009-01-25 10:03:28 -05:00
global . tune . options | = GTUNE_USE_SPLICE ;
# endif
2014-04-14 09:56:58 -04:00
# if defined(USE_GETADDRINFO)
global . tune . options | = GTUNE_USE_GAI ;
# endif
2020-07-01 12:49:24 -04:00
# ifdef USE_THREAD
global . tune . options | = GTUNE_IDLE_POOL_SHARED ;
2022-11-21 05:54:13 -05:00
# endif
# ifdef USE_QUIC
global . tune . options | = GTUNE_QUIC_SOCK_PER_CONN ;
2020-07-01 12:49:24 -04:00
# endif
2020-03-28 14:29:58 -04:00
global . tune . options | = GTUNE_STRICT_LIMITS ;
2006-06-25 20:48:02 -04:00
2023-02-20 08:06:52 -05:00
global . tune . options | = GTUNE_USE_FAST_FWD ; /* Use fast-forward by default */
MINOR: global: Use a dedicated bitfield to customize zero-copy fast-forwarding
The zero-copy fast-forwarding feature is quite new and a bit sensitive.
There is an option to disable it globally. However, not all protocols have
the same maturity. For instance, for the PT multiplexer, there is nothing
really new: zero-copy fast-forwarding is only another name for kernel
splicing. For QUIC/H3, however, it is pretty new, not really optimized,
and it will evolve. Soon, support will also be added for the cache
applet.
In this context, it is useful to be able to enable/disable zero-copy
fast-forwarding per protocol and applet and, where applicable, on sends
or receives separately. So, instead of having one flag to disable it
globally, there is now a dedicated bitfield, global.tune.no_zero_copy_fwd.
2023-12-04 08:18:50 -05:00
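A "disable" bitfield of this kind can be sketched as follows. The flag names and values below are hypothetical and chosen for illustration only; the sole names taken from the source are global.tune.no_zero_copy_fwd and the global NO_ZERO_COPY_FWD bit. The idea is that a value of zero means "enabled everywhere", one bit disables the feature globally, and further bits disable it per protocol and per direction.

```c
#include <stdint.h>

/* Hypothetical "disable" bits; a cleared bitfield means fully enabled. */
#define ZCF_OFF_ALL     0x0001u  /* disabled for everything */
#define ZCF_OFF_PT      0x0002u  /* disabled for the pass-through mux */
#define ZCF_OFF_H1_RCV  0x0004u  /* disabled for H1 receives */
#define ZCF_OFF_H1_SND  0x0008u  /* disabled for H1 sends */

/* Return non-zero if zero-copy forwarding may be used for <what>, given
 * the current bitfield <no_zcf>.
 */
static int zcf_enabled(uint32_t no_zcf, uint32_t what)
{
	return !(no_zcf & (ZCF_OFF_ALL | what));
}
```

Storing the negative ("no_...") form keeps the default at zero, so a freshly zeroed structure enables the feature everywhere without any explicit initialization.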
/* Use zero-copy forwarding by default */
2023-12-22 10:26:07 -05:00
global . tune . no_zero_copy_fwd = 0 ;
2023-02-20 08:06:52 -05:00
2022-02-17 12:10:36 -05:00
/* keep a copy of original arguments for the master process */
old_argv = copy_argv ( argc , argv ) ;
if ( ! old_argv ) {
ha_alert ( " failed to copy argv. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
/* skip program name and start */
2006-06-25 20:48:02 -04:00
argc - - ; argv + + ;
while ( argc > 0 ) {
char * flag ;
if ( * * argv = = ' - ' ) {
flag = * argv + 1 ;
/* 1 arg */
if ( * flag = = ' v ' ) {
display_version ( ) ;
2007-12-02 05:28:59 -05:00
if ( flag [ 1 ] = = ' v ' ) /* -vv */
display_build_opts ( ) ;
2022-04-26 18:08:11 -04:00
deinit_and_exit ( 0 ) ;
2006-06-25 20:48:02 -04:00
}
2019-05-22 13:24:06 -04:00
# if defined(USE_EPOLL)
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' e ' )
2009-01-25 09:42:27 -05:00
global . tune . options & = ~ GTUNE_USE_EPOLL ;
2006-06-25 20:48:02 -04:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_POLL)
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' p ' )
2009-01-25 09:42:27 -05:00
global . tune . options & = ~ GTUNE_USE_POLL ;
2007-04-09 06:03:06 -04:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_KQUEUE)
2007-04-09 06:03:06 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' k ' )
2009-01-25 09:42:27 -05:00
global . tune . options & = ~ GTUNE_USE_KQUEUE ;
2009-01-25 10:03:28 -05:00
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_EVPORTS)
2019-04-08 12:53:32 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' v ' )
global . tune . options & = ~ GTUNE_USE_EVPORTS ;
# endif
2019-05-22 13:24:06 -04:00
# if defined(USE_LINUX_SPLICE)
2009-01-25 10:03:28 -05:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' S ' )
global . tune . options & = ~ GTUNE_USE_SPLICE ;
2014-04-14 09:56:58 -04:00
# endif
# if defined(USE_GETADDRINFO)
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' G ' )
global . tune . options & = ~ GTUNE_USE_GAI ;
2016-09-12 17:42:20 -04:00
# endif
# if defined(SO_REUSEPORT)
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' R ' )
2023-04-22 09:09:07 -04:00
protocol_clrf_all ( PROTO_F_REUSEPORT_SUPPORTED ) ;
2006-06-25 20:48:02 -04:00
# endif
2023-02-14 10:12:54 -05:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' F ' )
2023-02-20 08:06:52 -05:00
global . tune . options & = ~ GTUNE_USE_FAST_FWD ;
2024-03-13 06:08:50 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' I ' )
global . tune . options | = GTUNE_INSECURE_FORK ;
2014-01-29 06:24:34 -05:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' V ' )
global . ssl_server_verify = SSL_SERVER_VERIFY_NONE ;
2023-10-16 12:28:59 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' Z ' )
2023-12-04 08:18:50 -05:00
global . tune . no_zero_copy_fwd | = NO_ZERO_COPY_FWD ;
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' V ' )
arg_mode | = MODE_VERBOSE ;
2022-09-14 11:51:55 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' C ' ) {
2022-09-29 04:34:04 -04:00
char * end ;
char * key ;
key = flag + 2 ;
for ( ; key & & * key ; key = end ) {
end = strchr ( key , ' , ' ) ;
if ( end )
* ( end + + ) = 0 ;
if ( strcmp ( key , " line " ) = = 0 )
arg_mode | = MODE_DUMP_NB_L ;
}
2022-09-14 11:51:55 -04:00
arg_mode | = MODE_DUMP_CFG ;
HA_ATOMIC_STORE ( & global . anon_key , atoll ( flag + 2 ) ) ;
}
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' b ' )
arg_mode | = MODE_FOREGROUND ;
2021-03-29 04:29:07 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' D ' )
arg_mode | = MODE_DIAG ;
2020-04-15 10:42:39 -04:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' W ' )
arg_mode | = MODE_ZERO_WARNING ;
2022-02-23 08:15:18 -05:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' M ' ) {
2022-02-18 12:54:40 -05:00
int ret = pool_parse_debugging ( flag + 2 , & err_msg ) ;
if ( ret < = - 1 ) {
if ( ret < - 1 )
ha_alert ( " -dM: %s \n " , err_msg ) ;
else
printf ( " %s \n " , err_msg ) ;
ha_free ( & err_msg ) ;
exit ( ret < - 1 ? EXIT_FAILURE : 0 ) ;
} else if ( ret = = 0 ) {
ha_warning ( " -dM: %s \n " , err_msg ) ;
ha_free ( & err_msg ) ;
}
2022-02-23 08:15:18 -05:00
}
2016-11-07 15:03:16 -05:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' r ' )
global . tune . options | = GTUNE_RESOLVE_DONTFAIL ;
2021-12-28 09:43:11 -05:00
# if defined(HA_HAVE_DUMP_LIBS)
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' L ' )
arg_mode | = MODE_DUMP_LIBS ;
# endif
MINOR: management: add some basic keyword dump infrastructure
It's difficult from outside haproxy to detect the supported keywords
and syntax. Interestingly, many of our modern keywords are enumerated
since they're registered from constructors, so it's not very hard to
enumerate most of them.
This patch creates some basic infrastructure to support dumping existing
keywords from different classes on stdout. The format will differ depending
on the classes, but the idea is that the output could easily be passed to
a script that generates some simple syntax highlighting rules, completion
rules for editors, syntax checkers or config parsers.
The principle chosen here is that if "-dK" is passed on the command-line,
at the end of the parsing the registered keywords will be dumped for the
requested classes passed after "-dK". Special name "help" will show known
classes, while "all" will execute all of them. The reason for doing that
after the end of the config processor is that it will also enumerate
internally-generated keywords, Lua or even those loaded from external
code (e.g. if an add-on is loaded using LD_PRELOAD). A typical way to
call this with a valid config would be:
./haproxy -dKall -q -c -f /path/to/config
If there's no config available, feeding /dev/null will also do the job,
though it will not be able to detect dynamically created keywords, of
course.
This patch also updates the management doc.
For now nothing but the help is listed, various subsystems will follow
in subsequent patches.
2022-03-08 10:01:40 -05:00
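The comma-delimited class list passed after "-dK" can be walked in place by cutting the string at each comma. The sketch below mirrors that technique in isolation; count_class() is a hypothetical helper that only counts the items equal to a given name, and it is not part of the patch.

```c
#include <string.h>

/* Split <list> in place on commas and count the items equal to <name>.
 * <list> must be writable: every comma is replaced with a NUL byte.
 */
static int count_class(char *list, const char *name)
{
	char *end;
	int hits = 0;

	for (; list && *list; list = end) {
		end = strchr(list, ',');
		if (end)
			*(end++) = 0;   /* terminate this item, step past the comma */
		if (strcmp(list, name) == 0)
			hits++;
	}
	return hits;
}
```

Since the input is modified, the caller must pass a writable buffer such as an element of argv, never a string literal.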
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' K ' ) {
arg_mode | = MODE_DUMP_KWD ;
kwd_dump = flag + 2 ;
}
2023-11-22 08:58:59 -05:00
else if ( * flag = = ' d ' & & flag [ 1 ] = = ' t ' ) {
2023-11-22 11:27:57 -05:00
if ( argc > 1 & & argv [ 1 ] [ 0 ] ! = ' - ' ) {
if ( trace_parse_cmd ( argv [ 1 ] , & err_msg ) ) {
ha_alert ( " -dt: %s. \n " , err_msg ) ;
ha_free ( & err_msg ) ;
exit ( EXIT_FAILURE ) ;
}
argc - - ; argv + + ;
}
else {
trace_parse_cmd ( NULL , NULL ) ;
}
2023-11-22 08:58:59 -05:00
}
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' d ' )
arg_mode | = MODE_DEBUG ;
2021-06-05 18:50:22 -04:00
else if ( * flag = = ' c ' & & flag [ 1 ] = = ' c ' ) {
arg_mode | = MODE_CHECK_CONDITION ;
argv + + ;
argc - - ;
check_condition = * argv ;
}
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' c ' )
arg_mode | = MODE_CHECK ;
2017-06-01 11:38:50 -04:00
else if ( * flag = = ' D ' )
2009-05-18 10:29:51 -04:00
arg_mode | = MODE_DAEMON ;
2017-11-20 09:58:35 -05:00
else if ( * flag = = ' W ' & & flag [ 1 ] = = ' s ' ) {
2017-11-21 06:39:34 -05:00
arg_mode | = MODE_MWORKER | MODE_FOREGROUND ;
2017-11-20 09:58:35 -05:00
# if defined(USE_SYSTEMD)
global . tune . options | = GTUNE_USE_SYSTEMD ;
# else
2017-11-24 10:50:31 -05:00
ha_alert ( " master-worker mode with systemd support (-Ws) requested, but not compiled. Use master-worker mode (-W) if you are not using Type=notify in your unit file or recompile with USE_SYSTEMD=1. \n \n " ) ;
2017-11-20 09:58:35 -05:00
usage ( progname ) ;
# endif
}
2017-06-01 11:38:50 -04:00
else if ( * flag = = ' W ' )
arg_mode | = MODE_MWORKER ;
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' q ' )
arg_mode | = MODE_QUIET ;
2017-04-05 16:33:04 -04:00
else if ( * flag = = ' x ' ) {
2020-06-04 17:41:29 -04:00
if ( argc < = 1 ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Unix socket path expected with the -x flag \n \n " ) ;
2017-06-19 09:57:55 -04:00
usage ( progname ) ;
2017-04-05 16:33:04 -04:00
}
2017-06-19 10:37:19 -04:00
if ( old_unixsocket )
2017-11-24 10:50:31 -05:00
ha_warning ( " -x option already set, overwriting the value \n " ) ;
2017-04-05 16:33:04 -04:00
old_unixsocket = argv [ 1 ] ;
2017-06-19 10:37:19 -04:00
2017-04-05 16:33:04 -04:00
argv + + ;
argc - - ;
}
2018-10-26 08:47:36 -04:00
else if ( * flag = = ' S ' ) {
struct wordlist * c ;
2020-06-04 17:49:20 -04:00
if ( argc < = 1 ) {
2018-10-26 08:47:36 -04:00
ha_alert ( " Socket and optional bind parameters expected with the -S flag \n " ) ;
usage ( progname ) ;
}
if ( ( c = malloc ( sizeof ( * c ) ) ) = = NULL | | ( c - > s = strdup ( argv [ 1 ] ) ) = = NULL ) {
ha_alert ( " Cannot allocate memory \n " ) ;
exit ( EXIT_FAILURE ) ;
}
2021-04-21 01:32:39 -04:00
LIST_INSERT ( & mworker_cli_conf , & c - > list ) ;
2018-10-26 08:47:36 -04:00
argv + + ;
argc - - ;
}
2006-06-25 20:48:02 -04:00
else if ( * flag = = ' s ' & & ( flag [ 1 ] = = ' f ' | | flag [ 1 ] = = ' t ' ) ) {
/* list of pids to finish ('f') or terminate ('t') */
if ( flag [ 1 ] = = ' f ' )
oldpids_sig = SIGUSR1 ; /* finish then exit */
else
oldpids_sig = SIGTERM ; /* terminate immediately */
2015-10-08 05:32:32 -04:00
while ( argc > 1 & & argv [ 1 ] [ 0 ] ! = ' - ' ) {
2018-02-05 18:15:44 -05:00
char * endptr = NULL ;
2015-10-08 05:32:32 -04:00
oldpids = realloc ( oldpids , ( nb_oldpids + 1 ) * sizeof ( int ) ) ;
if ( ! oldpids ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot allocate old pid : out of memory. \n " ) ;
2015-10-08 05:32:32 -04:00
exit ( 1 ) ;
2006-06-25 20:48:02 -04:00
}
2015-10-08 05:32:32 -04:00
argc - - ; argv + + ;
2018-02-05 18:15:44 -05:00
errno = 0 ;
oldpids [ nb_oldpids ] = strtol ( * argv , & endptr , 10 ) ;
if ( errno ) {
ha_alert ( " -%2s option: failed to parse {%s}: %s \n " ,
flag ,
* argv , strerror ( errno ) ) ;
exit ( 1 ) ;
} else if ( endptr & & strlen ( endptr ) ) {
2020-02-25 02:16:33 -05:00
while ( isspace ( ( unsigned char ) * endptr ) ) endptr + + ;
2018-02-17 14:53:11 -05:00
if ( * endptr ! = 0 ) {
2018-02-05 18:15:44 -05:00
ha_alert ( " -%2s option: some bytes unconsumed in PID list {%s} \n " ,
flag , endptr ) ;
exit ( 1 ) ;
2018-02-17 14:53:11 -05:00
}
2018-02-05 18:15:44 -05:00
}
2015-10-08 05:32:32 -04:00
if ( oldpids [ nb_oldpids ] < = 0 )
usage ( progname ) ;
nb_oldpids + + ;
2006-06-25 20:48:02 -04:00
}
}
2015-10-08 05:58:48 -04:00
else if ( flag [ 0 ] = = ' - ' & & flag [ 1 ] = = 0 ) { /* "--" */
/* now that's a cfgfile list */
argv + + ; argc - - ;
while ( argc > 0 ) {
2024-08-07 12:20:43 -04:00
if ( ! list_append_cfgfile ( & cfg_cfgfiles , * argv , & err_msg ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot load configuration file/directory %s : %s \n " ,
* argv ,
err_msg ) ;
2015-10-08 05:58:48 -04:00
exit ( 1 ) ;
}
argv + + ; argc - - ;
}
break ;
}
2006-06-25 20:48:02 -04:00
else { /* >=2 args */
argv + + ; argc - - ;
if ( argc = = 0 )
2011-09-10 13:20:23 -04:00
usage ( progname ) ;
2006-06-25 20:48:02 -04:00
switch ( * flag ) {
2011-09-10 13:26:56 -04:00
case ' C ' : change_dir = * argv ; break ;
2006-06-25 20:48:02 -04:00
case ' n ' : cfg_maxconn = atol ( * argv ) ; break ;
2015-12-14 06:46:07 -05:00
case ' m ' : global . rlimit_memmax_all = atol ( * argv ) ; break ;
2006-06-25 20:48:02 -04:00
case ' N ' : cfg_maxpconn = atol ( * argv ) ; break ;
2018-04-17 10:46:13 -04:00
case ' L ' :
2020-06-18 10:56:47 -04:00
free ( localpeer ) ;
if ( ( localpeer = strdup ( * argv ) ) = = NULL ) {
ha_alert ( " Cannot allocate memory for local peer. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
2018-04-17 10:46:13 -04:00
setenv ( " HAPROXY_LOCALPEER " , localpeer , 1 ) ;
2020-06-18 12:24:05 -04:00
global . localpeer_cmdline = 1 ;
2018-04-17 10:46:13 -04:00
break ;
2009-06-22 10:02:30 -04:00
case ' f ' :
2024-08-07 12:20:43 -04:00
if ( ! list_append_cfgfile ( & cfg_cfgfiles , * argv , & err_msg ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot load configuration file/directory %s : %s \n " ,
* argv ,
err_msg ) ;
2009-06-22 10:02:30 -04:00
exit ( 1 ) ;
}
break ;
2022-02-17 12:10:36 -05:00
case ' p ' :
free ( global . pidfile ) ;
if ( ( global . pidfile = strdup ( * argv ) ) = = NULL ) {
ha_alert ( " Cannot allocate memory for pidfile. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
break ;
2011-09-10 13:20:23 -04:00
default : usage ( progname ) ;
2006-06-25 20:48:02 -04:00
}
}
}
else
2011-09-10 13:20:23 -04:00
usage ( progname ) ;
2006-06-25 20:48:02 -04:00
argv + + ; argc - - ;
}
2022-02-17 12:10:36 -05:00
free ( err_msg ) ;
}
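The PID validation done for the "-sf" and "-st" options in init_args() combines three checks: errno after strtol(), leftover bytes after the number, and a strictly positive result. A standalone sketch of the same checks, using the hypothetical name parse_pid() and returning -1 instead of exiting:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse <str> as a strictly positive decimal PID. Trailing whitespace is
 * tolerated. Returns the PID, or -1 on any error.
 */
static long parse_pid(const char *str)
{
	char *endptr = NULL;
	long pid;

	errno = 0;
	pid = strtol(str, &endptr, 10);
	if (errno)
		return -1;              /* out of range */

	while (isspace((unsigned char)*endptr))
		endptr++;
	if (*endptr != 0)
		return -1;              /* unconsumed bytes after the number */

	return pid > 0 ? pid : -1;      /* rejects 0, negatives and empty input */
}
```

The cast to unsigned char before isspace() matters: passing a negative char value to the ctype functions is undefined behavior.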
2022-03-08 10:01:40 -05:00
/* call the various keyword dump functions based on the comma-delimited list of
* classes in kwd_dump.
*/
static void dump_registered_keywords ( void )
{
char * end ;
int all __maybe_unused = 0 ;
for ( ; kwd_dump & & * kwd_dump ; kwd_dump = end ) {
end = strchr ( kwd_dump , ' , ' ) ;
if ( end )
* ( end + + ) = 0 ;
if ( strcmp ( kwd_dump , " help " ) = = 0 ) {
printf ( " # List of supported keyword classes: \n " ) ;
printf ( " all: list all keywords \n " ) ;
2022-03-29 09:36:56 -04:00
printf ( " acl: ACL keywords \n " ) ;
2022-03-29 09:02:44 -04:00
printf ( " cfg: configuration keywords \n " ) ;
2022-03-29 09:25:30 -04:00
printf ( " cli: CLI keywords \n " ) ;
2022-03-29 10:59:49 -04:00
printf ( " cnv: sample converter keywords \n " ) ;
2022-03-29 09:03:09 -04:00
printf ( " flt: filter names \n " ) ;
2022-03-29 10:51:29 -04:00
printf ( " smp: sample fetch functions \n " ) ;
2022-03-29 09:10:44 -04:00
printf ( " svc: service names \n " ) ;
2022-03-08 10:01:40 -05:00
continue ;
}
else if ( strcmp ( kwd_dump , " all " ) = = 0 ) {
all = 1 ;
}
2022-03-29 09:02:44 -04:00
2022-03-29 09:36:56 -04:00
if ( all | | strcmp ( kwd_dump , " acl " ) = = 0 ) {
printf ( " # List of registered ACL keywords: \n " ) ;
acl_dump_kwd ( ) ;
}
2022-03-29 09:02:44 -04:00
if ( all | | strcmp ( kwd_dump , " cfg " ) = = 0 ) {
printf ( " # List of registered configuration keywords: \n " ) ;
cfg_dump_registered_keywords ( ) ;
}
2022-03-29 09:03:09 -04:00
2022-03-29 09:25:30 -04:00
if ( all | | strcmp ( kwd_dump , " cli " ) = = 0 ) {
printf ( " # List of registered CLI keywords: \n " ) ;
cli_list_keywords ( ) ;
}
2022-03-29 10:59:49 -04:00
if ( all | | strcmp ( kwd_dump , " cnv " ) = = 0 ) {
printf ( " # List of registered sample converter functions: \n " ) ;
smp_dump_conv_kw ( ) ;
}
2022-03-29 09:03:09 -04:00
if ( all | | strcmp ( kwd_dump , " flt " ) = = 0 ) {
printf ( " # List of registered filter names: \n " ) ;
flt_dump_kws ( NULL ) ;
}
2022-03-29 09:10:44 -04:00
2022-03-29 10:51:29 -04:00
if ( all | | strcmp ( kwd_dump , " smp " ) = = 0 ) {
printf ( " # List of registered sample fetch functions: \n " ) ;
smp_dump_fetch_kw ( ) ;
}
2022-03-29 09:10:44 -04:00
if ( all | | strcmp ( kwd_dump , " svc " ) = = 0 ) {
printf ( " # List of registered service names: \n " ) ;
list_services ( NULL ) ;
}
2022-03-08 10:01:40 -05:00
}
}
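The in-place tokenization used by dump_registered_keywords() can be tried in
isolation. The sketch below is not HAProxy code: split_classes() is a
hypothetical helper that applies the same strchr()-based walk to a writable
comma-delimited string and records a pointer to each class name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of HAProxy: splits <list> in place on commas
 * and stores up to <max> token pointers into <out>. Returns the number of
 * tokens stored. Like the loop above, it overwrites each comma with a NUL
 * byte, so <list> must be writable.
 */
static size_t split_classes(char *list, char **out, size_t max)
{
	size_t count = 0;
	char *end;

	assert(out != NULL);

	for (; list && *list && count < max; list = end) {
		end = strchr(list, ',');
		if (end)
			*(end++) = 0; /* terminate this token, step past the comma */
		out[count++] = list;
	}
	return count;
}
```

Because the commas are overwritten, the list can only be walked once, which
matches the way kwd_dump itself is consumed by the loop above.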
2022-11-14 10:18:46 -05:00
/* Generate a random cluster-secret in case the setting is not provided in the
* configuration. This allows features which rely on it to be used, albeit
* with some limitations.
*/
static void generate_random_cluster_secret ( )
{
/* used as a default random cluster-secret if none defined. */
2023-09-07 12:43:52 -04:00
uint64_t rand ;
2022-11-14 10:18:46 -05:00
/* The caller must not overwrite an already defined secret. */
2023-09-07 12:43:52 -04:00
BUG_ON ( cluster_secret_isset ) ;
2022-11-14 10:18:46 -05:00
2023-09-07 12:43:52 -04:00
rand = ha_random64 ( ) ;
2022-11-14 10:18:46 -05:00
memcpy ( global . cluster_secret , & rand , sizeof ( rand ) ) ;
2023-09-07 12:43:52 -04:00
rand = ha_random64 ( ) ;
memcpy ( global . cluster_secret + sizeof ( rand ) , & rand , sizeof ( rand ) ) ;
cluster_secret_isset = 1 ;
2022-11-14 10:18:46 -05:00
}
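The two-step fill performed by generate_random_cluster_secret() can be sketched
on its own. Everything below is an assumption made for illustration: the
16-byte buffer size, the names secret, secret_isset, demo_random64() and
fill_secret(), and the splitmix64 step used in place of ha_random64(), which is
not suitable for producing real secrets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECRET_LEN 16 /* assumed size: room for two 64-bit words */

static unsigned char secret[SECRET_LEN];
static int secret_isset;

/* Stand-in for ha_random64(): one splitmix64 step over a fixed seed.
 * Hypothetical and deterministic, for demonstration only.
 */
static uint64_t demo_random64(void)
{
	static uint64_t state = 0x9E3779B97F4A7C15ULL;
	uint64_t z = (state += 0x9E3779B97F4A7C15ULL);

	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}

/* Fill the secret with two consecutive 64-bit draws, refusing to overwrite
 * a secret that is already set (the assert plays the role of BUG_ON()).
 */
static void fill_secret(void)
{
	uint64_t r;

	assert(!secret_isset);

	r = demo_random64();
	memcpy(secret, &r, sizeof(r));
	r = demo_random64();
	memcpy(secret + sizeof(r), &r, sizeof(r));
	secret_isset = 1;
}
```

Copying through memcpy() rather than casting the buffer to uint64_t avoids any
alignment assumption on the destination.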
2022-02-17 12:10:36 -05:00
/*
* This function initializes all the necessary variables. It only returns
* if everything is OK. If something fails, it exits.
*/
static void init ( int argc , char * * argv )
{
char * progname = global . log_tag . area ;
int err_code = 0 ;
struct proxy * px ;
struct post_check_fct * pcf ;
2022-04-21 12:02:53 -04:00
struct pre_check_fct * prcf ;
MEDIUM: startup: make read_cfg() return immediately on ENOMEM
This commit prepares read_cfg() to call the load_cfg_in_mem() helper in order
to load configuration files in memory. Previously, read_cfg() called the parser
for all files from the cfg_cfgfiles list and accumulated the parser's errors
and memprintf's errors in the for_each loop. memprintf's errors did not stop
this loop and were accounted for just after it.
Now, as we plan to load configuration files in memory, we stop the loop if
memprintf() fails, and we show an appropriate error message with ha_alert. Then
the process terminates. So not all accumulated syntax-related errors will be
shown before exit in this case, and we have to stop because we ran out of
memory.
If we can't open the current file or we fail to allocate memory to store
some configuration line, the previous behaviour is kept: the process emits an
appropriate alert message and exits.
If the parser returns some syntax-related error on the current file, the
previous behaviour is kept as well. We accumulate such errors for all parsed
files and we check them just after the loop. All syntax-related errors for all
files are then shown as before in ha_alert messages, line by line, during
startup. Then the process exits with 1.
As the cfg_cfgfiles list now contains many pointers to memory areas holding the
content of configuration files, and this content could be big, it's better to
free the list explicitly when parsing is finished. So, let's change
read_cfg() to return an integer value to its caller init(), and let's perform
the free routine at the caller level, as the cfg_cfgfiles list was initialized
and initially filled at this level.
2024-08-05 04:03:52 -04:00
struct cfgfile * cfg , * cfg_tmp ;
2024-08-08 12:54:28 -04:00
int ret , ideal_maxconn ;
2023-11-23 05:32:24 -05:00
const char * cc , * cflags , * opts ;
2022-02-17 12:10:36 -05:00
2022-12-02 11:17:43 -05:00
# ifdef USE_OPENSSL
# ifdef USE_OPENSSL_WOLFSSL
wolfSSL_Init ( ) ;
wolfSSL_Debugging_ON ( ) ;
# endif
2023-07-06 18:41:46 -04:00
2024-07-30 09:51:59 -04:00
# ifdef OPENSSL_IS_AWSLC
2023-07-06 18:41:46 -04:00
const char * version_str = OpenSSL_version ( OPENSSL_VERSION ) ;
if ( strncmp ( version_str , " AWS-LC " , 6 ) ! = 0 ) {
ha_alert ( " HAProxy built with AWS-LC but running with %s. \n " , version_str ) ;
exit ( 1 ) ;
}
# endif
2022-12-02 11:17:43 -05:00
# if (HA_OPENSSL_VERSION_NUMBER < 0x1010000fL)
2022-12-02 11:06:59 -05:00
/* Initialize the error strings of OpenSSL.
* It only needs to be done explicitly with older versions of the SSL
* library. On newer versions, error strings are loaded during
* startup. */
SSL_load_error_strings ( ) ;
2022-12-02 11:17:43 -05:00
# endif
2022-12-02 11:06:59 -05:00
# endif
2022-09-26 06:54:39 -04:00
startup_logs_init ( ) ;
2022-02-17 12:10:36 -05:00
if ( init_acl ( ) ! = 0 )
exit ( 1 ) ;
/* Initialise lua. */
hlua_init ( ) ;
2006-06-25 20:48:02 -04:00
2017-10-24 07:53:54 -04:00
global . mode | = ( arg_mode & ( MODE_DAEMON | MODE_MWORKER | MODE_FOREGROUND | MODE_VERBOSE
2021-03-29 04:29:07 -04:00
| MODE_QUIET | MODE_CHECK | MODE_DEBUG | MODE_ZERO_WARNING
2022-09-29 04:34:04 -04:00
| MODE_DIAG | MODE_CHECK_CONDITION | MODE_DUMP_LIBS | MODE_DUMP_KWD
| MODE_DUMP_CFG | MODE_DUMP_NB_L ) ) ;
2006-06-25 20:48:02 -04:00
2018-11-21 09:48:31 -05:00
if ( getenv ( " HAPROXY_MWORKER_WAIT_ONLY " ) ) {
2017-06-01 11:38:52 -04:00
unsetenv ( " HAPROXY_MWORKER_WAIT_ONLY " ) ;
2018-11-21 09:48:31 -05:00
global . mode | = MODE_MWORKER_WAIT ;
global . mode & = ~ MODE_MWORKER ;
2017-06-01 11:38:52 -04:00
}
2024-06-26 12:39:45 -04:00
/* Do check_condition, if we started with -cc, and exit. */
if ( global . mode & MODE_CHECK_CONDITION )
do_check_condition ( progname ) ;
2021-06-05 18:50:22 -04:00
2024-06-26 12:29:47 -04:00
/* set the atexit functions when not doing configuration check */
if ( ! ( global . mode & MODE_CHECK ) & & ( getenv ( " HAPROXY_MWORKER_REEXEC " ) ! = NULL ) ) {
if ( global . mode & MODE_MWORKER ) {
atexit_flag = 1 ;
atexit ( reexec_on_failure ) ;
} else if ( global . mode & MODE_MWORKER_WAIT ) {
atexit_flag = 1 ;
atexit ( exit_on_waitmode_failure ) ;
}
}
if ( change_dir & & chdir ( change_dir ) < 0 ) {
ha_alert ( " Could not change to directory %s : %s \n " , change_dir , strerror ( errno ) ) ;
exit ( 1 ) ;
}
usermsgs_clr ( " config " ) ;
2018-11-21 09:48:31 -05:00
/* in wait mode, we don't try to read the configuration files */
2024-08-05 04:03:52 -04:00
if ( ! ( global . mode & MODE_MWORKER_WAIT ) ) {
ret = read_cfg ( progname ) ;
/* free memory to store config file content */
list_for_each_entry_safe ( cfg , cfg_tmp , & cfg_cfgfiles , list )
ha_free ( & cfg - > content ) ;
if ( ret < 0 )
exit ( 1 ) ;
}
2019-05-20 05:15:37 -04:00
2018-10-26 08:47:30 -04:00
if ( global . mode & MODE_MWORKER ) {
2018-11-19 12:46:18 -05:00
struct mworker_proc * tmproc ;
2019-04-12 10:15:00 -04:00
setenv ( " HAPROXY_MWORKER " , " 1 " , 1 ) ;
2018-11-19 12:46:18 -05:00
if ( getenv ( " HAPROXY_MWORKER_REEXEC " ) = = NULL ) {
2022-01-28 15:11:41 -05:00
tmproc = mworker_proc_new ( ) ;
2018-11-19 12:46:18 -05:00
if ( ! tmproc ) {
ha_alert ( " Cannot allocate process structures. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
2019-04-12 10:09:23 -04:00
tmproc - > options | = PROC_O_TYPE_MASTER ; /* master */
2018-11-19 12:46:18 -05:00
tmproc - > pid = pid ;
2023-02-17 10:23:52 -05:00
tmproc - > timestamp = start_date . tv_sec ;
2018-11-19 12:46:18 -05:00
proc_self = tmproc ;
2021-04-21 01:32:39 -04:00
LIST_APPEND ( & proc_list , & tmproc - > list ) ;
2018-11-19 12:46:18 -05:00
}
2018-10-26 08:47:30 -04:00
2022-01-28 15:11:41 -05:00
tmproc = mworker_proc_new ( ) ;
2021-06-15 02:02:06 -04:00
if ( ! tmproc ) {
ha_alert ( " Cannot allocate process structures. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
tmproc - > options | = PROC_O_TYPE_WORKER ; /* worker */
2018-10-26 08:47:30 -04:00
2021-06-15 02:02:06 -04:00
if ( mworker_cli_sockpair_new ( tmproc , 0 ) < 0 ) {
exit ( EXIT_FAILURE ) ;
2018-10-26 08:47:30 -04:00
}
2021-06-15 02:02:06 -04:00
LIST_APPEND ( & proc_list , & tmproc - > list ) ;
2018-11-21 09:48:31 -05:00
}
BUG/MEDIUM: master: force the thread count earlier
Christopher bisected that recent commit d0b73bca71 ("MEDIUM: listener:
switch bind_thread from global to group-local") broke the master socket
in that only the first out of the Nth initial connections would work,
where N is the number of threads, after which they all work.
The cause is that the master socket was bound to multiple threads,
despite global.nbthread being 1 there, so the incoming connection load
balancing would try to send incoming connections to non-existing threads,
however the bind_thread mask would nonetheless include multiple threads.
What happened is that in 1.9 we forced "nbthread" to 1 in the master's poll
loop with commit b3f2be338b ("MEDIUM: mworker: use the haproxy poll loop").
In 2.0, nbthread detection was enabled by default in commit 149ab779cc
("MAJOR: threads: enable one thread per CPU by default"). From this point
on, the operation above is unsafe because everything during startup is
performed with nbthread corresponding to the default value, then it
changes to one when starting the polling loop. But by then we weren't
using the wait mode except for reload errors, so even if it would have
happened nobody would have noticed.
In 2.5 with commit fab0fdce9 ("MEDIUM: mworker: reexec in waitpid mode
after successful loading") we started to re-execute all the time, not just
for errors, so as to release precious resources and to possibly spot bugs
that were rarely exposed in this mode. By then the incoming connection LB
was enforcing all_threads_mask on the listener's thread mask so that the
incorrect value was being corrected while using it.
Finally in 2.7 commit d0b73bca71 ("MEDIUM: listener: switch bind_thread
from global to group-local") replaces the all_threads_mask there with
the listener's bind_thread, but that one was never adjusted by the
starting master, whose thread group was filled to N threads by the
automatic detection during early setup.
The best approach here is to set nbthread to 1 very early in init()
when we're in the master in wait mode, so that we don't try to guess
the best value and don't end up with incorrect bindings anymore. This
patch does this and also sets nbtgroups to 1 in preparation for a
possible future where this will also be automatically calculated.
There is no need to backport this patch since no other versions were
affected, but if it were to be discovered that the incorrect bind mask
on some of the master's FDs could be responsible for any trouble in
older versions, then the backport should be safe (provided that
nbtgroups is dropped of course).
2022-07-22 11:35:49 -04:00
if ( global . mode & MODE_MWORKER_WAIT ) {
/* in exec mode, there's always exactly one thread. Failure to
* set these ones now will result in nbthread being detected
* automatically.
*/
global . nbtgroups = 1 ;
global . nbthread = 1 ;
}
2024-06-26 10:21:50 -04:00
if ( global . mode & ( MODE_MWORKER | MODE_MWORKER_WAIT ) )
mworker_create_master_cli ( ) ;
2018-10-26 08:47:30 -04:00
2021-03-16 10:11:17 -04:00
if ( ! LIST_ISEMPTY ( & mworker_cli_conf ) & & ! ( arg_mode & MODE_MWORKER ) ) {
2023-11-09 09:02:13 -05:00
ha_alert ( " a master CLI socket was defined, but master-worker mode (-W) is not enabled. \n " ) ;
exit ( EXIT_FAILURE ) ;
2021-03-16 10:11:17 -04:00
}
2021-10-13 03:50:53 -04:00
/* destroy unreferenced defaults proxies */
proxy_destroy_all_unref_defaults ( ) ;
2022-04-21 12:02:53 -04:00
list_for_each_entry ( prcf , & pre_check_list , list )
err_code | = prcf - > fct ( ) ;
2021-02-12 08:08:31 -05:00
2022-05-04 08:29:46 -04:00
if ( err_code & ( ERR_ABORT | ERR_FATAL ) ) {
ha_alert ( " Fatal errors found in configuration. \n " ) ;
exit ( 1 ) ;
}
2023-05-17 03:02:21 -04:00
/* update the ready date that will be used to count the startup time
* during config checks (e.g. to schedule certain tasks if needed)
*/
clock_update_date ( 0 , 1 ) ;
2023-05-16 13:19:36 -04:00
clock_adjust_now_offset ( ) ;
2023-05-17 03:02:21 -04:00
ready_date = date ;
2023-05-16 13:19:36 -04:00
2022-12-08 02:13:20 -05:00
/* Note: global.nbthread will be initialized as part of this call */
2009-07-23 07:36:36 -04:00
err_code | = check_config_validity ( ) ;
2023-05-17 03:02:21 -04:00
/* update the ready date to also account for the check time */
clock_update_date ( 0 , 1 ) ;
2023-05-16 13:19:36 -04:00
clock_adjust_now_offset ( ) ;
2023-05-17 03:02:21 -04:00
ready_date = date ;
2019-08-12 03:51:07 -04:00
for ( px = proxies_list ; px ; px = px - > next ) {
struct server * srv ;
struct post_proxy_check_fct * ppcf ;
struct post_server_check_fct * pscf ;
2021-10-06 08:24:19 -04:00
if ( px - > flags & ( PR_FL_DISABLED | PR_FL_STOPPED ) )
2020-11-02 10:20:13 -05:00
continue ;
2019-08-12 03:51:07 -04:00
list_for_each_entry ( pscf , & post_server_check_list , list ) {
for ( srv = px - > srv ; srv ; srv = srv - > next )
err_code | = pscf - > fct ( srv ) ;
}
list_for_each_entry ( ppcf , & post_proxy_check_list , list )
err_code | = ppcf - > fct ( px ) ;
2024-03-09 16:18:51 -05:00
px - > flags | = PR_FL_CHECKED ;
2019-08-12 03:51:07 -04:00
}
2009-07-23 07:36:36 -04:00
if ( err_code & ( ERR_ABORT | ERR_FATAL ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Fatal errors found in configuration. \n " ) ;
2009-06-22 09:48:36 -04:00
exit ( 1 ) ;
}
2006-06-25 20:48:02 -04:00
2020-02-27 10:45:50 -05:00
err_code | = pattern_finalize_config ( ) ;
if ( err_code & ( ERR_ABORT | ERR_FATAL ) ) {
ha_alert ( " Failed to finalize pattern config. \n " ) ;
exit ( 1 ) ;
}
2019-04-11 08:47:08 -04:00
2021-07-17 06:31:08 -04:00
if ( global . rlimit_memmax_all )
global . rlimit_memmax = global . rlimit_memmax_all ;
2019-05-22 13:24:06 -04:00
# ifdef USE_NS
2014-11-17 09:11:45 -05:00
err_code | = netns_init ( ) ;
if ( err_code & ( ERR_ABORT | ERR_FATAL ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Failed to initialize namespace support. \n " ) ;
2014-11-17 09:11:45 -05:00
exit ( 1 ) ;
}
# endif
2023-09-04 10:53:20 -04:00
thread_detect_binding_discrepancies ( ) ;
2023-09-04 11:36:20 -04:00
thread_detect_more_than_cpus ( ) ;
2023-09-04 10:53:20 -04:00
2016-11-02 10:33:15 -04:00
/* Apply server states */
apply_server_state ( ) ;
2024-04-24 05:09:06 -04:00
/* Preload internal counters. */
apply_stats_file ( ) ;
2017-11-24 10:54:05 -05:00
for ( px = proxies_list ; px ; px = px - > next )
2016-11-02 10:33:15 -04:00
srv_compute_all_admin_states ( px ) ;
2016-11-02 10:34:05 -04:00
/* Apply servers' configured address */
err_code | = srv_init_addr ( ) ;
if ( err_code & ( ERR_ABORT | ERR_FATAL ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Failed to initialize server(s) addr. \n " ) ;
2016-11-02 10:34:05 -04:00
exit ( 1 ) ;
}
2020-04-15 10:42:39 -04:00
if ( warned & WARN_ANY & & global . mode & MODE_ZERO_WARNING ) {
ha_alert ( " Some warnings were found and 'zero-warning' is set. Aborting. \n " ) ;
exit ( 1 ) ;
}
2021-12-28 09:43:11 -05:00
# if defined(HA_HAVE_DUMP_LIBS)
if ( global . mode & MODE_DUMP_LIBS ) {
qfprintf ( stdout , " List of loaded object files: \n " ) ;
chunk_reset ( & trash ) ;
2023-03-22 06:37:54 -04:00
if ( dump_libs ( & trash , ( ( arg_mode & ( MODE_QUIET | MODE_VERBOSE ) ) = = MODE_VERBOSE ) ) )
2021-12-28 09:43:11 -05:00
printf ( " %s " , trash . area ) ;
}
# endif
2022-03-08 10:01:40 -05:00
if ( global . mode & MODE_DUMP_KWD )
dump_registered_keywords ( ) ;
2024-02-03 06:05:08 -05:00
if ( global . mode & MODE_DIAG ) {
cfg_run_diagnostics ( ) ;
}
2006-06-25 20:48:02 -04:00
if ( global . mode & MODE_CHECK ) {
2012-02-02 11:48:18 -05:00
struct peers * pr ;
struct proxy * px ;
2020-04-15 10:06:11 -04:00
if ( warned & WARN_ANY )
qfprintf ( stdout , " Warnings were found. \n " ) ;
2017-07-13 03:07:09 -04:00
for ( pr = cfg_peers ; pr ; pr = pr - > next )
2012-02-02 11:48:18 -05:00
if ( pr - > peers_fe )
break ;
2017-11-24 10:54:05 -05:00
for ( px = proxies_list ; px ; px = px - > next )
2021-10-06 08:24:19 -04:00
if ( ! ( px - > flags & ( PR_FL_DISABLED | PR_FL_STOPPED ) ) & & px - > li_all )
2012-02-02 11:48:18 -05:00
break ;
2021-08-13 03:32:50 -04:00
if ( ! px ) {
/* We may only have log-forward section */
for ( px = cfg_log_forward ; px ; px = px - > next )
2021-10-06 08:24:19 -04:00
if ( ! ( px - > flags & ( PR_FL_DISABLED | PR_FL_STOPPED ) ) & & px - > li_all )
2021-08-13 03:32:50 -04:00
break ;
}
2012-02-02 11:48:18 -05:00
if ( pr | | px ) {
/* At least one peer or one listener has been found */
2023-11-09 08:48:50 -05:00
if ( global . mode & MODE_VERBOSE )
qfprintf ( stdout , " Configuration file is valid \n " ) ;
MINOR: haproxy: Make use of deinit_and_exit() for clean exits
Particularly cleanly deinit() after a configuration check to clean up the
output of valgrind which reports "possible losses" without a deinit() and
does not with a deinit(), converting actual losses into proper hard losses
which makes the whole stuff easier to analyze.
As an example, given an example configuration of the following:
frontend foo
bind *:8080
mode http
Running `haproxy -c -f cfg` within valgrind will report 4 possible losses:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==21219== Memcheck, a memory error detector
==21219== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==21219== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==21219== Command: ./haproxy -c -f ./example.cfg
==21219==
[WARNING] 165/001100 (21219) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==21219==
==21219== HEAP SUMMARY:
==21219== in use at exit: 1,436,631 bytes in 130 blocks
==21219== total heap usage: 153 allocs, 23 frees, 1,447,758 bytes allocated
==21219==
==21219== 7 bytes in 1 blocks are possibly lost in loss record 5 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468FD9: bind_conf_alloc (listener.h:158)
==21219== by 0x468FD9: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 14 bytes in 1 blocks are possibly lost in loss record 9 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468F9B: bind_conf_alloc (listener.h:154)
==21219== by 0x468F9B: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 128 bytes in 1 blocks are possibly lost in loss record 35 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x468F90: bind_conf_alloc (listener.h:152)
==21219== by 0x468F90: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 608 bytes in 1 blocks are possibly lost in loss record 46 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x4B953A: create_listeners (listener.c:576)
==21219== by 0x4578F6: str2listener (cfgparse.c:192)
==21219== by 0x469039: cfg_parse_listen (cfgparse-listen.c:568)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== LEAK SUMMARY:
==21219== definitely lost: 0 bytes in 0 blocks
==21219== indirectly lost: 0 bytes in 0 blocks
==21219== possibly lost: 757 bytes in 4 blocks
==21219== still reachable: 1,435,874 bytes in 126 blocks
==21219== suppressed: 0 bytes in 0 blocks
==21219== Reachable blocks (those to which a pointer was found) are not shown.
==21219== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==21219==
==21219== For counts of detected and suppressed errors, rerun with: -v
==21219== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Re-running the same command with the patch applied will not report any
losses any more:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==22124== Memcheck, a memory error detector
==22124== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22124== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==22124== Command: ./haproxy -c -f ./example.cfg
==22124==
[WARNING] 165/001503 (22124) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==22124==
==22124== HEAP SUMMARY:
==22124== in use at exit: 313,864 bytes in 82 blocks
==22124== total heap usage: 153 allocs, 71 frees, 1,447,758 bytes allocated
==22124==
==22124== LEAK SUMMARY:
==22124== definitely lost: 0 bytes in 0 blocks
==22124== indirectly lost: 0 bytes in 0 blocks
==22124== possibly lost: 0 bytes in 0 blocks
==22124== still reachable: 313,864 bytes in 82 blocks
==22124== suppressed: 0 bytes in 0 blocks
==22124== Reachable blocks (those to which a pointer was found) are not shown.
==22124== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==22124==
==22124== For counts of detected and suppressed errors, rerun with: -v
==22124== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
It might be worth investigating what exactly HAProxy does to lose pointers
to the start of those 4 memory areas and then to be able to still free them
during deinit(). If HAProxy is able to free them, they ideally should be
"still reachable" and not "possibly lost".
2020-06-13 18:37:42 -04:00
deinit_and_exit ( 0 ) ;
2012-02-02 11:48:18 -05:00
}
qfprintf ( stdout , " Configuration file has no error but will not start (no listener) => exit(2). \n " ) ;
exit ( 2 ) ;
2006-06-25 20:48:02 -04:00
}
2022-09-14 11:51:55 -04:00
if ( global . mode & MODE_DUMP_CFG )
deinit_and_exit ( 0 ) ;
2022-05-16 10:24:31 -04:00
# ifdef USE_OPENSSL
2022-07-19 12:13:29 -04:00
2022-05-16 10:24:31 -04:00
/* Initialize SSL random generator. Must be called before chroot for
* access to /dev/urandom, and before ha_random_boot() which may use
* RAND_bytes().
*/
if ( ! ssl_initialize_random ( ) ) {
ha_alert ( " OpenSSL random data generator initialization failed. \n " ) ;
exit ( EXIT_FAILURE ) ;
}
# endif
ha_random_boot ( argv ) ; // the argv pointer brings some kernel-fed entropy
2012-08-27 18:06:31 -04:00
/* now we know the buffer size, we can initialize the channels and buffers */
2012-10-12 17:49:43 -04:00
init_buffer ( ) ;
2009-09-23 17:37:52 -04:00
2016-12-21 13:57:00 -05:00
list_for_each_entry ( pcf , & post_check_list , list ) {
err_code | = pcf - > fct ( ) ;
if ( err_code & ( ERR_ABORT | ERR_FATAL ) )
exit ( 1 ) ;
}
2022-06-21 05:11:50 -04:00
/* set the default maxconn in the master, but let it be rewritable with -n */
if ( global . mode & MODE_MWORKER_WAIT )
2023-03-09 08:28:44 -05:00
global . maxconn = MASTER_MAXCONN ;
2022-06-21 05:11:50 -04:00
2006-06-25 20:48:02 -04:00
if ( cfg_maxconn > 0 )
global . maxconn = cfg_maxconn ;
2021-03-13 05:00:33 -05:00
if ( global . cli_fe )
global . maxsock + = global . cli_fe - > maxconn ;
2019-03-01 03:39:42 -05:00
if ( cfg_peers ) {
/* peers also need to bypass global maxconn */
struct peers * p = cfg_peers ;
for ( p = cfg_peers ; p ; p = p - > next )
if ( p - > peers_fe )
global . maxsock + = p - > peers_fe - > maxconn ;
}
2015-01-15 15:45:22 -05:00
/* Now we want to compute the maxconn and possibly maxsslconn values.
2019-03-01 09:43:14 -05:00
* It's a bit tricky. Maxconn defaults to the pre-computed value based
* on rlim_fd_cur and the number of FDs in use due to the configuration,
* and maxsslconn defaults to DEFAULT_MAXSSLCONN. On top of that we can
* enforce a lower limit based on memmax.
2015-01-15 15:45:22 -05:00
*
* If memmax is set, then it depends on which values are set. If
* maxsslconn is set, we use memmax to determine how many cleartext
* connections may be added, and set maxconn to the sum of the two.
* If maxconn is set and not maxsslconn, maxsslconn is computed from
* the remaining amount of memory between memmax and the cleartext
* connections. If neither are set, then it is considered that all
* connections are SSL-capable, and maxconn is computed based on this,
* then maxsslconn accordingly. We need to know if SSL is used on the
* frontends, backends, or both, because when it's used on both sides,
* we need twice the value for maxsslconn, but we only count the
* handshake once since it is not performed on the two sides at the
* same time (frontend-side is terminated before backend-side begins).
* The SSL stack is supposed to have filled ssl_session_cost and
2015-01-28 13:03:21 -05:00
* ssl_handshake_cost during its initialization. In any case, if
* SYSTEM_MAXCONN is set, we still enforce it as an upper limit for
* maxconn in order to protect the system.
2015-01-15 15:45:22 -05:00
*/
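The sizing rules described in the comment above can be sketched in isolation. This is a minimal illustration, not the code used by HAProxy: every `sketch_` name is hypothetical, the cost structure stands in for STREAM_MAX_COST, global.tune.bufsize and the two SSL cost fields, and `mem` is assumed to be the usable memory in bytes after the SSL cache, zlib memory and MEM_USABLE_RATIO have already been accounted for.

```c
#include <assert.h>
#include <stdint.h>

/* Round down to two significant digits, e.g. 12345 -> 12000. This is what
 * the name round_2dig() suggests; the real helper may differ in details.
 */
static unsigned int sketch_round_2dig(unsigned int v)
{
	unsigned int mul = 1;

	while (v >= 100) {
		v /= 10;
		mul *= 10;
	}
	return v * mul;
}

/* Hypothetical per-connection costs, in bytes. */
struct sketch_costs {
	int64_t stream;     /* stands in for STREAM_MAX_COST */
	int64_t bufsize;    /* stands in for global.tune.bufsize */
	int64_t ssl_sess;   /* stands in for global.ssl_session_max_cost */
	int64_t ssl_hshake; /* stands in for global.ssl_handshake_max_cost */
};

/* Neither maxconn nor maxsslconn is set: every connection is assumed to be
 * SSL-capable on <sides> sides, with a single handshake counted per
 * connection.
 */
static unsigned int sketch_auto_maxconn(int64_t mem, int sides,
                                        const struct sketch_costs *c)
{
	int64_t per_conn;

	assert(sides >= 0 && sides <= 2);
	per_conn = c->stream + 2 * c->bufsize + sides * c->ssl_sess + c->ssl_hshake;
	if (mem <= 0 || per_conn <= 0)
		return 0;
	return sketch_round_2dig((unsigned int)(mem / per_conn));
}

/* maxconn is known but maxsslconn is not: SSL gets whatever memory the
 * cleartext connections leave, capped at sides * maxconn. Returns 0 when
 * nothing is left, which the caller would treat as a fatal error.
 */
static unsigned int sketch_auto_maxsslconn(int64_t mem, int sides,
                                           unsigned int maxconn,
                                           const struct sketch_costs *c)
{
	int64_t sslmem = mem - (int64_t)maxconn * (c->stream + 2 * c->bufsize);
	int64_t per_ssl = c->ssl_sess + c->ssl_hshake;
	unsigned int n;

	assert(sides >= 0 && sides <= 2);
	if (sslmem <= 0 || per_ssl <= 0)
		return 0;
	n = sketch_round_2dig((unsigned int)(sslmem / per_ssl));
	if (n > (unsigned int)sides * maxconn)
		n = (unsigned int)sides * maxconn;
	return n;
}
```

With illustrative costs of 1000, 500, 1000 and 1000 bytes and SSL on both sides, 61725000 usable bytes give 12345 connections, which rounds down to 12000.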
2019-03-01 09:43:14 -05:00
ideal_maxconn = compute_ideal_maxconn ( ) ;
2015-01-15 15:45:22 -05:00
if ( ! global . rlimit_memmax ) {
if ( global . maxconn = = 0 ) {
2019-03-01 09:43:14 -05:00
global . maxconn = ideal_maxconn ;
2015-01-15 15:45:22 -05:00
if ( global . mode & ( MODE_VERBOSE | MODE_DEBUG ) )
fprintf ( stderr , " Note: setting global.maxconn to %d. \n " , global . maxconn ) ;
}
}
# ifdef USE_OPENSSL
else if ( ! global . maxconn & & ! global . maxsslconn & &
( global . ssl_used_frontend | | global . ssl_used_backend ) ) {
/* memmax is set, compute everything automatically. Here we want
* to ensure that all SSL connections will be served. We take
* care of the number of sides where SSL is used, and consider
* the worst case: SSL used on both sides and doing a handshake
* simultaneously. Note that we can't have more than maxconn
* handshakes at a time by definition, so for the worst case of
* two SSL conns per connection, we count a single handshake.
*/
int sides = ! ! global . ssl_used_frontend + ! ! global . ssl_used_backend ;
int64_t mem = global . rlimit_memmax * 1048576ULL ;
MEDIUM: init: always try to push the FD limit when maxconn is set from -m
When a maximum memory setting is passed to haproxy and maxconn is not set
and ulimit-n is not set, it is expected that maxconn will be set to the
highest value permitted by this memory setting, possibly affecting the
FD limit.
When maxconn was changed to be deduced from the current process's FD limit,
the automatic setting above was partially lost because it now remains
limited to the current FD limit in addition to being limited to the
memory usage. For unprivileged processes it does not change anything,
but for privileged processes the difference is important. Indeed, the
previous behavior ensured that the new FD limit could be enforced on
the process as long as the user had the privilege to do so. Now this
does not happen anymore, and some people rely on this for automatic
sizing in VM environments.
This patch implements the ability to verify if the setting will be
enforceable on the process or not. First it computes maxconn based on
the memory limits alone, then checks if the process is willing to accept
them, otherwise tries again by respecting the process' hard limit.
Thanks to this we now have the best of the pre-2.0 behavior and the
current one, in that privileged users will be able to get as high a
maxconn as they need just based on the memory limit, while unprivileged
users will still get as high a setting as permitted by the intersection
of the memory limit and the process' FD limit.
Ideally, after some observation period, this patch along with the
previous one "MINOR: init: move the maxsock calculation code to
compute_ideal_maxsock()" should be backported to 2.1 and 2.0.
Thanks to Baptiste for raising the issue.
2020-03-10 12:54:54 -04:00
int retried = 0 ;
2015-01-15 15:45:22 -05:00
2022-05-24 01:43:57 -04:00
mem - = global . tune . sslcachesize * 200ULL ; // about 200 bytes per SSL cache entry
2015-01-15 15:45:22 -05:00
mem - = global . maxzlibmem ;
mem = mem * MEM_USABLE_RATIO ;
2020-03-10 12:54:54 -04:00
/* Principle: we test once to set maxconn according to the free
* memory. If it results in values the system rejects, we try a
* second time by respecting rlim_fd_max. If it fails again, we
* go back to the initial value and will let the final code
* dealing with rlimit report the error. That's up to 3 attempts.
*/
do {
global . maxconn = mem /
( ( STREAM_MAX_COST + 2 * global . tune . bufsize ) + // stream + 2 buffers per stream
sides * global . ssl_session_max_cost + // SSL buffers, one per side
global . ssl_handshake_max_cost ) ; // 1 handshake per connection max
if ( retried = = 1 )
global . maxconn = MIN ( global . maxconn , ideal_maxconn ) ;
global . maxconn = round_2dig ( global . maxconn ) ;
2015-01-28 13:03:21 -05:00
# ifdef SYSTEM_MAXCONN
2020-03-10 12:54:54 -04:00
if ( global . maxconn > SYSTEM_MAXCONN )
global . maxconn = SYSTEM_MAXCONN ;
2015-01-28 13:03:21 -05:00
# endif /* SYSTEM_MAXCONN */
2020-03-10 12:54:54 -04:00
global . maxsslconn = sides * global . maxconn ;
if ( check_if_maxsock_permitted ( compute_ideal_maxsock ( global . maxconn ) ) )
break ;
} while ( retried + + < 2 ) ;
2015-01-15 15:45:22 -05:00
if ( global . mode & ( MODE_VERBOSE | MODE_DEBUG ) )
fprintf ( stderr , " Note: setting global.maxconn to %d and global.maxsslconn to %d. \n " ,
global . maxconn , global . maxsslconn ) ;
}
else if ( ! global . maxsslconn & &
( global . ssl_used_frontend | | global . ssl_used_backend ) ) {
/* memmax and maxconn are known, compute maxsslconn automatically.
* maxsslconn being forced, we don't know how many of it will be
* on each side if both sides are being used. The worst case is
* when all connections use only one SSL instance because
* handshakes may be on two sides at the same time.
*/
int sides = ! ! global . ssl_used_frontend + ! ! global . ssl_used_backend ;
int64_t mem = global . rlimit_memmax * 1048576ULL ;
int64_t sslmem ;
2022-05-26 02:55:05 -04:00
mem - = global . tune . sslcachesize * 200ULL ; // about 200 bytes per SSL cache entry
2015-01-15 15:45:22 -05:00
mem - = global . maxzlibmem ;
mem = mem * MEM_USABLE_RATIO ;
REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved over there. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and the same session may
be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-02 18:22:06 -04:00
sslmem = mem - global . maxconn * ( int64_t ) ( STREAM_MAX_COST + 2 * global . tune . bufsize ) ;
2015-01-15 15:45:22 -05:00
global . maxsslconn = sslmem / ( global . ssl_session_max_cost + global . ssl_handshake_max_cost ) ;
global . maxsslconn = round_2dig ( global . maxsslconn ) ;
if ( sslmem < = 0 | | global . maxsslconn < sides ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " Cannot compute the automatic maxsslconn because global.maxconn is already too "
" high for the global.memmax value (%d MB). The absolute maximum possible value "
" without SSL is %d, but %d was found and SSL is in use. \n " ,
global . rlimit_memmax ,
( int ) ( mem / ( STREAM_MAX_COST + 2 * global . tune . bufsize ) ) ,
global . maxconn ) ;
2015-01-15 15:45:22 -05:00
exit ( 1 ) ;
}
if ( global . maxsslconn > sides * global . maxconn )
global . maxsslconn = sides * global . maxconn ;
if ( global . mode & ( MODE_VERBOSE | MODE_DEBUG ) )
fprintf ( stderr , " Note: setting global.maxsslconn to %d \n " , global . maxsslconn ) ;
}
# endif
else if ( ! global . maxconn ) {
/* memmax and maxsslconn are known/unused, compute maxconn automatically */
int sides = ! ! global . ssl_used_frontend + ! ! global . ssl_used_backend ;
int64_t mem = global . rlimit_memmax * 1048576ULL ;
int64_t clearmem ;
2020-03-10 12:54:54 -04:00
int retried = 0 ;
2015-01-15 15:45:22 -05:00
if ( global . ssl_used_frontend | | global . ssl_used_backend )
2022-05-26 02:55:05 -04:00
mem - = global . tune . sslcachesize * 200ULL ; // about 200 bytes per SSL cache entry
2015-01-15 15:45:22 -05:00
mem - = global . maxzlibmem ;
mem = mem * MEM_USABLE_RATIO ;
clearmem = mem ;
if ( sides )
clearmem - = ( global . ssl_session_max_cost + global . ssl_handshake_max_cost ) * ( int64_t ) global . maxsslconn ;
2020-03-10 12:54:54 -04:00
/* Principle: we test once to set maxconn according to the free
* memory. If it results in values the system rejects, we try a
* second time by respecting rlim_fd_max. If it fails again, we
* go back to the initial value and will let the final code
* dealing with rlimit report the error. That's up to 3 attempts.
*/
do {
global . maxconn = clearmem / ( STREAM_MAX_COST + 2 * global . tune . bufsize ) ;
if ( retried = = 1 )
global . maxconn = MIN ( global . maxconn , ideal_maxconn ) ;
global . maxconn = round_2dig ( global . maxconn ) ;
2015-01-28 13:03:21 -05:00
# ifdef SYSTEM_MAXCONN
2020-03-10 12:54:54 -04:00
if ( global . maxconn > SYSTEM_MAXCONN )
global . maxconn = SYSTEM_MAXCONN ;
2015-01-28 13:03:21 -05:00
# endif /* SYSTEM_MAXCONN */
2015-01-15 15:45:22 -05:00
2020-03-10 12:54:54 -04:00
if ( clearmem < = 0 | | ! global . maxconn ) {
ha_alert ( " Cannot compute the automatic maxconn because global.maxsslconn is already too "
" high for the global.memmax value (%d MB). The absolute maximum possible value "
" is %d, but %d was found. \n " ,
global . rlimit_memmax ,
2017-11-24 10:50:31 -05:00
( int ) ( mem / ( global . ssl_session_max_cost + global . ssl_handshake_max_cost ) ) ,
2020-03-10 12:54:54 -04:00
global . maxsslconn ) ;
exit ( 1 ) ;
}
if ( check_if_maxsock_permitted ( compute_ideal_maxsock ( global . maxconn ) ) )
break ;
} while ( retried + + < 2 ) ;
2015-01-15 15:45:22 -05:00
if ( global . mode & ( MODE_VERBOSE | MODE_DEBUG ) ) {
if ( sides & & global . maxsslconn > sides * global . maxconn ) {
fprintf ( stderr , " Note: global.maxsslconn is forced to %d which causes global.maxconn "
" to be limited to %d. Better reduce global.maxsslconn to get more "
" room for extra connections. \n " , global . maxsslconn , global . maxconn ) ;
}
fprintf ( stderr , " Note: setting global.maxconn to %d \n " , global . maxconn ) ;
}
}
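The "up to 3 attempts" principle used by the two automatic maxconn branches above can be isolated as follows. This is only a sketch under stated assumptions: `sketch_pick_maxconn()` and `sketch_accept_upto()` are hypothetical names, the callback stands in for check_if_maxsock_permitted(compute_ideal_maxsock(...)), and rounding and the SYSTEM_MAXCONN cap are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the system check: returns non-zero when the
 * system would accept <maxconn>.
 */
typedef int (*sketch_accept_fn)(unsigned int maxconn, void *ctx);

/* Example policy: accept anything up to the limit pointed to by <ctx>. */
static int sketch_accept_upto(unsigned int maxconn, void *ctx)
{
	return maxconn <= *(const unsigned int *)ctx;
}

/* Attempt 1 uses the memory-derived value, attempt 2 clamps it to
 * <ideal_maxconn> (the FD-limit-derived value), attempt 3 goes back to the
 * memory-derived value so that later code can report the rlimit error.
 */
static unsigned int sketch_pick_maxconn(unsigned int from_memory,
                                        unsigned int ideal_maxconn,
                                        sketch_accept_fn accepted, void *ctx)
{
	unsigned int maxconn;
	int retried = 0;

	assert(accepted != NULL);
	do {
		maxconn = from_memory;
		if (retried == 1 && maxconn > ideal_maxconn)
			maxconn = ideal_maxconn;
		if (accepted(maxconn, ctx))
			break;
	} while (retried++ < 2);
	return maxconn;
}
```

If the system only accepts up to 5000 connections, a memory-derived value of 10000 with an ideal value of 4000 settles on 4000 at the second attempt. If even 4000 is refused, the function returns 10000 again and leaves the failure to be reported later.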
2006-06-25 20:48:02 -04:00
2020-03-10 12:08:53 -04:00
global . maxsock = compute_ideal_maxsock ( global . maxconn ) ;
global . hardmaxconn = global . maxconn ;
2020-06-19 10:20:59 -04:00
if ( ! global . maxpipes )
global . maxpipes = compute_ideal_maxpipes ( ) ;
2006-06-25 20:48:02 -04:00
MEDIUM: connections: Add a way to control the number of idling connections.
By default we add all keepalive connections to the idle pool. In a
pathological case where no client uses keepalive but the server does, and
haproxy is configured to only reuse "safe" connections, we soon end up with
lots of idle connections that are unusable for new sessions, while no file
descriptors remain available to create new connections.
To fix this, add 2 new global settings, "pool-low-fd-ratio" and
"pool-high-fd-ratio". pool-low-fd-ratio is the % of fds we're allowed to use
(against the maximum number of fds available to haproxy) before we stop adding
connections to the idle pool, and destroy them instead. The default is 20.
pool-high-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we start killing idle connections
in the event we have to create a new outgoing connection, and no reuse is
possible. The default is 25.
2019-04-16 13:07:22 -04:00
/* update connection pool thresholds */
global . tune . pool_low_count = ( ( long long ) global . maxsock * global . tune . pool_low_ratio + 99 ) / 100 ;
global . tune . pool_high_count = ( ( long long ) global . maxsock * global . tune . pool_high_ratio + 99 ) / 100 ;
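The two threshold lines above compute a percentage of maxsock rounded up to the next integer. A minimal standalone sketch of that arithmetic follows; the function name `sketch_pct_ceil()` is hypothetical.

```c
#include <assert.h>

/* Percentage of <maxsock>, rounded up to the next integer. Widening to
 * long long keeps maxsock * ratio from overflowing a plain int.
 */
static int sketch_pct_ceil(int maxsock, int ratio)
{
	assert(maxsock >= 0 && ratio >= 0);
	return (int)(((long long)maxsock * ratio + 99) / 100);
}
```

With the default ratios of 20 and 25 quoted in the commit message and a maxsock of 1001, the low threshold is 201 and the high threshold is 251. Plain integer division would have given 200 and 250.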
MEDIUM: config: don't enforce a low frontend maxconn value anymore
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.
There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.
This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.
This comes with two benefits :
1) a configuration missing "maxconn" in the defaults section will not
limit itself to a magically hardcoded value but will scale up to the
global maxconn ;
2) when the global maxconn is not set and memory limits are used instead,
the frontends' maxconn automatically adapts, and the backends' fullconn
as well.
2019-02-27 11:25:52 -05:00
proxy_adjust_all_maxconn ( ) ;
2007-06-03 11:16:49 -04:00
if ( global . tune . maxpollevents < = 0 )
global . tune . maxpollevents = MAX_POLL_EVENTS ;
2021-03-10 05:06:26 -05:00
if ( global . tune . runqueue_depth < = 0 ) {
/* tests on various thread counts from 1 to 64 have shown an
* optimal queue depth following roughly 1/sqrt(threads).
*/
int s = my_flsl ( global . nbthread ) ;
s + = ( global . nbthread / s ) ; // roughly twice the sqrt.
global . tune . runqueue_depth = RUNQUEUE_DEPTH * 2 / s ;
}
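The integer approximation used above can be tried out on its own. This sketch rests on two assumptions that the listing does not confirm: that my_flsl() returns the 1-based position of the highest set bit, and that the base depth RUNQUEUE_DEPTH is 200. All `sketch_` and `SKETCH_` names are hypothetical.

```c
#include <assert.h>

#define SKETCH_RUNQUEUE_DEPTH 200 /* assumed base depth, not taken from the listing */

/* 1-based position of the highest set bit, 0 for 0. A portable stand-in
 * for what my_flsl() is assumed to return.
 */
static int sketch_flsl(unsigned long v)
{
	int pos = 0;

	while (v) {
		pos++;
		v >>= 1;
	}
	return pos;
}

/* s ends up close to twice sqrt(nbthread), so the result follows roughly
 * base / sqrt(nbthread).
 */
static int sketch_runqueue_depth(int nbthread)
{
	int s;

	assert(nbthread >= 1);
	s = sketch_flsl((unsigned long)nbthread);
	s += nbthread / s;
	return SKETCH_RUNQUEUE_DEPTH * 2 / s;
}
```

Under these assumptions, 1, 4, 16 and 64 threads give depths of 200, 100, 50 and 25, halving each time the thread count is multiplied by four.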
2018-05-24 12:59:04 -04:00
2009-03-21 15:43:57 -04:00
if ( global . tune . recv_enough = = 0 )
global . tune . recv_enough = MIN_RECV_AT_ONCE_ENOUGH ;
2009-08-17 01:23:33 -04:00
if ( global . tune . maxrewrite > = global . tune . bufsize / 2 )
global . tune . maxrewrite = global . tune . bufsize / 2 ;
2021-06-04 12:22:08 -04:00
usermsgs_clr ( NULL ) ;
2006-06-25 20:48:02 -04:00
if ( arg_mode & ( MODE_DEBUG | MODE_FOREGROUND ) ) {
/* command line debug mode inhibits configuration mode */
2017-06-01 11:38:50 -04:00
global . mode & = ~ ( MODE_DAEMON | MODE_QUIET ) ;
2012-10-26 10:04:28 -04:00
global . mode | = ( arg_mode & ( MODE_DEBUG | MODE_FOREGROUND ) ) ;
2006-06-25 20:48:02 -04:00
}
2012-10-26 10:04:28 -04:00
2017-06-01 11:38:50 -04:00
if ( arg_mode & MODE_DAEMON ) {
2012-10-26 10:04:28 -04:00
/* command line daemon mode inhibits foreground and debug modes */
global . mode & = ~ ( MODE_DEBUG | MODE_FOREGROUND ) ;
2017-06-01 11:38:50 -04:00
global . mode | = arg_mode & MODE_DAEMON ;
2012-10-26 10:04:28 -04:00
}
global . mode | = ( arg_mode & ( MODE_QUIET | MODE_VERBOSE ) ) ;
2006-06-25 20:48:02 -04:00
2017-06-01 11:38:50 -04:00
if ( ( global . mode & MODE_DEBUG ) & & ( global . mode & ( MODE_DAEMON | MODE_QUIET ) ) ) {
2017-11-24 10:50:31 -05:00
ha_warning ( " <debug> mode incompatible with <quiet> and <daemon>. Keeping <debug> only. \n " ) ;
2017-06-01 11:38:50 -04:00
global . mode & = ~ ( MODE_DAEMON | MODE_QUIET ) ;
2006-06-25 20:48:02 -04:00
}
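The precedence between command-line and configuration modes applied above can be summarized by a small pure function. This is a sketch only: the `SK_MODE_*` bit values are illustrative and are not the real MODE_* constants, and `sketch_merge_modes()` is a hypothetical name.

```c
/* Illustrative bit values only; the real MODE_* constants are defined elsewhere. */
enum {
	SK_MODE_DEBUG      = 0x01,
	SK_MODE_DAEMON     = 0x02,
	SK_MODE_QUIET      = 0x04,
	SK_MODE_VERBOSE    = 0x08,
	SK_MODE_FOREGROUND = 0x10
};

/* Merge the mode found in the configuration with the command-line one. */
static int sketch_merge_modes(int cfg_mode, int arg_mode)
{
	int mode = cfg_mode;

	/* command-line debug/foreground overrides configured daemon/quiet */
	if (arg_mode & (SK_MODE_DEBUG | SK_MODE_FOREGROUND)) {
		mode &= ~(SK_MODE_DAEMON | SK_MODE_QUIET);
		mode |= arg_mode & (SK_MODE_DEBUG | SK_MODE_FOREGROUND);
	}
	/* command-line daemon overrides debug/foreground */
	if (arg_mode & SK_MODE_DAEMON) {
		mode &= ~(SK_MODE_DEBUG | SK_MODE_FOREGROUND);
		mode |= arg_mode & SK_MODE_DAEMON;
	}
	mode |= arg_mode & (SK_MODE_QUIET | SK_MODE_VERBOSE);

	/* debug is incompatible with daemon and quiet: keep debug only */
	if ((mode & SK_MODE_DEBUG) && (mode & (SK_MODE_DAEMON | SK_MODE_QUIET)))
		mode &= ~(SK_MODE_DAEMON | SK_MODE_QUIET);
	return mode;
}
```

For instance, a configuration asking for daemon and quiet modes combined with a debug flag on the command line ends up in debug mode only.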
2017-08-29 10:46:57 -04:00
/* Realloc trash buffers because global.tune.bufsize may have changed */
2017-10-27 07:53:47 -04:00
if ( ! init_trash_buffers ( 0 ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " failed to initialize trash buffers. \n " ) ;
2017-08-29 10:46:57 -04:00
exit ( 1 ) ;
}
2017-11-14 16:02:30 -05:00
if ( ! init_log_buffers ( ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " failed to initialize log buffers. \n " ) ;
2017-11-14 16:02:30 -05:00
exit ( 1 ) ;
}
2023-09-07 12:43:52 -04:00
if ( ! cluster_secret_isset )
2022-11-14 10:18:46 -05:00
generate_random_cluster_secret ( ) ;
2007-04-15 18:25:25 -04:00
/*
* Note: we could register external pollers here.
* Built-in pollers have been registered before main().
*/
2007-04-08 10:39:58 -04:00
2009-01-25 09:42:27 -05:00
if ( ! ( global . tune . options & GTUNE_USE_KQUEUE ) )
2007-04-09 06:03:06 -04:00
disable_poller ( " kqueue " ) ;
2019-04-08 12:53:32 -04:00
if ( ! ( global . tune . options & GTUNE_USE_EVPORTS ) )
disable_poller ( " evports " ) ;
2009-01-25 09:42:27 -05:00
if ( ! ( global . tune . options & GTUNE_USE_EPOLL ) )
2007-04-08 10:39:58 -04:00
disable_poller ( " epoll " ) ;
2009-01-25 09:42:27 -05:00
if ( ! ( global . tune . options & GTUNE_USE_POLL ) )
2007-04-08 10:39:58 -04:00
disable_poller ( " poll " ) ;
2009-01-25 09:42:27 -05:00
if ( ! ( global . tune . options & GTUNE_USE_SELECT ) )
2007-04-08 10:39:58 -04:00
disable_poller ( " select " ) ;
/* Note: we could disable any poller by name here */
2016-03-07 06:46:38 -05:00
if ( global . mode & ( MODE_VERBOSE | MODE_DEBUG ) ) {
2007-04-09 13:29:56 -04:00
list_pollers ( stderr ) ;
2016-03-07 06:46:38 -05:00
fprintf ( stderr , " \n " ) ;
list_filters ( stderr ) ;
}
2007-04-09 13:29:56 -04:00
2007-04-08 10:39:58 -04:00
if ( ! init_pollers ( ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " No polling mechanism available. \n "
2022-07-09 17:38:46 -04:00
" This may happen when using thread-groups with old pollers (poll/select), or \n "
" it is possible that haproxy was built with TARGET=generic and that FD_SETSIZE \n "
2017-11-24 10:50:31 -05:00
" is too low on this platform to support maxconn and the number of listeners \n "
" and servers. You should rebuild haproxy specifying your system using TARGET= \n "
" in order to support other polling systems (poll, epoll, kqueue) or reduce the \n "
" global maxconn setting to accommodate the system's limitation. For reference, \n "
" FD_SETSIZE=%d on this system, global.maxconn=%d resulting in a maximum of \n "
" %d file descriptors. You should thus reduce global.maxconn by %d. Also, \n "
" check build settings using 'haproxy -vv'. \n \n " ,
FD_SETSIZE , global . maxconn , global . maxsock , ( global . maxsock + 1 - FD_SETSIZE ) / 2 ) ;
2007-04-08 10:39:58 -04:00
exit ( 1 ) ;
}
2007-04-09 13:29:56 -04:00
if ( global . mode & ( MODE_VERBOSE | MODE_DEBUG ) ) {
printf ( " Using %s() as the polling mechanism. \n " , cur_poller . name ) ;
2007-04-08 10:39:58 -04:00
}
2009-10-02 16:51:14 -04:00
if ( ! global . node )
global . node = strdup ( hostname ) ;
2020-10-07 12:36:54 -04:00
/* stop disabled proxies */
for ( px = proxies_list ; px ; px = px - > next ) {
2021-10-06 08:24:19 -04:00
if ( px - > flags & ( PR_FL_DISABLED | PR_FL_STOPPED ) )
2020-10-07 12:36:54 -04:00
stop_proxy ( px ) ;
}
2015-01-23 06:08:30 -05:00
if ( ! hlua_post_init ( ) )
exit ( 1 ) ;
2022-12-19 02:15:57 -05:00
/* Set the per-thread pool cache size to the default value if not set.
* This is the right place to decide to automatically adjust it (e.g.
* check L2 cache size, thread counts or take into account certain
* expensive pools).
*/
if ( ! global . tune . pool_cache_size )
global . tune . pool_cache_size = CONFIG_HAP_POOL_CACHE_SIZE ;
2023-11-23 05:32:24 -05:00
/* fill in some information about our version and build options */
chunk_reset ( & trash ) ;
/* toolchain */
cc = chunk_newstr ( & trash ) ;
# if defined(__clang_version__)
chunk_appendf ( & trash , " clang- " __clang_version__ ) ;
# elif defined(__VERSION__)
chunk_appendf ( & trash , " gcc- " __VERSION__ ) ;
# endif
# if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
chunk_appendf ( & trash , " +asan " ) ;
# endif
/* toolchain opts */
cflags = chunk_newstr ( & trash ) ;
# ifdef BUILD_CC
chunk_appendf ( & trash , " %s " , BUILD_CC ) ;
# endif
# ifdef BUILD_CFLAGS
chunk_appendf ( & trash , " %s " , BUILD_CFLAGS ) ;
# endif
# ifdef BUILD_DEBUG
chunk_appendf ( & trash , " %s " , BUILD_DEBUG ) ;
# endif
/* settings */
opts = chunk_newstr ( & trash ) ;
# ifdef BUILD_TARGET
chunk_appendf ( & trash , " TARGET='%s' " , BUILD_TARGET ) ;
# endif
# ifdef BUILD_OPTIONS
chunk_appendf ( & trash , " %s " , BUILD_OPTIONS ) ;
# endif
post_mortem_add_component ( " haproxy " , haproxy_version , cc , cflags , opts , argv [ 0 ] ) ;
2006-06-25 20:48:02 -04:00
}
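The way command-line mode bits override configuration mode bits near the end of the function above can be sketched in isolation. This is a minimal illustration, not the actual implementation: the flag values and the helper name `resolve_mode()` are assumptions made for this example, and only the order of the masking operations mirrors the code above.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this sketch. */
enum {
	MODE_DEBUG      = 0x01,
	MODE_DAEMON     = 0x02,
	MODE_QUIET      = 0x04,
	MODE_VERBOSE    = 0x08,
	MODE_FOREGROUND = 0x10,
};

/* Merge command-line mode bits (arg_mode) into configuration mode bits
 * (cfg_mode), applying the same steps in the same order as above.
 * resolve_mode() is a hypothetical helper, not a real HAProxy function.
 */
static unsigned int resolve_mode(unsigned int cfg_mode, unsigned int arg_mode)
{
	unsigned int mode = cfg_mode;

	/* debug/foreground on the command line cancel daemon and quiet */
	if (arg_mode & (MODE_DEBUG | MODE_FOREGROUND)) {
		mode &= ~(MODE_DAEMON | MODE_QUIET);
		mode |= arg_mode & (MODE_DEBUG | MODE_FOREGROUND);
	}

	/* daemon on the command line cancels debug and foreground */
	if (arg_mode & MODE_DAEMON) {
		mode &= ~(MODE_DEBUG | MODE_FOREGROUND);
		mode |= arg_mode & MODE_DAEMON;
	}

	/* quiet and verbose are simply added */
	mode |= arg_mode & (MODE_QUIET | MODE_VERBOSE);

	/* last resort: debug wins over daemon and quiet */
	if ((mode & MODE_DEBUG) && (mode & (MODE_DAEMON | MODE_QUIET)))
		mode &= ~(MODE_DAEMON | MODE_QUIET);

	/* postcondition: debug never coexists with daemon or quiet */
	assert(!((mode & MODE_DEBUG) && (mode & (MODE_DAEMON | MODE_QUIET))));
	return mode;
}
```

With these assumed values, a configuration asking for daemon mode combined with a debug flag on the command line resolves to debug only, which matches the intent of the warning emitted above.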
2017-03-23 17:44:13 -04:00
void deinit ( void )
2006-06-25 20:48:02 -04:00
{
2017-11-24 10:54:05 -05:00
struct proxy * p = proxies_list , * p0 ;
2024-08-07 12:20:43 -04:00
struct cfgfile * cfg , * cfg_tmp ;
2008-05-31 07:53:23 -04:00
struct uri_auth * uap , * ua = NULL ;
MEDIUM: tree-wide: logsrv struct becomes logger
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.
But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.
Indeed, a 'log' directive no longer references the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.
So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.
So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).
This is internal rewording, so no functional change should be expected
on user-side.
2023-09-11 09:06:53 -04:00
struct logger * log , * logb ;
2016-12-21 12:43:10 -05:00
struct build_opts_str * bol , * bolb ;
2020-07-04 05:49:48 -04:00
struct post_deinit_fct * pdf , * pdfb ;
2020-07-04 05:49:47 -04:00
struct proxy_deinit_fct * pxdf , * pxdfb ;
2020-07-04 05:49:49 -04:00
struct server_deinit_fct * srvdf , * srvdfb ;
2020-09-10 13:46:41 -04:00
struct per_thread_init_fct * tif , * tifb ;
struct per_thread_deinit_fct * tdf , * tdfb ;
struct per_thread_alloc_fct * taf , * tafb ;
struct per_thread_free_fct * tff , * tffb ;
2020-07-04 05:49:50 -04:00
struct post_server_check_fct * pscf , * pscfb ;
2020-09-10 13:46:42 -04:00
struct post_check_fct * pcf , * pcfb ;
2020-09-10 13:46:40 -04:00
struct post_proxy_check_fct * ppcf , * ppcfb ;
2022-04-27 12:02:54 -04:00
struct pre_check_fct * prcf , * prcfb ;
2022-04-27 12:07:24 -04:00
struct cfg_postparser * pprs , * pprsb ;
2020-09-23 10:46:22 -04:00
int cur_fd ;
2022-11-15 03:34:07 -05:00
/* the user may want to skip this phase */
if ( global . tune . options & GTUNE_QUICK_EXIT )
return ;
2020-09-23 10:46:22 -04:00
/* At this point the listeners state is weird:
* - most listeners are still bound and referenced in their protocol
* - some might be zombies that are not in their proto anymore, but
* still appear in their proxy's listeners with a valid FD.
* - some might be stopped and still appear in their proxy as FD #-1
* - among all of them, some might be inherited hence shared and we're
* not allowed to pause them or whatever, we must just close them.
* - finally some are not listeners (pipes, logs, stdout, etc) and
* must be left intact.
*
* The safe way to proceed is to unbind (and close) whatever is not yet
* unbound so that no more receiver/listener remains alive. Then close
* remaining listener FDs, which correspond to zombie listeners (those
* belonging to disabled proxies that were in another process).
* objt_listener() would be cleaner here but not converted yet.
*/
protocol_unbind_all ( ) ;
for ( cur_fd = 0 ; cur_fd < global . maxsock ; cur_fd + + ) {
2020-10-14 06:13:51 -04:00
if ( ! fdtab | | ! fdtab [ cur_fd ] . owner )
2020-09-23 10:46:22 -04:00
continue ;
2020-10-15 15:29:49 -04:00
if ( fdtab [ cur_fd ] . iocb = = & sock_accept_iocb ) {
2020-09-23 10:46:22 -04:00
struct listener * l = fdtab [ cur_fd ] . owner ;
BUG_ON ( l - > state ! = LI_INIT ) ;
unbind_listener ( l ) ;
}
}
2008-05-31 07:53:23 -04:00
2010-08-27 11:56:48 -04:00
deinit_signals ( ) ;
2006-06-25 20:48:02 -04:00
while ( p ) {
2008-05-31 07:53:23 -04:00
/* build a list of unique uri_auths */
if ( ! ua )
ua = p - > uri_auth ;
else {
/* check if p->uri_auth is unique */
for ( uap = ua ; uap ; uap = uap - > next )
if ( uap = = p - > uri_auth )
break ;
2008-06-24 05:14:45 -04:00
if ( ! uap & & p - > uri_auth ) {
2008-05-31 07:53:23 -04:00
/* add it, if it is */
p - > uri_auth - > next = ua ;
ua = p - > uri_auth ;
}
}
2007-06-16 18:36:03 -04:00
2007-05-13 18:39:29 -04:00
p0 = p ;
2006-06-25 20:48:02 -04:00
p = p - > next ;
2021-03-24 11:13:20 -04:00
free_proxy ( p0 ) ;
2006-06-25 20:48:02 -04:00
} /* end while(p) */
2007-10-16 06:25:14 -04:00
2023-07-03 12:07:30 -04:00
/* we don't need to free sink_proxies_list nor cfg_log_forward proxies since
* they are respectively cleaned up in sink_deinit() and deinit_log_forward()
2023-03-09 06:07:09 -05:00
*/
2021-10-13 03:50:53 -04:00
/* destroy all unreferenced defaults proxies */
proxy_destroy_all_unref_defaults ( ) ;
2008-05-31 07:53:23 -04:00
while ( ua ) {
2020-09-10 13:46:38 -04:00
struct stat_scope * scope , * scopep ;
2008-05-31 07:53:23 -04:00
uap = ua ;
ua = ua - > next ;
2008-08-03 06:19:50 -04:00
free ( uap - > uri_prefix ) ;
free ( uap - > auth_realm ) ;
2009-10-02 16:51:14 -04:00
free ( uap - > node ) ;
free ( uap - > desc ) ;
2008-05-31 07:53:23 -04:00
2010-01-29 13:29:32 -05:00
userlist_free ( uap - > userlist ) ;
2021-03-25 12:15:52 -04:00
free_act_rules ( & uap - > http_req_rules ) ;
2010-01-29 13:29:32 -05:00
2020-09-10 13:46:38 -04:00
scope = uap - > scope ;
while ( scope ) {
scopep = scope ;
scope = scope - > next ;
free ( scopep - > px_id ) ;
free ( scopep ) ;
}
2008-05-31 07:53:23 -04:00
free ( uap ) ;
}
2010-01-29 11:50:44 -05:00
userlist_free ( userlist ) ;
2015-09-25 07:02:25 -04:00
cfg_unregister_sections ( ) ;
2017-07-26 09:33:35 -04:00
deinit_log_buffers ( ) ;
2015-09-25 07:02:25 -04:00
2016-12-21 14:46:26 -05:00
list_for_each_entry ( pdf , & post_deinit_list , list )
pdf - > fct ( ) ;
2021-02-20 04:46:51 -05:00
ha_free ( & global . log_send_hostname ) ;
2015-10-01 07:18:13 -04:00
chunk_destroy ( & global . log_tag ) ;
2021-02-20 04:46:51 -05:00
ha_free ( & global . chroot ) ;
ha_free ( & global . pidfile ) ;
ha_free ( & global . node ) ;
ha_free ( & global . desc ) ;
ha_free ( & oldpids ) ;
ha_free ( & old_argv ) ;
ha_free ( & localpeer ) ;
ha_free ( & global . server_state_base ) ;
ha_free ( & global . server_state_file ) ;
2024-04-24 05:09:06 -04:00
ha_free ( & global . stats_file ) ;
2019-04-17 16:51:06 -04:00
task_destroy ( idle_conn_task ) ;
2019-02-14 12:29:09 -05:00
idle_conn_task = NULL ;
2008-05-31 07:53:23 -04:00
2023-09-11 09:06:53 -04:00
list_for_each_entry_safe ( log , logb , & global . loggers , list ) {
2023-01-26 09:32:12 -05:00
LIST_DEL_INIT ( & log - > list ) ;
2023-09-11 09:06:53 -04:00
free_logger ( log ) ;
2023-01-26 09:32:12 -05:00
}
2024-08-07 12:20:43 -04:00
list_for_each_entry_safe ( cfg , cfg_tmp , & cfg_cfgfiles , list ) {
ha_free ( & cfg - > filename ) ;
LIST_DELETE ( & cfg - > list ) ;
ha_free ( & cfg ) ;
2010-01-03 15:12:30 -05:00
}
2016-12-21 12:43:10 -05:00
list_for_each_entry_safe ( bol , bolb , & build_opts_list , list ) {
if ( bol - > must_free )
free ( ( void * ) bol - > str ) ;
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & bol - > list ) ;
2016-12-21 12:43:10 -05:00
free ( bol ) ;
}
2020-07-04 05:49:47 -04:00
list_for_each_entry_safe ( pxdf , pxdfb , & proxy_deinit_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & pxdf - > list ) ;
2020-07-04 05:49:47 -04:00
free ( pxdf ) ;
}
2020-07-04 05:49:48 -04:00
list_for_each_entry_safe ( pdf , pdfb , & post_deinit_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & pdf - > list ) ;
2020-07-04 05:49:48 -04:00
free ( pdf ) ;
}
2020-07-04 05:49:49 -04:00
list_for_each_entry_safe ( srvdf , srvdfb , & server_deinit_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & srvdf - > list ) ;
2020-07-04 05:49:49 -04:00
free ( srvdf ) ;
}
2020-09-10 13:46:42 -04:00
list_for_each_entry_safe ( pcf , pcfb , & post_check_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & pcf - > list ) ;
2020-09-10 13:46:42 -04:00
free ( pcf ) ;
}
2020-07-04 05:49:50 -04:00
list_for_each_entry_safe ( pscf , pscfb , & post_server_check_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & pscf - > list ) ;
2020-07-04 05:49:50 -04:00
free ( pscf ) ;
}
2020-09-10 13:46:40 -04:00
list_for_each_entry_safe ( ppcf , ppcfb , & post_proxy_check_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & ppcf - > list ) ;
2020-09-10 13:46:40 -04:00
free ( ppcf ) ;
}
2022-04-27 12:02:54 -04:00
list_for_each_entry_safe ( prcf , prcfb , & pre_check_list , list ) {
LIST_DELETE ( & prcf - > list ) ;
free ( prcf ) ;
}
2020-09-10 13:46:41 -04:00
list_for_each_entry_safe ( tif , tifb , & per_thread_init_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & tif - > list ) ;
2020-09-10 13:46:41 -04:00
free ( tif ) ;
}
list_for_each_entry_safe ( tdf , tdfb , & per_thread_deinit_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & tdf - > list ) ;
2020-09-10 13:46:41 -04:00
free ( tdf ) ;
}
list_for_each_entry_safe ( taf , tafb , & per_thread_alloc_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & taf - > list ) ;
2020-09-10 13:46:41 -04:00
free ( taf ) ;
}
list_for_each_entry_safe ( tff , tffb , & per_thread_free_list , list ) {
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & tff - > list ) ;
2020-09-10 13:46:41 -04:00
free ( tff ) ;
}
2022-04-27 12:07:24 -04:00
list_for_each_entry_safe ( pprs , pprsb , & postparsers , list ) {
LIST_DELETE ( & pprs - > list ) ;
free ( pprs ) ;
}
2021-05-08 05:41:28 -04:00
vars_prune ( & proc_vars , NULL , NULL ) ;
2018-11-26 09:57:34 -05:00
pool_destroy_all ( ) ;
[MEDIUM] Fix memory freeing at exit
New functions implemented:
- deinit_pollers: called at the end of deinit()
- prune_acl: called via list_for_each_entry_safe
Add missing pool_destroy2 calls:
- p->hdr_idx_pool
- pool2_tree64
Implement all task stopping:
- health-check: needs new "struct task" in the struct server
- queue processing: queue_mgt
- appsess_refresh: appsession_refresh
before (idle system):
==6079== LEAK SUMMARY:
==6079== definitely lost: 1,112 bytes in 75 blocks.
==6079== indirectly lost: 53,356 bytes in 2,090 blocks.
==6079== possibly lost: 52 bytes in 1 blocks.
==6079== still reachable: 150,996 bytes in 504 blocks.
==6079== suppressed: 0 bytes in 0 blocks.
after (idle system):
==6945== LEAK SUMMARY:
==6945== definitely lost: 7,644 bytes in 137 blocks.
==6945== indirectly lost: 9,913 bytes in 587 blocks.
==6945== possibly lost: 0 bytes in 0 blocks.
==6945== still reachable: 0 bytes in 0 blocks.
==6945== suppressed: 0 bytes in 0 blocks.
before (running system for ~2m):
==9343== LEAK SUMMARY:
==9343== definitely lost: 1,112 bytes in 75 blocks.
==9343== indirectly lost: 54,199 bytes in 2,122 blocks.
==9343== possibly lost: 52 bytes in 1 blocks.
==9343== still reachable: 151,128 bytes in 509 blocks.
==9343== suppressed: 0 bytes in 0 blocks.
after (running system for ~2m):
==11616== LEAK SUMMARY:
==11616== definitely lost: 7,644 bytes in 137 blocks.
==11616== indirectly lost: 9,981 bytes in 591 blocks.
==11616== possibly lost: 0 bytes in 0 blocks.
==11616== still reachable: 4 bytes in 1 blocks.
==11616== suppressed: 0 bytes in 0 blocks.
Still not perfect but significant improvement.
2008-05-29 17:53:44 -04:00
deinit_pollers ( ) ;
2006-06-25 20:48:02 -04:00
} /* end deinit() */
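Most of the cleanup loops in deinit() above follow one pattern: walk a list while remembering the next element, unlink the current element, then free it. The sketch below shows that pattern on a hand-rolled circular doubly-linked list. It is an illustration under stated assumptions: `struct list`, `struct callback`, `list_append()`, `list_delete()`, `add_callback()` and `free_all_callbacks()` are names invented for this example and are not HAProxy's list macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical circular doubly-linked list head/element. */
struct list {
	struct list *n; /* next */
	struct list *p; /* previous */
};

/* Hypothetical registered callback, linked through its 'list' member. */
struct callback {
	struct list list;
	int id;
};

static void list_init(struct list *h)
{
	h->n = h->p = h;
}

static void list_append(struct list *h, struct list *e)
{
	e->p = h->p;
	e->n = h;
	h->p->n = e;
	h->p = e;
}

static void list_delete(struct list *e)
{
	e->n->p = e->p;
	e->p->n = e->n;
}

/* Allocate one callback and append it; returns 0 on success, -1 on failure. */
static int add_callback(struct list *head, int id)
{
	struct callback *cb = malloc(sizeof(*cb));

	if (!cb)
		return -1;
	cb->id = id;
	list_append(head, &cb->list);
	return 0;
}

/* Free every element. 'back' always holds the next element, saved before
 * the current one is unlinked and freed, so the walk never touches freed
 * memory. Returns the number of elements released.
 */
static int free_all_callbacks(struct list *head)
{
	struct list *cur, *back;
	int freed = 0;

	assert(head != NULL);
	for (cur = head->n, back = cur->n; cur != head; cur = back, back = cur->n) {
		struct callback *cb =
			(struct callback *)((char *)cur - offsetof(struct callback, list));

		list_delete(cur);
		free(cb);
		freed++;
	}
	return freed;
}
```

Saving the successor before freeing is what distinguishes the "safe" iteration from a plain one, where reading the next pointer of a freed element would be undefined behavior.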
2020-06-15 12:43:46 -04:00
__attribute__ ( ( noreturn ) ) void deinit_and_exit ( int status )
2020-06-13 18:37:41 -04:00
{
2021-08-09 09:02:56 -04:00
global . mode | = MODE_STOPPING ;
2020-06-13 18:37:41 -04:00
deinit ( ) ;
exit ( status ) ;
}
2018-11-06 11:37:16 -05:00
2011-07-25 10:33:49 -04:00
/* Runs the polling loop */
2020-03-03 08:59:56 -05:00
void run_poll_loop ( )
2007-04-08 10:39:58 -04:00
{
2019-05-28 10:44:05 -04:00
int next , wake ;
2007-04-08 10:39:58 -04:00
2023-02-17 02:36:42 -05:00
_HA_ATOMIC_OR ( & th_ctx - > flags , TH_FL_IN_LOOP ) ;
2021-10-08 03:33:24 -04:00
clock_update_date ( 0 , 1 ) ;
2007-04-08 10:39:58 -04:00
while ( 1 ) {
MINOR: tasks: split wake_expired_tasks() in two parts to avoid useless wakeups
We used to have wake_expired_tasks() wake up tasks and return the next
expiration delay. The problem this causes is that we have to call it just
before poll() in order to consider latest timers, but this also means that
we don't wake up all newly expired tasks upon return from poll(), which
thus systematically requires a second poll() round.
This is visible when running any scheduled task like a health check, as there
are systematically two poll() calls, one with the interval, nothing is done
after it, and another one with a zero delay, and the task is called:
listen test
bind *:8001
server s1 127.0.0.1:1111 check
09:37:38.200959 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8696843}) = 0
09:37:38.200967 epoll_wait(3, [], 200, 1000) = 0
09:37:39.202459 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8712467}) = 0
>> nothing run here, as the expired task was not woken up yet.
09:37:39.202497 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8715766}) = 0
09:37:39.202505 epoll_wait(3, [], 200, 0) = 0
09:37:39.202513 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8719064}) = 0
>> now the expired task was woken up
09:37:39.202522 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
09:37:39.202537 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
09:37:39.202565 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:37:39.202577 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
09:37:39.202585 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:37:39.202659 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
09:37:39.202673 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8814713}) = 0
09:37:39.202683 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
09:37:39.202693 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8818617}) = 0
09:37:39.202701 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
09:37:39.202715 close(7) = 0
Let's instead split the function in two parts:
- the first part, wake_expired_tasks(), called just before
process_runnable_tasks(), wakes up all expired tasks; it doesn't
compute any timeout.
- the second part, next_timer_expiry(), called just before poll(),
only computes the next timeout for the current thread.
Thanks to this, all expired tasks are properly woken up when leaving
poll, and each poll call's timeout remains up to date:
09:41:16.270449 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10223556}) = 0
09:41:16.270457 epoll_wait(3, [], 200, 999) = 0
09:41:17.270130 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10238572}) = 0
09:41:17.270157 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
09:41:17.270194 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
09:41:17.270204 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:41:17.270216 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
09:41:17.270224 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:41:17.270299 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
09:41:17.270314 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10337841}) = 0
09:41:17.270323 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
09:41:17.270332 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10341860}) = 0
09:41:17.270340 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
09:41:17.270367 close(7) = 0
This may be backported to 2.1 and 2.0 though it's unlikely to bring any
user-visible improvement except to clarify debugging.
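The split described in this commit message can be sketched with a toy timer table. This is only a simplified model under stated assumptions: `struct timer_task`, `wake_expired()` and `next_expiry_delay()` are hypothetical names for this example, times are plain integers in milliseconds, and the real scheduler uses very different data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task with a single timer, for illustration only. */
struct timer_task {
	int expire_ms; /* absolute expiration date */
	int queued;    /* 1 while waiting in the timer queue */
	int runnable;  /* 1 once woken up */
};

/* Part one: wake up every expired task. It computes no timeout, so it can
 * be called right before running tasks. Returns the number of tasks woken.
 */
static int wake_expired(struct timer_task *t, size_t n, int now_ms)
{
	int woken = 0;
	size_t i;

	assert(t != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (t[i].queued && t[i].expire_ms <= now_ms) {
			t[i].queued = 0;
			t[i].runnable = 1;
			woken++;
		}
	}
	return woken;
}

/* Part two: only compute the delay until the next timer, to be called
 * right before polling. Returns -1 when no timer is queued.
 */
static int next_expiry_delay(const struct timer_task *t, size_t n, int now_ms)
{
	int best = -1;
	size_t i;

	assert(t != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int delay;

		if (!t[i].queued)
			continue;
		delay = t[i].expire_ms - now_ms;
		if (delay < 0)
			delay = 0;
		if (best < 0 || delay < best)
			best = delay;
	}
	return best;
}
```

Because waking and timeout computation are separate, a loop can wake expired tasks immediately after returning from the poller and run them in the same iteration, instead of needing a second zero-delay poll.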
2019-12-11 02:12:23 -05:00
wake_expired_tasks ( ) ;
2018-06-07 03:46:01 -04:00
/* check if we caught some signals and process them in the
first thread */
2020-06-19 06:06:34 -04:00
if ( signal_queue_len & & tid = = 0 ) {
activity [ tid ] . wake_signal + + ;
2018-06-07 03:46:01 -04:00
signal_process_queue ( ) ;
2020-06-19 06:06:34 -04:00
}
/* Process a few tasks */
process_runnable_tasks ( ) ;
2009-05-10 03:01:21 -04:00
2019-06-02 05:11:29 -04:00
/* also stop if we failed to cleanly stop all tasks */
if ( killed > 1 )
break ;
2022-09-09 04:21:00 -04:00
/* expire immediately if events or signals are pending */
2019-05-28 10:44:05 -04:00
wake = 1 ;
2019-07-24 12:07:06 -04:00
if ( thread_has_tasks ( ) )
2018-01-20 13:30:13 -05:00
activity [ tid ] . wake_tasks + + ;
2018-07-26 11:55:11 -04:00
else {
2022-06-20 03:23:24 -04:00
_HA_ATOMIC_OR ( & th_ctx - > flags , TH_FL_SLEEPING ) ;
2022-06-22 09:38:38 -04:00
_HA_ATOMIC_AND ( & th_ctx - > flags , ~ TH_FL_NOTIFIED ) ;
2019-03-08 12:51:17 -05:00
__ha_barrier_atomic_store ( ) ;
2020-03-23 04:33:32 -04:00
if ( thread_has_tasks ( ) ) {
2018-07-26 11:55:11 -04:00
activity [ tid ] . wake_tasks + + ;
2022-06-20 03:23:24 -04:00
_HA_ATOMIC_AND ( & th_ctx - > flags , ~ TH_FL_SLEEPING ) ;
2024-05-06 08:24:41 -04:00
} else if ( signal_queue_len & & tid = = 0 ) {
2022-09-09 04:21:00 -04:00
/* this check is required after setting TH_FL_SLEEPING to avoid
* a race with wakeup on signals using wake_threads() */
_HA_ATOMIC_AND ( & th_ctx - > flags , ~ TH_FL_SLEEPING ) ;
2018-07-26 11:55:11 -04:00
} else
2019-05-28 10:44:05 -04:00
wake = 0 ;
2018-07-26 11:55:11 -04:00
}
2015-04-13 14:44:19 -04:00
2020-03-23 04:27:28 -04:00
if ( ! wake ) {
BUG/MINOR: soft-stop: always wake up waiting threads on stopping
Currently the soft-stop can lead to old processes remaining alive for as
long as two seconds after receiving a soft-stop signal. What happens is
that when receiving SIGUSR1, one thread (usually the first one) wakes up,
handles the signal, sets "stopping", goes into run_poll_loop(), and
discovers that stopping is set, so it also sets itself in the
stopping_thread_mask bit mask. After this it sees that other threads are
not yet willing to stop, so it continues to wait.
From there, other threads which were waiting in poll() expire after one
second on poll timeout and enter run_poll_loop() in turn. That's already
one second of wait time. They discover each in turn that they're stopping
and see that other threads are not yet stopping, so they go back waiting.
After the end of the first second, all threads know they're stopping and
have set their bit in stopping_thread_mask. It's only now that those who
started to wait first wake up again on timeout to discover that all other
ones are stopping, and can now quit. One second later all threads will
have done it and the process will quit.
This is effectively strictly larger than one second and up to two seconds.
What the current patch does is simple: when the first thread stops, it sets
its own bit into stopping_thread_mask then wakes up all other threads so they
also set theirs. This kills the first second which corresponds to the time
to discover the stopping state. Second, when a thread exits, it wakes all
other ones again because some might have gone back sleeping waiting for
"jobs" to go down to zero (i.e. closing the last connection). This kills
the last second of wait time.
Thanks to this, SIGUSR1 now acts instantly again if there's no active
connection, or it stops immediately after the last connection has left if
one was still present.
This should be backported as far as 2.0.
2020-05-13 07:51:01 -04:00
int i ;
if ( stopping ) {
2023-03-08 04:37:45 -05:00
/* stop muxes/quic-conns before acknowledging stopping */
2022-07-04 08:07:29 -04:00
if ( ! ( tg_ctx - > stopping_threads & ti - > ltid_bit ) ) {
2021-05-03 04:47:51 -04:00
task_wakeup ( mux_stopping_data [ tid ] . task , TASK_WOKEN_OTHER ) ;
wake = 1 ;
}
2022-06-28 13:29:29 -04:00
if ( _HA_ATOMIC_OR_FETCH ( & tg_ctx - > stopping_threads , ti - > ltid_bit ) = = ti - > ltid_bit & &
_HA_ATOMIC_OR_FETCH ( & stopping_tgroup_mask , tg - > tgid_bit ) = = tg - > tgid_bit ) {
/* first one to detect it, notify all threads that stopping was just set */
for ( i = 0 ; i < global . nbthread ; i + + ) {
2023-01-19 13:14:18 -05:00
if ( _HA_ATOMIC_LOAD ( & ha_thread_info [ i ] . tg - > threads_enabled ) &
2022-06-28 13:29:29 -04:00
ha_thread_info [ i ] . ltid_bit &
~ _HA_ATOMIC_LOAD ( & ha_thread_info [ i ] . tg_ctx - > stopping_threads ) )
2020-05-13 08:30:25 -04:00
wake_thread ( i ) ;
2022-06-28 13:29:29 -04:00
}
2020-05-13 08:30:25 -04:00
}
2020-05-13 07:51:01 -04:00
}
2020-03-23 04:27:28 -04:00
/* stop when there's nothing left to do */
if ( ( jobs - unstoppable_jobs ) = = 0 & &
2022-06-28 13:29:29 -04:00
( _HA_ATOMIC_LOAD ( & stopping_tgroup_mask ) & all_tgroups_mask ) = = all_tgroups_mask ) {
/* check that all threads are aware of the stopping status */
for ( i = 0 ; i < global . nbtgroups ; i + + )
2023-01-19 13:14:18 -05:00
if ( ( _HA_ATOMIC_LOAD ( & ha_tgroup_ctx [ i ] . stopping_threads ) &
_HA_ATOMIC_LOAD ( & ha_tgroup_info [ i ] . threads_enabled ) ) ! =
_HA_ATOMIC_LOAD ( & ha_tgroup_info [ i ] . threads_enabled ) )
2022-06-28 13:29:29 -04:00
break ;
# ifdef USE_THREAD
if ( i = = global . nbtgroups ) {
/* all are OK, let's wake them all and stop */
for ( i = 0 ; i < global . nbthread ; i + + )
2023-01-19 13:14:18 -05:00
if ( i ! = tid & & _HA_ATOMIC_LOAD ( & ha_thread_info [ i ] . tg - > threads_enabled ) & ha_thread_info [ i ] . ltid_bit )
2022-06-28 13:29:29 -04:00
wake_thread ( i ) ;
break ;
}
# endif
2020-05-13 07:51:01 -04:00
}
2020-03-23 04:27:28 -04:00
}
MINOR: tasks: split wake_expired_tasks() in two parts to avoid useless wakeups
We used to have wake_expired_tasks() wake up tasks and return the next
expiration delay. The problem this causes is that we have to call it just
before poll() in order to consider latest timers, but this also means that
we don't wake up all newly expired tasks upon return from poll(), which
thus systematically requires a second poll() round.
This is visible when running any scheduled task like a health check, as there
are systematically two poll() calls, one with the interval, nothing is done
after it, and another one with a zero delay, and the task is called:
listen test
bind *:8001
server s1 127.0.0.1:1111 check
09:37:38.200959 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8696843}) = 0
09:37:38.200967 epoll_wait(3, [], 200, 1000) = 0
09:37:39.202459 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8712467}) = 0
>> nothing run here, as the expired task was not woken up yet.
09:37:39.202497 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8715766}) = 0
09:37:39.202505 epoll_wait(3, [], 200, 0) = 0
09:37:39.202513 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8719064}) = 0
>> now the expired task was woken up
09:37:39.202522 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
09:37:39.202537 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
09:37:39.202565 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:37:39.202577 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
09:37:39.202585 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:37:39.202659 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
09:37:39.202673 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8814713}) = 0
09:37:39.202683 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
09:37:39.202693 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8818617}) = 0
09:37:39.202701 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
09:37:39.202715 close(7) = 0
Let's instead split the function in two parts:
- the first part, wake_expired_tasks(), called just before
process_runnable_tasks(), wakes up all expired tasks; it doesn't
compute any timeout.
- the second part, next_timer_expiry(), called just before poll(),
only computes the next timeout for the current thread.
Thanks to this, all expired tasks are properly woken up when leaving
poll, and each poll call's timeout remains up to date:
09:41:16.270449 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10223556}) = 0
09:41:16.270457 epoll_wait(3, [], 200, 999) = 0
09:41:17.270130 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10238572}) = 0
09:41:17.270157 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
09:41:17.270194 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
09:41:17.270204 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:41:17.270216 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
09:41:17.270224 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:41:17.270299 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
09:41:17.270314 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10337841}) = 0
09:41:17.270323 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
09:41:17.270332 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10341860}) = 0
09:41:17.270340 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
09:41:17.270367 close(7) = 0
This may be backported to 2.1 and 2.0 though it's unlikely to bring any
user-visible improvement except to clarify debugging.
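The two-part split can be sketched over a flat array of timers. This is a simplified model and not the actual scheduler code: `struct timer`, `wake_expired()`, `next_expiry_delay()` and `TIMER_NONE` are hypothetical names, and the real implementation uses timer trees and tick arithmetic instead of a linear scan.

```c
#include <assert.h>
#include <stddef.h>

#define TIMER_NONE (-1)   /* hypothetical "no timer armed" marker */

struct timer {
    int expire_ms;   /* absolute expiration date, TIMER_NONE if unarmed */
    int woken;       /* set once the task has been queued to run */
};

/* Part 1: wake every task whose timer has expired. It computes no timeout
 * and returns the number of tasks woken up.
 */
static int wake_expired(struct timer *t, size_t n, int now_ms)
{
    int count = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].expire_ms != TIMER_NONE && t[i].expire_ms <= now_ms) {
            t[i].woken = 1;
            t[i].expire_ms = TIMER_NONE;
            count++;
        }
    }
    return count;
}

/* Part 2: only compute the delay to the closest armed timer, to be used as
 * the poll timeout. Returns TIMER_NONE when nothing is armed.
 */
static int next_expiry_delay(const struct timer *t, size_t n, int now_ms)
{
    int best = TIMER_NONE;

    for (size_t i = 0; i < n; i++) {
        int delay;

        if (t[i].expire_ms == TIMER_NONE)
            continue;
        delay = t[i].expire_ms > now_ms ? t[i].expire_ms - now_ms : 0;
        if (best == TIMER_NONE || delay < best)
            best = delay;
    }
    return best;
}
```

A loop built on this model would call `wake_expired()` right after returning from the poller and `next_expiry_delay()` right before entering it, so a task expiring during the wait runs without a second zero-delay poll.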
2019-12-11 02:12:23 -05:00
/* If we have to sleep, measure how long */
next = wake ? TICK_ETERNITY : next_timer_expiry ( ) ;
2008-06-29 16:40:23 -04:00
/* The poller will ensure it returns around <next> */
2019-05-28 10:44:05 -04:00
cur_poller . poll ( & cur_poller , next , wake ) ;
2017-10-03 08:46:45 -04:00
2018-01-20 13:30:13 -05:00
activity [ tid ] . loops ++ ;
2007-04-08 10:39:58 -04:00
}
2023-02-17 02:36:42 -05:00
_HA_ATOMIC_AND ( & th_ctx -> flags , ~ TH_FL_IN_LOOP ) ;
2007-04-08 10:39:58 -04:00
}
MAJOR: threads: Start threads to experiment multithreading
[WARNING] For now, HAProxy is not thread-safe, so from this commit, it will be
broken for a while, when compiled with threads.
When nbthread parameter is greater than 1, HAProxy will create the corresponding
number of threads. If nbthread is set to 1, nothing should be done. So if there
are concurrency issues (and be sure there will be, unfortunately), an obvious
workaround is to disable the multithreading...
Each created thread will run a polling loop. So, in a certain way, it is pretty
similar to the nbproc mode (apart from the bugs and the lock
contention). Nevertheless, there are init and deinit steps for each thread
to deal with per-thread allocation.
Each thread has a tid (thread-id), numbered from 0 to (nbthread-1). It is used in
many places to do bitwise operations or to improve debugging information.
2017-08-29 09:38:48 -04:00
static void * run_thread_poll_loop ( void * data )
{
2019-05-22 08:42:12 -04:00
struct per_thread_alloc_fct * ptaf ;
2017-08-29 09:38:48 -04:00
struct per_thread_init_fct * ptif ;
struct per_thread_deinit_fct * ptdf ;
2019-05-22 08:42:12 -04:00
struct per_thread_free_fct * ptff ;
MEDIUM: init/threads: don't use spinlocks during the init phase
PiBa-NL found some pathological cases where starting threads can hinder
each other and cause a measurable slow down. This problem is reproducible
with the following config (haproxy must be built with -DDEBUG_DEV) :
global
stats socket /tmp/sock1 mode 666 level admin
nbthread 64
backend stopme
timeout server 1s
option tcp-check
tcp-check send "debug dev exit\n"
server cli unix@/tmp/sock1 check
This will cause the process to be stopped once the checks are ready to
start. Binding all these to just a few cores magnifies the problem.
Starting them in loops shows a significant time difference among the
commits :
# before startup serialization
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e186161 -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m1.581s
user 0m0.621s
sys 0m5.339s
# after startup serialization
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e4d7c9dd -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m2.366s
user 0m0.894s
sys 0m8.238s
In order to address this, let's use plain mutexes and cond_wait during
the init phase. With this done, waiting threads now sleep and the problem
completely disappeared :
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m0.161s
user 0m0.079s
sys 0m0.149s
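The mutex and cond_wait approach can be sketched as a start-up barrier where each thread performs its init step under the mutex and then sleeps until the last one is done. This is a minimal sketch, not the code of this file: `worker()`, `run_init_barrier()`, `init_done_count`, `started_early` and `NB_THREADS` are hypothetical, `init_left` is preset instead of being set by the first thread, and the mutex is kept held across both phases where the real code releases and re-takes it.

```c
#include <assert.h>
#include <pthread.h>

#define NB_THREADS 4   /* hypothetical thread count for this sketch */

static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  init_cond  = PTHREAD_COND_INITIALIZER;
static int init_left = NB_THREADS;   /* threads that still have to initialize */
static int init_done_count;          /* init steps completed so far */
static int started_early;            /* threads that left before all were ready */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&init_mutex);

    /* serialized init step: only one thread at a time runs this part */
    init_done_count++;
    init_left--;

    /* tell sleeping threads that one more thread is ready */
    pthread_cond_broadcast(&init_cond);

    /* sleep (no spinning) until every thread has finished its init */
    while (init_left)
        pthread_cond_wait(&init_cond, &init_mutex);

    if (init_done_count != NB_THREADS)
        started_early++;
    pthread_mutex_unlock(&init_mutex);
    return NULL;
}

/* Starts all threads, waits for them, and returns the number of threads
 * which went past the barrier too early (expected to be zero).
 */
static int run_init_barrier(void)
{
    pthread_t th[NB_THREADS];

    for (int i = 0; i < NB_THREADS; i++) {
        int rc = pthread_create(&th[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NB_THREADS; i++)
        pthread_join(th[i], NULL);
    return started_early;
}
```

Since waiting threads block in `pthread_cond_wait()` instead of spinning, they should not compete for CPU with the thread currently initializing, which is the effect the timings above illustrate. The sketch may need to be built with `-pthread` depending on the toolchain.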
2019-06-11 03:16:41 -04:00
static int init_left = 0 ;
2020-06-05 02:40:51 -04:00
__decl_thread ( static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER ) ;
__decl_thread ( static pthread_cond_t init_cond = PTHREAD_COND_INITIALIZER ) ;
2017-08-29 09:38:48 -04:00
2021-09-28 03:43:11 -04:00
ha_set_thread ( data ) ;
2021-09-28 04:15:47 -04:00
set_thread_cpu_affinity ( ) ;
2021-10-08 06:27:54 -04:00
clock_set_local_source ( ) ;
2023-11-22 12:01:25 -05:00
# ifdef USE_THREAD
ha_thread_info [ tid ] . pth_id = ha_get_pthread_id ( tid ) ;
# endif
ha_thread_info [ tid ] . stack_top = __builtin_frame_address ( 0 ) ;
2024-03-14 03:57:02 -04:00
/* Assign the ring queue. Contrary to an intuitive thought, this does
* not benefit from locality and it's counter-productive to group
* threads from a same group or range number in the same queue. In some
* sense it arranges us because it means we can use a modulo and ensure
* that even small numbers of threads are well spread.
*/
ha_thread_info [ tid ] . ring_queue =
( tid % MIN ( global . nbthread ,
( global . tune . ring_queues ?
global . tune . ring_queues :
RING_DFLT_QUEUES ) ) ) % RING_WAIT_QUEUES ;
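The modulo-based assignment above can be isolated in a small helper to see how thread IDs spread over queues. This is a sketch under assumptions: `pick_ring_queue()` and `min_uint()` are hypothetical names, and the default and maximum queue counts are passed as parameters because the actual values of RING_DFLT_QUEUES and RING_WAIT_QUEUES are not shown here.

```c
#include <assert.h>

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Maps thread <tid> to a queue index. <tune_queues> is the configured
 * number of queues (0 means "use <dflt_queues>"), and <wait_queues> is
 * the number of queues that physically exist.
 */
static unsigned int pick_ring_queue(unsigned int tid, unsigned int nbthread,
                                    unsigned int tune_queues,
                                    unsigned int dflt_queues,
                                    unsigned int wait_queues)
{
    unsigned int queues = tune_queues ? tune_queues : dflt_queues;

    assert(nbthread > 0 && queues > 0 && wait_queues > 0);
    return (tid % min_uint(nbthread, queues)) % wait_queues;
}
```

With 4 threads and 6 default queues, for instance, only indexes 0 to 3 are ever used, so a small number of threads is spread one per queue instead of being grouped.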
BUG/MEDIUM: thread: consider secondary threads as idle+harmless during boot
idle and harmless bits in the tgroup_ctx structure were not explicitly
set during boot.
| struct tgroup_ctx ha_tgroup_ctx[MAX_TGROUPS] = { };
As the structure is first statically initialized,
.threads_harmless and .threads_idle are automatically zero-
initialized by the compiler.
Unfortunately, this means that such threads are not considered idle
nor harmless by thread_isolate(_full)() functions until they enter
the polling loop (thread_harmless_now() and thread_idle_now() are
respectively called before entering the polling loop).
Because of this, any attempt to call thread_isolate() or thread_isolate_full()
during a startup phase with nbthread >= 2 will cause thread_isolate() to
loop until every secondary thread makes it through its first polling loop.
If the startup phase is aborted during boot (e.g. "-c" option to check the
configuration), secondary threads may be initialized but will never be started
(i.e. they won't enter the polling loop), thus thread_isolate()
would loop forever in such cases.
We can easily reveal the bug with this patch reproducer:
| diff --git a/src/haproxy.c b/src/haproxy.c
| index e91691658..0b733f6ee 100644
| --- a/src/haproxy.c
| +++ b/src/haproxy.c
| @@ -2317,6 +2317,10 @@ static void init(int argc, char **argv)
| if (pr || px) {
| /* At least one peer or one listener has been found */
| qfprintf(stdout, "Configuration file is valid\n");
| + printf("haproxy will loop...\n");
| + thread_isolate();
| + printf("we will never reach this\n");
| + thread_release();
| deinit_and_exit(0);
| }
| qfprintf(stdout, "Configuration file has no error but will not start (no listener) => exit(2).\n");
Now we start haproxy with a valid config:
$> haproxy -c -f valid.conf
Configuration file is valid
haproxy will loop...
^C
------------------------------------------------------------------------------
This did not cause any issue so far because no early deinit paths require
full thread isolation. But this may change when new features or requirements
are introduced, so we should fix this before it becomes a real issue.
To fix this, we explicitly assign .threads_harmless and .threads_idle
to .threads_enabled value in thread_map_to_groups() function during boot.
This is the proper place to do this since as long as .threads_enabled is not
explicitly set, its default value is also 0 (zero-initialized by the compiler)
code snippet from thread_isolate() function:
ulong te = _HA_ATOMIC_LOAD(&ha_tgroup_info[tgrp].threads_enabled);
ulong th = _HA_ATOMIC_LOAD(&ha_tgroup_ctx[tgrp].threads_harmless);
if ((th & te) == te)
break;
Thus thread_isolate(_full()) won't be looping forever in thread_isolate()
even if it were to be used before thread_map_to_groups() is executed.
No backport needed unless this is a requirement.
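The exit condition quoted above can be checked in isolation to see why a zero-initialized harmless mask blocks isolation, and why copying the enabled mask at boot unblocks it. This is a sketch only: `all_harmless()` and `boot_harmless_mask()` are hypothetical helpers operating on plain values, whereas the real code uses atomic loads on per-group structures.

```c
#include <assert.h>

/* Same test as in the snippet: isolation may proceed once every enabled
 * thread has its harmless bit set.
 */
static int all_harmless(unsigned long enabled, unsigned long harmless)
{
    return (harmless & enabled) == enabled;
}

/* Models the fix: at boot, threads which are enabled but not started yet
 * are reported as harmless.
 */
static unsigned long boot_harmless_mask(unsigned long enabled)
{
    assert(enabled != 0);
    return enabled;
}
```

With two enabled threads (mask 0x3) and a zero harmless mask the condition is false, which corresponds to the endless loop; once the boot value is applied it holds. With no thread enabled at all (both masks zero) it also holds, which matches the remark about calls made before thread_map_to_groups().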
2023-01-27 09:13:28 -05:00
/* thread is started, from now on it is not idle nor harmless */
thread_harmless_end ( ) ;
thread_idle_end ( ) ;
2023-02-17 02:36:42 -05:00
_HA_ATOMIC_OR ( & th_ctx -> flags , TH_FL_STARTED ) ;
2019-05-03 11:21:18 -04:00
2019-06-07 08:41:11 -04:00
/* Now, initialize one thread init at a time. This is better since
* some init code is a bit tricky and may release global resources
* after reallocating them locally. This will also ensure there is
* no race on file descriptors allocation.
*/
2019-06-11 03:16:41 -04:00
# ifdef USE_THREAD
pthread_mutex_lock ( & init_mutex ) ;
# endif
/* The first thread must set the number of threads left */
if ( ! init_left )
init_left = global . nbthread ;
init_left -- ;
2019-05-03 11:21:18 -04:00
2021-10-08 03:33:24 -04:00
clock_init_thread_date ( ) ;
2017-08-29 09:38:48 -04:00
2019-05-22 08:42:12 -04:00
/* per-thread alloc calls performed here are not allowed to snoop on
* other threads, so they are free to initialize at their own rhythm
* as long as they act as if they were alone. None of them may rely
* on resources initialized by the other ones.
*/
list_for_each_entry ( ptaf , & per_thread_alloc_list , list ) {
if ( ! ptaf -> fct ( ) ) {
ha_alert ( "failed to allocate resources for thread %u.\n" , tid ) ;
2021-07-22 08:42:32 -04:00
# ifdef USE_THREAD
2021-07-18 04:40:57 -04:00
pthread_mutex_unlock ( & init_mutex ) ;
2021-07-22 08:42:32 -04:00
# endif
2019-05-22 08:42:12 -04:00
exit ( 1 ) ;
}
}
2019-05-20 04:50:43 -04:00
/* per-thread init calls performed here are not allowed to snoop on
* other threads, so they are free to initialize at their own rhythm
* as long as they act as if they were alone.
*/
2017-08-29 09:38:48 -04:00
list_for_each_entry ( ptif , & per_thread_init_list , list ) {
if ( ! ptif -> fct ( ) ) {
2017-11-24 10:50:31 -05:00
ha_alert ( "failed to initialize thread %u.\n" , tid ) ;
2021-07-22 08:42:32 -04:00
# ifdef USE_THREAD
2021-07-18 04:40:57 -04:00
pthread_mutex_unlock ( & init_mutex ) ;
2021-07-22 08:42:32 -04:00
# endif
2017-08-29 09:38:48 -04:00
exit ( 1 ) ;
}
}
BUG/MEDIUM: init/threads: prevent initialized threads from starting before others
Since commit 6ec902a ("MINOR: threads: serialize threads initialization")
we now serialize threads initialization. But doing so has emphasized another
race which is that some threads may actually start the loop before others
are done initializing.
As soon as all threads enter the first thread_release() call, their rdv
bit is cleared and they're all waiting for all others' rdv to be cleared
as well, with their harmless bit set. The first one to notice the cleared
mask will progress through thread_isolate(), take rdv again preventing
most others from noticing its short pass to zero, and this first one will
be able to run all the way through the initialization till the last call
to thread_release() which it happily crosses, being the only one with the
rdv bit, leaving the room for one or a few others to do the same. This
results in some threads entering the loop before others are done with
their initialization, which is particularly bad. PiBa-NL reported that
some regtests fail for him due to this (which was impossible to reproduce
here, but races are racy by definition). However placing some printf()
in the initialization code definitely shows this unsynchronized startup.
This patch takes a different approach in three steps :
- first, we don't start with thread_release() anymore and we don't
set the rdv mask anymore in the main call. This was initially done
to let all threads start together, which we don't want. Instead
we just start with thread_isolate(). Since all threads are harmful
by default, they all wait for each other's readiness before starting.
- second, we don't release with thread_release() but with
thread_sync_release(), meaning that we don't leave the function until
other ones have reached the point in the function where they decide
to leave it as well.
- third, it makes sure we don't start the listeners using
protocol_enable_all() before all threads have allocated their local
FD tables or have initialized their pollers, otherwise startup could
be racy as well. It's worth noting that it is even possible to limit
this call to thread #0 as it only needs to be performed once.
This now guarantees that all thread init calls start only after all threads
are ready, and that no thread enters the polling loop before all others have
completed their initialization.
Please check GH issues #111 and #117 for more context.
No backport is needed, though if some new init races are reported in
1.9 (or even 1.8) which do not affect 2.0, then it may make sense to
carefully backport this small series.
2019-06-10 03:51:04 -04:00
/* enabling protocols will result in fd_insert() calls to be performed,
* we want all threads to have already allocated their local fd tables
2019-06-11 03:16:41 -04:00
* before doing so, thus only the last thread does it.
2019-06-10 03:51:04 -04:00
*/
2019-06-11 03:16:41 -04:00
if ( init_left == 0 )
2019-06-10 04:14:52 -04:00
protocol_enable_all ( ) ;
2019-06-07 08:41:11 -04:00
2019-06-11 03:16:41 -04:00
# ifdef USE_THREAD
pthread_cond_broadcast ( & init_cond ) ;
pthread_mutex_unlock ( & init_mutex ) ;
/* now wait for other threads to finish starting */
pthread_mutex_lock ( & init_mutex ) ;
while ( init_left )
pthread_cond_wait ( & init_cond , & init_mutex ) ;
pthread_mutex_unlock ( & init_mutex ) ;
# endif
2019-05-20 04:50:43 -04:00
2019-12-06 10:31:45 -05:00
# if defined(PR_SET_NO_NEW_PRIVS) && defined(USE_PRCTL)
/* Let's refrain from using setuid executables. This way the impact of
* an eventual vulnerability in a library remains limited. It may
* impact external checks but who cares about them anyway? In the
* worst case it's possible to disable the option. Obviously we do this
* in workers only. We can't hard-fail on this one as it really is
* implementation dependent though we're interested in feedback, hence
* the warning.
*/
if ( ! ( global . tune . options & GTUNE_INSECURE_SETUID ) && ! master ) {
static int warn_fail ;
2021-04-06 05:57:41 -04:00
if ( prctl ( PR_SET_NO_NEW_PRIVS , 1 , 0 , 0 , 0 ) == -1 && ! _HA_ATOMIC_FETCH_ADD ( & warn_fail , 1 ) ) {
2019-12-06 10:31:45 -05:00
ha_warning ( "Failed to disable setuid, please report to developers with detailed "
"information about your operating system. You can silence this warning "
"by adding 'insecure-setuid-wanted' in the 'global' section.\n" ) ;
}
}
# endif
MEDIUM: init: prevent process and thread creation at runtime
Some concerns are regularly raised about the risk to inherit some Lua
files which make use of a fork (e.g. via os.execute()) as well as
whether or not some of the bugs we fix might be exploitable to run
some code. Given that haproxy is event-driven, any foreground activity
completely stops processing and is easy to detect, but background
activity is a different story. A Lua script could very well discreetly
fork a sub-process connecting to a remote location and taking commands,
and some injected code could also try to hide its activity by creating
a process or a thread without blocking the rest of the processing. While
such activities should be extremely limited when run in an empty chroot
without any permission, it would be better to get a higher assurance
they cannot happen.
This patch introduces something very simple: it limits the number of
processes and threads to zero in the workers after the last thread was
created. By doing so, it effectively instructs the system to fail on
any fork() or clone() syscall. Thus any undesired activity has to happen
in the foreground and is way easier to detect.
This will obviously break external checks (whose concept is already
totally insecure), and for this reason a new option
"insecure-fork-wanted" was added to disable this protection, and it
is suggested in the fork() error report from the checks. It is
obviously recommended not to use it and to reconsider the reasons
leading to it being enabled in the first place.
If for any reason we fail to disable forks, we still start because it
could be imaginable that some operating systems refuse to set this
limit to zero, but in this case we emit a warning, that may or may not
be reported since we're after the fork point. Ideally over the long
term it should be conditioned by strict-limits and cause a hard fail.
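The limit itself can be tried out in a throwaway child process so that the caller is left untouched. This is a minimal sketch, assuming a system that provides RLIMIT_NPROC: `disable_forks()` and `check_disable_forks_in_child()` are hypothetical helpers, and the sketch only verifies that the limit reads back as zero. It does not try to prove that a later fork() fails, since privileged processes may be exempt from this limit on some systems.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lowers both the soft and hard process limits to zero for the caller.
 * Returns 0 on success and -1 on failure, like setrlimit().
 */
static int disable_forks(void)
{
    struct rlimit limit = { .rlim_cur = 0, .rlim_max = 0 };

    return setrlimit(RLIMIT_NPROC, &limit);
}

/* Applies the limit in a child process and reports what the child saw:
 * 0 if both limits read back as zero, a non-zero value otherwise.
 */
static int check_disable_forks_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit cur;

        if (disable_forks() != 0 || getrlimit(RLIMIT_NPROC, &cur) != 0)
            _exit(2);
        _exit(cur.rlim_cur == 0 && cur.rlim_max == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the hard limit is lowered as well, an unprivileged process cannot raise it again afterwards, which is why the patch applies it only after the last thread was created.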
2019-12-03 01:07:36 -05:00
# if defined(RLIMIT_NPROC)
/* all threads have started, it's now time to prevent any new thread
* or process from starting. Obviously we do this in workers only. We
* can't hard-fail on this one as it really is implementation dependent
* though we're interested in feedback, hence the warning.
*/
if ( ! ( global . tune . options & GTUNE_INSECURE_FORK ) && ! master ) {
struct rlimit limit = { . rlim_cur = 0 , . rlim_max = 0 } ;
static int warn_fail ;
2021-04-06 05:57:41 -04:00
if ( setrlimit ( RLIMIT_NPROC , & limit ) == -1 && ! _HA_ATOMIC_FETCH_ADD ( & warn_fail , 1 ) ) {
2019-12-03 01:07:36 -05:00
			ha_warning("Failed to disable forks, please report to developers with detailed "
				   "information about your operating system. You can silence this warning "
				   "by adding 'insecure-fork-wanted' in the 'global' section.\n");
		}
	}
#endif
MAJOR: threads: Start threads to experiment multithreading
[WARNING] For now, HAProxy is not thread-safe, so from this commit on, it will
be broken for a while when compiled with threads.
When the nbthread parameter is greater than 1, HAProxy will create the
corresponding number of threads. If nbthread is set to 1, nothing should be
done. So if there are concurrency issues (and be sure there will be,
unfortunately), an obvious workaround is to disable multithreading...
Each created thread will run a polling loop. So, in a certain way, it is pretty
similar to the nbproc mode (apart from the bugs and the lock contention).
Nevertheless, there are init and deinit steps for each thread to deal with
per-thread allocation.
Each thread has a tid (thread-id), numbered from 0 to (nbthread-1). It is used
in many places to do bitwise operations or to improve debugging information.
2017-08-29 09:38:48 -04:00
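The thread model described in the commit message above (one loop per thread, per-thread init and deinit, a tid usable as a bit index) can be sketched on its own as follows. This is only an illustration under stated assumptions: NBTHREAD, thread_body() and run_all_threads() are hypothetical names, the "polling loop" is reduced to a counter increment, and C11 atomics stand in for HAProxy's own _HA_ATOMIC_* macros.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NBTHREAD 4 /* stands in for the configured 'nbthread' value */

static atomic_uint threads_enabled; /* one bit per thread, set while it runs */
static atomic_int loops_run;        /* number of "polling loops" entered */

/* Body shared by all threads: per-thread init, loop, per-thread deinit. */
static void *thread_body(void *arg)
{
	unsigned int tid = (unsigned int)(unsigned long)arg;
	unsigned int bit = 1U << tid;

	atomic_fetch_or(&threads_enabled, bit);   /* init: mark thread enabled */
	atomic_fetch_add(&loops_run, 1);          /* placeholder for the loop */
	atomic_fetch_and(&threads_enabled, ~bit); /* deinit: clear our bit */
	return NULL;
}

/* Starts threads 1..NBTHREAD-1, runs thread 0 in the caller, then joins.
 * Returns the number of loops run, or -1 if a thread could not be created.
 */
int run_all_threads(void)
{
	pthread_t th[NBTHREAD];
	unsigned long i, started = 1;

	atomic_store(&loops_run, 0);
	for (i = 1; i < NBTHREAD; i++) {
		if (pthread_create(&th[i], NULL, thread_body, (void *)i) != 0)
			break;
		started++;
	}
	thread_body((void *)0UL); /* tid 0 is the calling thread */
	for (i = 1; i < started; i++)
		pthread_join(th[i], NULL);
	return (started == NBTHREAD) ? atomic_load(&loops_run) : -1;
}
```

As in the code that follows, the first thread is the one that called the function and simply returns, while the others terminate as separate threads; clearing the per-thread bit on exit is what lets the last thread out detect that nobody else is running.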
	run_poll_loop();
	list_for_each_entry(ptdf, &per_thread_deinit_list, list)
		ptdf->fct();
2019-05-22 08:42:12 -04:00
	list_for_each_entry(ptff, &per_thread_free_list, list)
		ptff->fct();
2017-10-27 07:53:47 -04:00
#ifdef USE_THREAD
2022-07-04 07:36:16 -04:00
	if (!_HA_ATOMIC_AND_FETCH(&ha_tgroup_info[ti->tgid - 1].threads_enabled, ~ti->ltid_bit))
2022-06-24 09:55:11 -04:00
		_HA_ATOMIC_AND(&all_tgroups_mask, ~tg->tgid_bit);
2022-07-06 04:17:21 -04:00
	if (!_HA_ATOMIC_AND_FETCH(&tg_ctx->stopping_threads, ~ti->ltid_bit))
		_HA_ATOMIC_AND(&stopping_tgroup_mask, ~tg->tgid_bit);
2017-10-27 07:53:47 -04:00
	if (tid > 0)
		pthread_exit(NULL);
2017-08-29 09:38:48 -04:00
#endif
2017-10-27 07:53:47 -04:00
	return NULL;
}
2017-08-29 09:38:48 -04:00
2019-11-17 09:47:16 -05:00
/* set uid/gid depending on global settings */
static void set_identity(const char *program_name)
{
2023-08-29 04:24:26 -04:00
	int from_uid __maybe_unused = geteuid();
2019-11-17 09:47:16 -05:00
	if (global.gid) {
		if (getgroups(0, NULL) > 0 && setgroups(0, NULL) == -1)
			ha_warning("[%s.main()] Failed to drop supplementary groups. Using 'gid'/'group'"
				   " without 'uid'/'user' is generally useless.\n", program_name);
		if (setgid(global.gid) == -1) {
			ha_alert("[%s.main()] Cannot set gid %d.\n", program_name, global.gid);
			protocol_unbind_all();
			exit(1);
		}
	}
2023-08-29 04:24:26 -04:00
#if defined(USE_LINUX_CAP)
	if (prepare_caps_for_setuid(from_uid, global.uid) < 0) {
		ha_alert("[%s.main()] Cannot switch uid to %d.\n", program_name, global.uid);
		protocol_unbind_all();
		exit(1);
	}
#endif
2019-11-17 09:47:16 -05:00
	if (global.uid && setuid(global.uid) == -1) {
		ha_alert("[%s.main()] Cannot set uid %d.\n", program_name, global.uid);
		protocol_unbind_all();
		exit(1);
	}
2023-08-29 04:24:26 -04:00
#if defined(USE_LINUX_CAP)
	if (finalize_caps_after_setuid(from_uid, global.uid) < 0) {
		ha_alert("[%s.main()] Cannot switch uid to %d.\n", program_name, global.uid);
		protocol_unbind_all();
		exit(1);
	}
#endif
2019-11-17 09:47:16 -05:00
}
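The ordering used by set_identity() above (supplementary groups first, then gid, then uid) can be illustrated with a stripped-down sketch. The function drop_identity() below is hypothetical and leaves out the capability handling and the listener cleanup; like the original, it treats a failure to drop supplementary groups as non-fatal and treats 0 as "do not change".

```c
#define _GNU_SOURCE
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not HAProxy's API): switch to <gid> then <uid>.
 * A value of 0 means "leave unchanged". Returns 0 on success, -1 if
 * setgid() or setuid() failed.
 */
int drop_identity(uid_t uid, gid_t gid)
{
	if (gid) {
		/* Supplementary groups must go while we may still be allowed
		 * to change them. An unprivileged process cannot drop them,
		 * so a failure here is tolerated.
		 */
		if (getgroups(0, NULL) > 0 && setgroups(0, NULL) == -1) {
			/* non-fatal: a real program would emit a warning */
		}
		if (setgid(gid) == -1)
			return -1;
	}
	/* The uid is changed last, since setgid() and setgroups() would
	 * normally be refused once the process is no longer privileged.
	 */
	if (uid && setuid(uid) == -1)
		return -1;
	return 0;
}
```

Calling drop_identity(0, 0) is a no-op, which mirrors a configuration where neither 'uid'/'user' nor 'gid'/'group' is set.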
2006-06-25 20:48:02 -04:00
int main(int argc, char **argv)
{
	int err, retry;
	struct rlimit limit;
2012-09-05 02:02:48 -04:00
	int pidfd = -1;
2021-07-14 11:54:01 -04:00
	int intovf = (unsigned char)argc + 1; /* let the compiler know it's strictly positive */
2022-07-21 03:55:22 -04:00
	/* Catch broken toolchains */
	if (sizeof(long) != sizeof(void *) || (intovf + 0x7FFFFFFF >= intovf)) {
		const char *msg;
		if (sizeof(long) != sizeof(void *))
			/* Apparently MingW64 was not made for us and can also break openssl */
			msg = "The compiler this program was built with uses unsupported integral type sizes.\n"
			      "Most likely it follows the unsupported LLP64 model. Never try to link HAProxy\n"
			      "against libraries built with that compiler either! Please only use a compiler\n"
			      "producing ILP32 or LP64 programs for both programs and libraries.\n";
		else if (intovf + 0x7FFFFFFF >= intovf)
			/* Catch forced CFLAGS that miss 2-complement integer overflow */
			msg = "The source code was miscompiled by the compiler, which usually indicates that\n"
			      "some of the CFLAGS needed to work around overzealous compiler optimizations\n"
			      "were overwritten at build time. Please do not force CFLAGS, and read Makefile\n"
			      "and INSTALL files to decide on the best way to pass your local build options.\n";
		else
			msg = "Bug in the compiler bug detection code, please report it to developers!\n";
2021-07-14 11:54:01 -04:00
		fprintf(stderr,
			"FATAL ERROR: invalid code detected -- cannot go further, please recompile!\n"
2022-07-21 03:55:22 -04:00
			"%s"
			"\nBuild options :"
2021-07-14 11:54:01 -04:00
#ifdef BUILD_TARGET
2022-07-21 03:55:22 -04:00
			"\n  TARGET  = " BUILD_TARGET
2021-07-14 11:54:01 -04:00
#endif
#ifdef BUILD_CC
2022-07-21 03:55:22 -04:00
			"\n  CC      = " BUILD_CC
2021-07-14 11:54:01 -04:00
#endif
#ifdef BUILD_CFLAGS
2022-07-21 03:55:22 -04:00
			"\n  CFLAGS  = " BUILD_CFLAGS
2021-07-14 11:54:01 -04:00
#endif
#ifdef BUILD_OPTIONS
2022-07-21 03:55:22 -04:00
			"\n  OPTIONS = " BUILD_OPTIONS
2021-07-14 11:54:01 -04:00
#endif
#ifdef BUILD_DEBUG
2022-07-21 03:55:22 -04:00
			"\n  DEBUG   = " BUILD_DEBUG
2021-07-14 11:54:01 -04:00
#endif
2022-07-21 03:55:22 -04:00
			"\n\n", msg);
2021-07-14 11:54:01 -04:00
		return 1;
	}
2006-06-25 20:48:02 -04:00
2018-02-03 09:15:21 -05:00
	setvbuf(stdout, NULL, _IONBF, 0);
2018-11-25 12:43:29 -05:00
2019-03-01 04:09:28 -05:00
	/* take a copy of initial limits before we possibly change them */
	getrlimit(RLIMIT_NOFILE, &limit);
2020-10-13 09:36:08 -04:00
	if (limit.rlim_max == RLIM_INFINITY)
		limit.rlim_max = limit.rlim_cur;
2019-03-01 04:09:28 -05:00
	rlim_fd_cur_at_boot = limit.rlim_cur;
	rlim_fd_max_at_boot = limit.rlim_max;
2018-11-25 12:43:29 -05:00
	/* process all initcalls in order of potential dependency */
	RUN_INITCALLS(STG_PREPARE);
	RUN_INITCALLS(STG_LOCK);
2022-02-18 08:51:49 -05:00
	RUN_INITCALLS(STG_REGISTER);
2022-02-17 11:45:58 -05:00
	/* now's time to initialize early boot variables */
	init_early(argc, argv);
2022-02-23 11:25:00 -05:00
	/* handles argument parsing */
	init_args(argc, argv);
2018-11-25 12:43:29 -05:00
	RUN_INITCALLS(STG_ALLOC);
	RUN_INITCALLS(STG_POOL);
2023-07-11 12:42:53 -04:00
	/* some code really needs to have the trash properly allocated */
	if (!trash.area) {
		ha_alert("failed to initialize trash buffers.\n");
		exit(1);
	}
2018-11-25 12:43:29 -05:00
	RUN_INITCALLS(STG_INIT);
2022-02-17 11:45:58 -05:00
	/* this is the late init where the config is parsed */
2010-10-22 10:06:11 -04:00
	init(argc, argv);
2022-02-17 11:45:58 -05:00
2010-08-27 11:56:48 -04:00
	signal_register_fct(SIGQUIT, dump, SIGQUIT);
	signal_register_fct(SIGUSR1, sig_soft_stop, SIGUSR1);
	signal_register_fct(SIGHUP, sig_dump_state, SIGHUP);
2017-06-01 11:38:51 -04:00
	signal_register_fct(SIGUSR2, NULL, 0);
2006-06-25 20:48:02 -04:00
2010-03-17 13:02:46 -04:00
	/* Always catch SIGPIPE even on platforms which define MSG_NOSIGNAL.
	 * Some recent FreeBSD setups report broken pipes, and MSG_NOSIGNAL
	 * was defined there, so let's stay on the safe side.
2006-06-25 20:48:02 -04:00
	 */
2010-08-27 11:56:48 -04:00
	signal_register_fct(SIGPIPE, NULL, 0);
2006-06-25 20:48:02 -04:00
2011-02-16 05:10:36 -05:00
	/* ulimits */
	if (!global.rlimit_nofile)
		global.rlimit_nofile = global.maxsock;
	if (global.rlimit_nofile) {
2019-03-01 04:32:05 -05:00
		limit.rlim_cur = global.rlimit_nofile;
		limit.rlim_max = MAX(rlim_fd_max_at_boot, limit.rlim_cur);
2022-04-25 12:02:03 -04:00
		if ((global.fd_hard_limit && limit.rlim_cur > global.fd_hard_limit) ||
2022-09-22 10:12:08 -04:00
		    raise_rlim_nofile(NULL, &limit) != 0) {
2016-06-21 05:48:18 -04:00
			getrlimit(RLIMIT_NOFILE, &limit);
2022-04-25 12:02:03 -04:00
			if (global.fd_hard_limit && limit.rlim_cur > global.fd_hard_limit)
				limit.rlim_cur = global.fd_hard_limit;
2019-10-27 15:08:11 -04:00
			if (global.tune.options & GTUNE_STRICT_LIMITS) {
				ha_alert("[%s.main()] Cannot raise FD limit to %d, limit is %d.\n",
					 argv[0], global.rlimit_nofile, (int)limit.rlim_cur);
2021-01-12 14:19:38 -05:00
				exit(1);
2019-10-27 15:08:11 -04:00
			}
			else {
				/* try to set it to the max possible at least */
				limit.rlim_cur = limit.rlim_max;
2022-04-25 12:02:03 -04:00
				if (global.fd_hard_limit && limit.rlim_cur > global.fd_hard_limit)
					limit.rlim_cur = global.fd_hard_limit;
2022-09-22 10:12:08 -04:00
				if (raise_rlim_nofile(&limit, &limit) == 0)
2019-10-27 15:08:11 -04:00
					getrlimit(RLIMIT_NOFILE, &limit);
2020-03-28 14:29:58 -04:00
				ha_warning("[%s.main()] Cannot raise FD limit to %d, limit is %d.\n",
2019-10-27 15:08:11 -04:00
					   argv[0], global.rlimit_nofile, (int)limit.rlim_cur);
				global.rlimit_nofile = limit.rlim_cur;
			}
2011-02-16 05:10:36 -05:00
		}
	}
	if (global.rlimit_memmax) {
		limit.rlim_cur = limit.rlim_max =
2015-12-14 06:46:07 -05:00
			global.rlimit_memmax * 1048576ULL;
2011-02-16 05:10:36 -05:00
		if (setrlimit(RLIMIT_DATA, &limit) == -1) {
2019-10-27 15:08:11 -04:00
			if (global.tune.options & GTUNE_STRICT_LIMITS) {
				ha_alert("[%s.main()] Cannot fix MEM limit to %d megs.\n",
					 argv[0], global.rlimit_memmax);
2021-01-12 14:19:38 -05:00
				exit(1);
2019-10-27 15:08:11 -04:00
			}
			else
2020-03-28 14:29:58 -04:00
				ha_warning("[%s.main()] Cannot fix MEM limit to %d megs.\n",
2019-10-27 15:08:11 -04:00
					   argv[0], global.rlimit_memmax);
2011-02-16 05:10:36 -05:00
		}
	}
MEDIUM: capabilities: check process capabilities sets
Since Linux capabilities support was added (see commit bd84387beb26
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check the effective and permitted capability sets of the haproxy process when
it starts and runs as non-root.
This way, if the needed network capabilities are present only in the process
permitted set, we can retrieve this information with capget and put them in
the process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().
First, it checks if the binary effective set has CAP_NET_ADMIN or CAP_NET_RAW.
If there is a match, LSTCHK_NETADM is removed from the global.last_checks list
to avoid a warning, because in the initialization sequence some last
configuration checks are based on the LSTCHK_NETADM flag and the haproxy
process euid may stay unprivileged.
If neither CAP_NET_ADMIN nor CAP_NET_RAW is in the effective set, the permitted
set is checked and only the capabilities given in the 'setcap' keyword are
promoted to the process effective set. LSTCHK_NETADM is also removed in this
case, for the same reason. In order to be transparent, we promote from the
permitted set only the capabilities given by the user in the 'setcap' keyword.
So, if caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM is
not unset and a warning about missing privileges is emitted at
initialization.
It needs to be called before protocol_bind_all() to allow binding to privileged
ports as non-root, and 'setcap cap_net_bind_service' must be set in the
global section in this case.
2024-03-15 13:02:05 -04:00
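The capget/capset promotion described in the commit message above can be sketched as follows. This is a Linux-only illustration that assumes the kernel's <linux/capability.h> header is available; promote_permitted_cap() is a hypothetical helper, not prepare_caps_from_permitted_set() itself, and it handles a single capability passed by the caller instead of the list taken from the 'setcap' keyword.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if capability <cap> is in the permitted set of the
 * calling thread but not in its effective set, raise it in the effective
 * set. Returns 1 if the capability is effective on return, 0 if it is not
 * permitted, -1 on error or invalid argument.
 */
int promote_permitted_cap(int cap)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	unsigned int idx, mask;

	if (cap < 0 || cap >= 32 * _LINUX_CAPABILITY_U32S_3)
		return -1;
	idx = CAP_TO_INDEX(cap);
	mask = CAP_TO_MASK(cap);

	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0; /* 0 designates the calling thread */

	/* read the current sets */
	if (syscall(SYS_capget, &hdr, data) == -1)
		return -1;
	if (data[idx].effective & mask)
		return 1; /* already effective, nothing to do */
	if (!(data[idx].permitted & mask))
		return 0; /* not permitted, cannot be promoted */

	/* permitted but not effective: raise it and write the sets back */
	data[idx].effective |= mask;
	if (syscall(SYS_capset, &hdr, data) == -1)
		return -1;
	return 1;
}
```

For the use case mentioned above, a caller would pass CAP_NET_BIND_SERVICE before trying to bind to ports below 1024 and fall back to a warning when the function returns 0.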
#if defined(USE_LINUX_CAP)
	/* If CAP_NET_BIND_SERVICE is in binary file permitted set and process
	 * is started and run under the same non-root user, this allows
2024-04-14 03:23:52 -04:00
	 * binding to privileged ports.
2024-03-15 13:02:05 -04:00
	 */
	prepare_caps_from_permitted_set(geteuid(), global.uid, argv[0]);
#endif
2022-01-07 12:19:42 -05:00
	/* Try to get the listeners FD from the previous process using
	 * _getsocks on the stat socket. This must never be done in wait mode
	 * or check mode.
	 */
	if (old_unixsocket &&
	    !(global.mode & (MODE_MWORKER_WAIT | MODE_CHECK | MODE_CHECK_CONDITION))) {
2017-06-01 11:38:53 -04:00
		if (strcmp("/dev/null", old_unixsocket) != 0) {
2020-08-28 12:42:45 -04:00
			if (sock_get_old_sockets(old_unixsocket) != 0) {
2017-11-24 10:50:31 -05:00
				ha_alert("Failed to get the sockets from the old process!\n");
2017-06-01 11:38:53 -04:00
				if (!(global.mode & MODE_MWORKER))
					exit(1);
			}
2017-04-05 16:33:04 -04:00
		}
	}
2017-06-01 11:38:53 -04:00
2006-06-25 20:48:02 -04:00
	/* We will loop at most 100 times with 10 ms delay each time.
	 * That's at most 1 second. We only send a signal to old pids
	 * if we cannot grab at least one port.
	 */
	retry = MAX_START_RETRIES;
	err = ERR_NONE;
	while (retry >= 0) {
		struct timeval w;
2020-09-02 05:11:43 -04:00
		err = protocol_bind_all(retry == 0 || nb_oldpids == 0);
2007-12-20 17:05:50 -05:00
		/* exit the loop on no error or fatal error */
		if ((err & (ERR_RETRYABLE | ERR_FATAL)) != ERR_RETRYABLE)
2006-06-25 20:48:02 -04:00
			break;
2010-08-25 06:58:59 -04:00
		if (nb_oldpids == 0 || retry == 0)
2006-06-25 20:48:02 -04:00
			break;
		/* FIXME-20060514: Solaris and OpenBSD do not support shutdown() on
		 * listening sockets. So on those platforms, it would be wiser to
		 * simply send SIGUSR1, which will not be undoable.
		 */
2010-08-25 06:58:59 -04:00
		if (tell_old_pids(SIGTTOU) == 0) {
			/* no need to wait if we can't contact old pids */
			retry = 0;
			continue;
		}
2006-06-25 20:48:02 -04:00
		/* give some time to old processes to stop listening */
		w.tv_sec = 0;
		w.tv_usec = 10 * 1000;
		select(0, NULL, NULL, NULL, &w);
		retry--;
	}
2020-09-02 05:11:43 -04:00
	/* Note: protocol_bind_all() sends an alert when it fails. */
2009-02-04 11:05:23 -05:00
	if ((err & ~ERR_WARN) != ERR_NONE) {
2020-09-02 05:11:43 -04:00
		ha_alert("[%s.main()] Some protocols failed to start their listeners! Exiting.\n", argv[0]);
2023-01-17 10:30:52 -05:00
		if (retry != MAX_START_RETRIES && nb_oldpids)
2006-06-25 20:48:02 -04:00
			tell_old_pids(SIGTTIN);
2023-01-17 10:30:52 -05:00
		protocol_unbind_all(); /* cleanup everything we can */
2006-06-25 20:48:02 -04:00
		exit(1);
	}
2018-11-21 09:48:31 -05:00
	if (!(global.mode & MODE_MWORKER_WAIT) && listeners == 0) {
2017-11-24 10:50:31 -05:00
		ha_alert("[%s.main()] No enabled listener found (check for 'bind' directives) ! Exiting.\n", argv[0]);
2006-06-25 20:48:02 -04:00
		/* Note: we don't have to send anything to the old pids because we
		 * never stopped them. */
		exit(1);
	}
2020-09-02 05:11:43 -04:00
	/* Ok, all listeners should now be bound, close any leftover sockets
2017-04-05 16:33:04 -04:00
	 * the previous process gave us, we don't need them anymore
	 */
2022-01-28 12:28:18 -05:00
	sock_drop_unused_old_sockets();
2007-10-16 06:25:14 -04:00
2006-06-25 20:48:02 -04:00
	/* prepare pause/play signals */
2010-08-27 11:56:48 -04:00
	signal_register_fct(SIGTTOU, sig_pause, SIGTTOU);
	signal_register_fct(SIGTTIN, sig_listen, SIGTTIN);
2006-06-25 20:48:02 -04:00
	/* MODE_QUIET can inhibit alerts and warnings below this line */
2017-12-25 15:03:31 -05:00
	if (getenv("HAPROXY_MWORKER_REEXEC") != NULL) {
		/* either stdin/out/err are already closed or should stay as they are. */
		if ((global.mode & MODE_DAEMON)) {
			/* daemon mode re-executing, stdin/stdout/stderr are already closed so keep quiet */
			global.mode &= ~MODE_VERBOSE;
			global.mode |= MODE_QUIET; /* ensure that we won't say anything from now */
		}
	} else {
		if ((global.mode & MODE_QUIET) && !(global.mode & MODE_VERBOSE)) {
			/* detach from the tty */
2017-12-28 10:09:36 -05:00
			stdio_quiet(-1);
2017-12-25 15:03:31 -05:00
		}
2006-06-25 20:48:02 -04:00
	}
	/* open log & pid files before the chroot */
2022-02-14 03:02:14 -05:00
	if ((global.mode & MODE_DAEMON || global.mode & MODE_MWORKER) &&
	    !(global.mode & MODE_MWORKER_WAIT) && global.pidfile != NULL) {
2006-06-25 20:48:02 -04:00
		unlink(global.pidfile);
		pidfd = open(global.pidfile, O_CREAT | O_WRONLY | O_TRUNC, 0644);
		if (pidfd < 0) {
2017-11-24 10:50:31 -05:00
			ha_alert("[%s.main()] Cannot create pidfile %s\n", argv[0], global.pidfile);
2006-06-25 20:48:02 -04:00
			if (nb_oldpids)
				tell_old_pids(SIGTTIN);
2007-10-16 06:25:14 -04:00
			protocol_unbind_all();
2006-06-25 20:48:02 -04:00
			exit(1);
		}
	}
2017-06-01 11:38:50 -04:00
	if ((global.mode & (MODE_MWORKER | MODE_DAEMON)) == 0) {
		/* chroot if needed */
		if (global.chroot != NULL) {
			if (chroot(global.chroot) == -1 || chdir("/") == -1) {
2017-11-24 10:50:31 -05:00
				ha_alert("[%s.main()] Cannot chroot(%s).\n", argv[0], global.chroot);
2017-06-01 11:38:50 -04:00
				if (nb_oldpids)
					tell_old_pids(SIGTTIN);
				protocol_unbind_all();
				exit(1);
			}
2007-10-15 12:57:08 -04:00
		}
	}
2018-11-21 09:48:31 -05:00
	if (nb_oldpids && !(global.mode & MODE_MWORKER_WAIT))
2010-08-25 06:58:59 -04:00
		nb_oldpids = tell_old_pids(oldpids_sig);
2006-06-25 20:48:02 -04:00
2019-05-07 11:49:33 -04:00
	/* send a SIGTERM to workers who have a too high reloads number */
	if ((global.mode & MODE_MWORKER) && !(global.mode & MODE_MWORKER_WAIT))
		mworker_kill_max_reloads(SIGTERM);
2006-06-25 20:48:02 -04:00
	/* Note that any error at this stage will be fatal because we will not
	 * be able to restart the old pids.
	 */
2019-11-17 09:47:16 -05:00
	if ((global.mode & (MODE_MWORKER | MODE_DAEMON)) == 0)
		set_identity(argv[0]);
2019-04-15 13:38:50 -04:00
MINOR: capabilities: add cap_sys_admin support
If the 'namespace' keyword is used in the backend server settings and/or in
the bind string, the haproxy process will call setns() to change its default
namespace to the configured one and then create a socket in this new
namespace. The setns() syscall requires the CAP_SYS_ADMIN capability in the
process Effective set (see man 2 setns). Otherwise, the process must be run
as root.
To avoid running haproxy as root, let's add the cap_sys_admin capability in
the same way as we already added support for some other network capabilities.
As CAP_SYS_ADMIN belongs to the CAP_SYS_* capabilities type, let's add a
separate flag LSTCHK_SYSADM for it. This flag is set if the 'namespace'
keyword was found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect the process EUID/RUID and the Effective and Permitted capabilities
sets.
If the system doesn't support Linux capabilities or 'cap_sys_admin' was not
set in 'setcap', but the 'namespace' keyword is present in the configuration,
we keep the previous strict behaviour. A process that has changed its uid to
a non-privileged user will terminate with an alert. This alert invites the
user to recheck their configuration.
When haproxy starts and runs under a non-root user and 'cap_sys_admin' is not
set, but the 'namespace' keyword is present, this patch does not change the
previous behaviour either. We'll still let the user try their configuration,
but we inform them via a warning that unexpected things, like socket creation
errors, may occur.
2024-04-26 15:47:54 -04:00
	/* set_identity() above might have dropped LSTCHK_NETADM and/or
	 * LSTCHK_SYSADM if it changed to a new UID while preserving enough
	 * permissions to honor LSTCHK_NETADM/LSTCHK_SYSADM.
BUG/MINOR: init: relax LSTCHK_NETADM checks for non root
Linux capabilities support, and the ability to preserve capabilities for the
running process after switching to global.uid, was added recently by commit
bd84387beb26 ("MEDIUM: capabilities: enable support for Linux capabilities").
This new feature hasn't yet been taken into account by the last config checks,
which are performed at the initialization stage.
So, to update them, let's perform them after the set_identity() call. This
way, the current EUID has already been changed to global.uid and
prepare_caps_for_setuid() will have unset the LSTCHK_NETADM flag only if the
capabilities given in the 'setcap' keyword in the configuration file were
preserved.
Otherwise, if the system doesn't support Linux capabilities or they were not
set via 'setcap', we keep the previous strict behaviour: the process will
terminate with an alert, in order to insist that the user either change the
run UID (worst case: start and run as root), or set/recheck the capabilities
listed as 'setcap' arguments.
When haproxy starts and runs under a non-root user, this patch doesn't change
the previous behaviour: we'll still let them try the configuration, but we
inform them via a warning that unexpected things may occur.
Needs to be backported as far as v2.9, including v2.9.
2024-03-18 09:50:26 -04:00
	 */
2024-04-26 15:47:54 -04:00
	if ((global.last_checks & (LSTCHK_NETADM | LSTCHK_SYSADM)) && getuid()) {
2024-03-18 09:50:26 -04:00
		/* If global.uid is present in config, it is already set as euid
2024-04-26 15:47:54 -04:00
		 * and ruid by set_identity() just above, so it's better to
2024-03-18 09:50:26 -04:00
		 * remind the user to fix inconsistent settings.
		 */
		if (global.uid) {
			ha_alert("[%s.main()] Some configuration options require full "
				 "privileges, so global.uid cannot be changed.\n", argv[0]);
#if defined(USE_LINUX_CAP)
			ha_alert("[%s.main()] Alternately, if your system supports "
				 "Linux capabilities, you may also consider using "
				 "'setcap cap_net_raw' or 'setcap cap_net_admin' in the "
				 "'global' section.\n", argv[0]);
#endif
			protocol_unbind_all();
			exit(1);
		}
		/* If the user is not root, we'll still let them try the configuration
		 * but we inform them that unexpected behaviour may occur.
		 */
		ha_warning("[%s.main()] Some options which require full privileges"
			   " might not work well.\n", argv[0]);
	}
2006-06-25 20:48:02 -04:00
/* check ulimits */
limit . rlim_cur = limit . rlim_max = 0 ;
getrlimit ( RLIMIT_NOFILE , & limit ) ;
if ( limit . rlim_cur < global . maxsock ) {
2019-10-27 15:08:11 -04:00
if ( global . tune . options & GTUNE_STRICT_LIMITS ) {
ha_alert ( " [%s.main()] FD limit (%d) too low for maxconn=%d/maxsock=%d. "
" Please raise 'ulimit-n' to %d or more to avoid any trouble. \n " ,
argv [ 0 ] , ( int ) limit . rlim_cur , global . maxconn , global . maxsock ,
global . maxsock ) ;
2021-01-12 14:19:38 -05:00
exit ( 1 ) ;
2019-10-27 15:08:11 -04:00
}
else
ha_alert ( " [%s.main()] FD limit (%d) too low for maxconn=%d/maxsock=%d. "
2020-03-28 14:29:58 -04:00
" Please raise 'ulimit-n' to %d or more to avoid any trouble. \n " ,
2019-10-27 15:08:11 -04:00
argv [ 0 ] , ( int ) limit . rlim_cur , global . maxconn , global . maxsock ,
global . maxsock ) ;
2006-06-25 20:48:02 -04:00
}
2023-05-23 13:02:08 -04:00
if ( global . prealloc_fd & & fcntl ( ( int ) limit . rlim_cur - 1 , F_GETFD ) = = - 1 ) {
if ( dup2 ( 0 , ( int ) limit . rlim_cur - 1 ) = = - 1 )
2023-05-26 08:04:18 -04:00
ha_warning ( " [%s.main()] Unable to preallocate file descriptor %d : %s \n " ,
argv [ 0 ] , ( int ) limit . rlim_cur - 1 , strerror ( errno ) ) ;
2023-05-23 13:02:08 -04:00
else
close ( ( int ) limit . rlim_cur - 1 ) ;
}
2023-05-17 03:02:21 -04:00
/* update the ready date one last time to also account for final setup time */
clock_update_date ( 0 , 1 ) ;
2023-05-16 13:19:36 -04:00
clock_adjust_now_offset ( ) ;
2023-05-17 03:02:21 -04:00
ready_date = date ;
2024-07-17 12:40:41 -04:00
/* catch last warnings, which could be produced while adjusting limits
* or preallocating fds
*/
if ( warned & WARN_ANY & & global . mode & MODE_ZERO_WARNING ) {
ha_alert ( " Some warnings were found and 'zero-warning' is set. Aborting. \n " ) ;
exit ( 1 ) ;
}
2018-11-21 09:48:31 -05:00
if ( global . mode & ( MODE_DAEMON | MODE_MWORKER | MODE_MWORKER_WAIT ) ) {
2006-06-25 20:48:02 -04:00
int ret = 0 ;
2021-06-15 01:58:09 -04:00
int in_parent = 0 ;
2017-12-28 10:09:36 -05:00
int devnullfd = - 1 ;
2006-06-25 20:48:02 -04:00
2017-06-01 11:38:50 -04:00
/*
* if daemon + mworker : must fork here to let a master
* process live in background before forking children
*/
2017-06-01 11:38:51 -04:00
if ( ( getenv ( " HAPROXY_MWORKER_REEXEC " ) = = NULL )
& & ( global . mode & MODE_MWORKER )
& & ( global . mode & MODE_DAEMON ) ) {
2017-06-01 11:38:50 -04:00
ret = fork ( ) ;
if ( ret < 0 ) {
2017-11-24 10:50:31 -05:00
ha_alert ( " [%s.main()] Cannot fork. \n " , argv [ 0 ] ) ;
2017-06-01 11:38:50 -04:00
protocol_unbind_all ( ) ;
exit ( 1 ) ; /* there has been an error */
2018-07-04 09:31:23 -04:00
} else if ( ret > 0 ) { /* parent leaves to daemonize */
2017-06-01 11:38:50 -04:00
exit ( 0 ) ;
2018-07-04 09:31:23 -04:00
} else /* change the process group ID in the child (master process) */
setsid ( ) ;
2017-06-01 11:38:50 -04:00
}
2017-06-01 11:38:55 -04:00
2017-11-06 05:00:04 -05:00
/* if in master-worker mode, write the PID of the father */
if ( global . mode & MODE_MWORKER ) {
char pidstr [ 100 ] ;
2019-06-22 01:41:38 -04:00
snprintf ( pidstr , sizeof ( pidstr ) , " %d \n " , ( int ) getpid ( ) ) ;
2018-01-23 13:20:19 -05:00
if ( pidfd > = 0 )
2020-03-14 06:03:20 -04:00
DISGUISE ( write ( pidfd , pidstr , strlen ( pidstr ) ) ) ;
2017-11-06 05:00:04 -05:00
}
2006-06-25 20:48:02 -04:00
/* the father launches the required number of processes */
2018-11-21 09:48:31 -05:00
if ( ! ( global . mode & MODE_MWORKER_WAIT ) ) {
2022-09-26 06:54:39 -04:00
struct ring * tmp_startup_logs = NULL ;
2019-04-01 05:30:02 -04:00
if ( global . mode & MODE_MWORKER )
mworker_ext_launch_all ( ) ;
2021-06-15 01:58:09 -04:00
2022-09-26 06:54:39 -04:00
/* at this point the worker must have its own startup_logs buffer */
tmp_startup_logs = startup_logs_dup ( startup_logs ) ;
2021-06-15 01:58:09 -04:00
ret = fork ( ) ;
if ( ret < 0 ) {
ha_alert ( " [%s.main()] Cannot fork. \n " , argv [ 0 ] ) ;
protocol_unbind_all ( ) ;
exit ( 1 ) ; /* there has been an error */
}
else if ( ret = = 0 ) { /* child breaks here */
2022-09-26 06:54:39 -04:00
startup_logs_free ( startup_logs ) ;
startup_logs = tmp_startup_logs ;
2021-07-21 04:17:02 -04:00
/* This one must not be exported, it's internal! */
unsetenv ( " HAPROXY_MWORKER_REEXEC " ) ;
2021-06-15 03:08:18 -04:00
ha_random_jump96 ( 1 ) ;
2021-06-15 01:58:09 -04:00
}
else { /* parent here */
in_parent = 1 ;
2018-11-21 09:48:31 -05:00
if ( pidfd > = 0 & & ! ( global . mode & MODE_MWORKER ) ) {
char pidstr [ 100 ] ;
snprintf ( pidstr , sizeof ( pidstr ) , " %d \n " , ret ) ;
2020-03-14 06:03:20 -04:00
DISGUISE ( write ( pidfd , pidstr , strlen ( pidstr ) ) ) ;
2018-11-21 09:48:31 -05:00
}
if ( global . mode & MODE_MWORKER ) {
struct mworker_proc * child ;
2021-11-09 09:25:31 -05:00
ha_notice ( " New worker (%d) forked \n " , ret ) ;
2018-11-21 09:48:31 -05:00
/* find the right mworker_proc */
list_for_each_entry ( child , & proc_list , list ) {
2022-07-20 18:52:43 -04:00
if ( child - > reloads = = 0 & &
child - > options & PROC_O_TYPE_WORKER & &
child - > pid = = - 1 ) {
2023-02-17 10:23:52 -05:00
child - > timestamp = date . tv_sec ;
2018-11-21 09:48:31 -05:00
child - > pid = ret ;
2019-06-12 13:11:33 -04:00
child - > version = strdup ( haproxy_version ) ;
2023-06-21 03:44:18 -04:00
/* at this step the fd is bound for the worker, set it to -1 so
* it could be closed in case of errors in mworker_cleanup_proc() */
child - > ipc_fd [ 1 ] = - 1 ;
2018-11-21 09:48:31 -05:00
break ;
}
2018-10-26 08:47:30 -04:00
}
}
2018-11-21 09:48:31 -05:00
}
2021-06-15 01:58:09 -04:00
2018-11-21 09:48:31 -05:00
} else {
/* wait mode */
2021-06-15 01:58:09 -04:00
in_parent = 1 ;
2006-06-25 20:48:02 -04:00
}
2012-11-16 10:12:27 -05:00
2006-06-25 20:48:02 -04:00
/* close the pidfile both in children and father */
2012-09-05 02:02:48 -04:00
if ( pidfd > = 0 ) {
//lseek(pidfd, 0, SEEK_SET); /* debug: emulate eglibc bug */
close ( pidfd ) ;
}
2010-08-25 06:49:05 -04:00
/* We won't ever use this anymore */
2021-02-20 04:46:51 -05:00
ha_free ( & global . pidfile ) ;
2006-06-25 20:48:02 -04:00
2021-06-15 01:58:09 -04:00
if ( in_parent ) {
2018-11-21 09:48:31 -05:00
if ( global . mode & ( MODE_MWORKER | MODE_MWORKER_WAIT ) ) {
2021-11-09 12:01:22 -05:00
master = 1 ;
2017-11-28 17:26:08 -05:00
if ( ( ! ( global . mode & MODE_QUIET ) | | ( global . mode & MODE_VERBOSE ) ) & &
( global . mode & MODE_DAEMON ) ) {
/* detach from the tty, this is required to properly daemonize. */
2017-12-28 10:09:36 -05:00
if ( ( getenv ( " HAPROXY_MWORKER_REEXEC " ) = = NULL ) )
stdio_quiet ( - 1 ) ;
2017-11-28 17:26:08 -05:00
global . mode & = ~ MODE_VERBOSE ;
global . mode | = MODE_QUIET ; /* ensure that we won't say anything from now */
}
2021-11-09 12:01:22 -05:00
if ( global . mode & MODE_MWORKER_WAIT ) {
/* only the wait mode handles the master CLI */
mworker_loop ( ) ;
} else {
2022-07-07 08:00:36 -04:00
# if defined(USE_SYSTEMD)
if ( global . tune . options & GTUNE_USE_SYSTEMD )
sd_notifyf ( 0 , " READY=1 \n MAINPID=%lu \n STATUS=Ready. \n " , ( unsigned long ) getpid ( ) ) ;
# endif
2021-11-09 12:01:22 -05:00
/* if not in wait mode, reload in wait mode to free the memory */
2022-09-24 09:44:42 -04:00
setenv ( " HAPROXY_LOAD_SUCCESS " , " 1 " , 1 ) ;
2021-11-09 12:16:47 -05:00
ha_notice ( " Loading success. \n " ) ;
2021-11-10 04:49:06 -05:00
proc_self - > failedreloads = 0 ; /* reset the number of failures */
2021-11-09 12:01:22 -05:00
mworker_reexec_waitmode ( ) ;
}
2017-06-07 09:04:47 -04:00
/* should never get there */
exit ( EXIT_FAILURE ) ;
2015-05-01 11:01:08 -04:00
}
2017-06-08 13:05:48 -04:00
# if defined(USE_OPENSSL) && !defined(OPENSSL_NO_DH)
2017-01-20 20:10:18 -05:00
ssl_free_dh ( ) ;
# endif
2017-06-07 09:04:47 -04:00
exit ( 0 ) ; /* parent must leave */
2015-05-01 11:01:08 -04:00
}
2017-06-01 11:38:52 -04:00
/* child must never use the atexit function */
atexit_flag = 0 ;
2018-09-11 04:06:26 -04:00
/* close useless master sockets */
if ( global . mode & MODE_MWORKER ) {
struct mworker_proc * child , * it ;
master = 0 ;
2018-10-26 08:47:45 -04:00
mworker_cli_proxy_stop ( ) ;
2018-09-11 04:06:26 -04:00
/* free proc struct of other processes */
list_for_each_entry_safe ( child , it , & proc_list , list ) {
2018-10-26 08:47:30 -04:00
/* close the FD of the master side for all
* workers, we don't need to close the worker
* side of other workers since it's done with
* the bind_proc */
2022-01-28 15:56:24 -05:00
if ( child - > ipc_fd [ 0 ] > = 0 ) {
2018-11-25 14:03:39 -05:00
close ( child - > ipc_fd [ 0 ] ) ;
2022-01-28 15:56:24 -05:00
child - > ipc_fd [ 0 ] = - 1 ;
}
2021-06-15 03:08:18 -04:00
if ( child - > options & PROC_O_TYPE_WORKER & &
2022-07-20 18:52:43 -04:00
child - > reloads = = 0 & &
child - > pid = = - 1 ) {
2018-10-26 08:47:30 -04:00
/* keep this struct if this is our pid */
proc_self = child ;
2018-09-11 04:06:26 -04:00
continue ;
2018-10-26 08:47:30 -04:00
}
2021-04-21 01:32:39 -04:00
LIST_DELETE ( & child - > list ) ;
2019-05-16 14:23:22 -04:00
mworker_free_child ( child ) ;
child = NULL ;
2018-09-11 04:06:26 -04:00
}
}
BUG/MEDIUM: threads/mworker: fix a race on startup
Marc Fournier reported an interesting case when using threads with the
master-worker mode : sometimes, a listener would have its FD closed
during startup. Sometimes it could even be health checks seeing this.
What happens is that after the threads are created, and the pollers
enabled on each thread, the master-worker pipe is registered, and at
the same time a close() is performed on the write side of this pipe
since the children must not use it.
But since this is replicated in every thread, what happens is that the
first thread closes the pipe, thus releases the FD, and the next thread
starting a listener in parallel gets this FD reassigned. Then another
thread closes the FD again, which this time corresponds to the listener.
It can also happen with the health check sockets if they're started
early enough.
This patch splits the mworker_pipe_register() function in two, so that
the close() of the write side of the FD is performed very early after the
fork() and long before threads are created (we don't need to delay it
anyway). Only the pipe registration is done in the threaded code since
it is important that the pollers are properly allocated for this.
The mworker_pipe_register() function now takes care of registering the
pipe only once, and this is guaranteed by a new surrounding lock.
The call to protocol_enable_all() looks fragile in theory since it
scans the list of proxies and their listeners, though in practice
all threads scan the same list and take the same locks for each
listener so it's not possible that any of them escapes the process
and finishes before all listeners are started. And the operation is
idempotent.
This fix must be backported to 1.8. Thanks to Marc for providing very
detailed traces clearly showing the problem.
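The descriptor reuse behind this race can be reproduced deterministically in a single thread. The following is a minimal sketch, not HAProxy code: `stale_close_hits_new_fd()` is a hypothetical helper, the two "threads" are simulated by sequential calls, and `dup()` stands in for the creation of a listener socket. It relies on the POSIX rule that a newly allocated descriptor gets the lowest free number.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a repeated close() destroyed an unrelated descriptor that
 * had reused the same number, 0 if the number was not reused, -1 on setup
 * error. Single-threaded illustration; all names are hypothetical.
 */
static int stale_close_hits_new_fd(void)
{
	int p[2];
	int hi, lo, reused, hit;

	if (pipe(p) == -1)
		return -1;

	/* pipe() hands out the two lowest free numbers, so every number
	 * below the higher one is in use at this point.
	 */
	hi = p[0] > p[1] ? p[0] : p[1];
	lo = p[0] > p[1] ? p[1] : p[0];

	close(hi);         /* first "thread" closes its copy of the FD */
	reused = dup(lo);  /* a "listener" is created and gets the freed number */
	if (reused != hi) {
		/* not expected in a single-threaded run */
		if (reused >= 0)
			close(reused);
		close(lo);
		return 0;
	}

	close(hi);         /* second "thread" repeats the close on the stale number */
	hit = (fcntl(reused, F_GETFD) == -1);  /* the "listener" is gone */

	close(lo);
	return hit;
}
```

Performing the close() once, before any thread exists, removes the second close and with it the window in which the number can be reassigned.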
2018-01-23 13:01:49 -05:00
2017-12-28 10:09:36 -05:00
if ( ! ( global . mode & MODE_QUIET ) | | ( global . mode & MODE_VERBOSE ) ) {
devnullfd = open ( " /dev/null " , O_RDWR , 0 ) ;
if ( devnullfd < 0 ) {
ha_alert ( " Cannot open /dev/null \n " ) ;
exit ( EXIT_FAILURE ) ;
}
}
2017-06-01 11:38:50 -04:00
/* Must chroot and setgid/setuid in the children */
/* chroot if needed */
if ( global . chroot ! = NULL ) {
if ( chroot ( global . chroot ) = = - 1 | | chdir ( " / " ) = = - 1 ) {
2021-06-15 02:59:19 -04:00
ha_alert ( " [%s.main()] Cannot chroot(%s). \n " , argv [ 0 ] , global . chroot ) ;
2017-06-01 11:38:50 -04:00
if ( nb_oldpids )
tell_old_pids ( SIGTTIN ) ;
protocol_unbind_all ( ) ;
exit ( 1 ) ;
}
}
2021-02-20 04:46:51 -05:00
ha_free ( & global . chroot ) ;
2019-11-17 09:47:16 -05:00
set_identity ( argv [ 0 ] ) ;
2017-06-01 11:38:50 -04:00
2018-11-13 10:18:23 -05:00
/*
* This is only done in daemon mode because we might want the
* logs on stdout in mworker mode. If we're NOT in QUIET mode,
* we should now close the first 3 FDs to ensure that we can
* detach from the TTY. We MUST NOT do it in other cases since
* it would have already been done, and 0-2 would have been
* assigned to listening sockets
2006-06-25 20:48:02 -04:00
*/
2018-11-13 10:18:23 -05:00
if ( ( global . mode & MODE_DAEMON ) & &
( ! ( global . mode & MODE_QUIET ) | | ( global . mode & MODE_VERBOSE ) ) ) {
2006-06-25 20:48:02 -04:00
/* detach from the tty */
2017-12-28 10:09:36 -05:00
stdio_quiet ( devnullfd ) ;
2008-11-16 01:40:34 -05:00
global . mode & = ~ MODE_VERBOSE ;
2006-06-25 20:48:02 -04:00
global . mode | = MODE_QUIET ; /* ensure that we won't say anything from now */
}
pid = getpid ( ) ; /* update child's pid */
2018-07-04 09:31:23 -04:00
if ( ! ( global . mode & MODE_MWORKER ) ) /* in mworker mode we don't want a new pgid for the children */
setsid ( ) ;
2007-04-09 13:29:56 -04:00
fork_poller ( ) ;
2006-06-25 20:48:02 -04:00
}
2023-11-20 04:49:05 -05:00
/* pass through every cli socket, and check if it's bound to
* the current process and if it exposes listener sockets.
* Caution: the GTUNE_SOCKET_TRANSFER is now set after the fork.
* */
if ( global . cli_fe ) {
struct bind_conf * bind_conf ;
list_for_each_entry ( bind_conf , & global . cli_fe - > conf . bind , by_fe ) {
if ( bind_conf - > level & ACCESS_FD_LISTENERS ) {
global . tune . options | = GTUNE_SOCKET_TRANSFER ;
break ;
}
}
}
2023-07-19 12:39:32 -04:00
/* Note that here we can't be in the parent/master anymore */
# if !defined(USE_THREAD) && defined(USE_CPU_AFFINITY)
if ( ha_cpuset_count ( & cpu_map [ 0 ] . thread [ 0 ] ) ) { /* only do this if the process has a CPU map */
# if defined(CPUSET_USE_CPUSET) || defined(__DragonFly__)
struct hap_cpuset * set = & cpu_map [ 0 ] . thread [ 0 ] ;
sched_setaffinity ( 0 , sizeof ( set - > cpuset ) , & set - > cpuset ) ;
# elif defined(__FreeBSD__)
struct hap_cpuset * set = & cpu_map [ 0 ] . thread [ 0 ] ;
ret = cpuset_setaffinity ( CPU_LEVEL_WHICH , CPU_WHICH_PID , - 1 , sizeof ( set - > cpuset ) , & set - > cpuset ) ;
# endif
}
# endif
2019-11-17 09:47:15 -05:00
/* try our best to re-enable core dumps depending on system capabilities.
* What is addressed here :
* - remove file size limits
* - remove core size limits
* - mark the process dumpable again if it lost it due to user / group
*/
if ( global . tune . options & GTUNE_SET_DUMPABLE ) {
limit . rlim_cur = limit . rlim_max = RLIM_INFINITY ;
# if defined(RLIMIT_FSIZE)
if ( setrlimit ( RLIMIT_FSIZE , & limit ) = = - 1 ) {
if ( global . tune . options & GTUNE_STRICT_LIMITS ) {
ha_alert ( " [%s.main()] Failed to raise the maximum "
" file size. \n " , argv [ 0 ] ) ;
2021-01-12 14:19:38 -05:00
exit ( 1 ) ;
2019-11-17 09:47:15 -05:00
}
else
ha_warning ( " [%s.main()] Failed to raise the maximum "
2020-03-28 14:29:58 -04:00
" file size. \n " , argv [ 0 ] ) ;
2019-11-17 09:47:15 -05:00
}
# endif
# if defined(RLIMIT_CORE)
if ( setrlimit ( RLIMIT_CORE , & limit ) = = - 1 ) {
if ( global . tune . options & GTUNE_STRICT_LIMITS ) {
ha_alert ( " [%s.main()] Failed to raise the core "
" dump size. \n " , argv [ 0 ] ) ;
2021-01-12 14:19:38 -05:00
exit ( 1 ) ;
2019-11-17 09:47:15 -05:00
}
else
ha_warning ( " [%s.main()] Failed to raise the core "
2020-03-28 14:29:58 -04:00
" dump size. \n " , argv [ 0 ] ) ;
2019-11-17 09:47:15 -05:00
}
# endif
# if defined(USE_PRCTL)
if ( prctl ( PR_SET_DUMPABLE , 1 , 0 , 0 , 0 ) = = - 1 )
ha_warning ( " [%s.main()] Failed to set the dumpable flag, "
" no core will be dumped. \n " , argv [ 0 ] ) ;
2021-08-21 04:13:10 -04:00
# elif defined(USE_PROCCTL)
2021-10-08 09:55:13 -04:00
{
int traceable = PROC_TRACE_CTL_ENABLE ;
if ( procctl ( P_PID , getpid ( ) , PROC_TRACE_CTL , & traceable ) = = - 1 )
ha_warning ( " [%s.main()] Failed to set the traceable flag, "
" no core will be dumped. \n " , argv [ 0 ] ) ;
}
2019-11-17 09:47:15 -05:00
# endif
}
2017-10-24 07:53:54 -04:00
global . mode & = ~ MODE_STARTING ;
2021-05-27 09:45:28 -04:00
reset_usermsgs_ctx ( ) ;
2021-09-28 04:36:57 -04:00
/* start threads 2 and above */
2021-10-06 16:22:40 -04:00
setup_extra_threads ( & run_thread_poll_loop ) ;
2018-06-07 03:46:01 -04:00
2021-09-28 04:36:57 -04:00
/* when multithreading we need to let only the thread 0 handle the signals */
2018-09-11 04:06:23 -04:00
haproxy_unblock_signals ( ) ;
2021-09-28 04:36:57 -04:00
/* Finally, start the poll loop for the first thread */
2021-09-28 03:43:11 -04:00
run_thread_poll_loop ( & ha_thread_info [ 0 ] ) ;
2021-09-28 04:36:57 -04:00
/* wait for all threads to terminate */
wait_for_threads_completion ( ) ;
2017-10-16 09:49:32 -04:00
MINOR: haproxy: Make use of deinit_and_exit() for clean exits
In particular, call deinit() cleanly after a configuration check. This cleans
up the output of valgrind, which reports "possible losses" without a deinit()
and none with it, and turns actual losses into proper hard losses, which makes
the whole thing easier to analyze.
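As background on the "possible losses" wording: valgrind classifies a block as "possibly lost" when the only pointers it finds to the block point inside it rather than at its start. One way a program can end up in that state and still free the memory, offered here as an assumption and not as a diagnosis of HAProxy, is linking structures through an embedded list element. The sketch uses hypothetical names (`struct node`, `struct item`, `item_of()`, `free_via_interior_pointer()`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *next;
};

struct item {
	char name[16];     /* pushes the embedded node away from offset 0 */
	struct node link;  /* other code only keeps a pointer to this member */
};

/* Recover the enclosing item from a pointer to its embedded node. */
static struct item *item_of(struct node *n)
{
	return (struct item *)((char *)n - offsetof(struct item, link));
}

/* Allocates an item, keeps only the interior pointer, then frees the block
 * through the recovered start address. Returns 1 if the recovered address
 * matches the one returned by calloc(), 0 if not, -1 on allocation failure.
 */
static int free_via_interior_pointer(void)
{
	struct item *it = calloc(1, sizeof(*it));
	struct node *only_ref;
	int same;

	if (!it)
		return -1;

	only_ref = &it->link;              /* interior pointer */
	same = (item_of(only_ref) == it);
	it = NULL;                         /* the start pointer is no longer held */

	free(item_of(only_ref));           /* the block can still be released */
	return same;
}
```

A leak checker scanning memory while only `only_ref` is live would find no pointer to the start of the block, even though a later cleanup pass can release it.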
As an example, given the following configuration:
frontend foo
bind *:8080
mode http
Running `haproxy -c -f cfg` within valgrind will report 4 possible losses:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==21219== Memcheck, a memory error detector
==21219== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==21219== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==21219== Command: ./haproxy -c -f ./example.cfg
==21219==
[WARNING] 165/001100 (21219) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==21219==
==21219== HEAP SUMMARY:
==21219== in use at exit: 1,436,631 bytes in 130 blocks
==21219== total heap usage: 153 allocs, 23 frees, 1,447,758 bytes allocated
==21219==
==21219== 7 bytes in 1 blocks are possibly lost in loss record 5 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468FD9: bind_conf_alloc (listener.h:158)
==21219== by 0x468FD9: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 14 bytes in 1 blocks are possibly lost in loss record 9 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468F9B: bind_conf_alloc (listener.h:154)
==21219== by 0x468F9B: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 128 bytes in 1 blocks are possibly lost in loss record 35 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x468F90: bind_conf_alloc (listener.h:152)
==21219== by 0x468F90: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 608 bytes in 1 blocks are possibly lost in loss record 46 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x4B953A: create_listeners (listener.c:576)
==21219== by 0x4578F6: str2listener (cfgparse.c:192)
==21219== by 0x469039: cfg_parse_listen (cfgparse-listen.c:568)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== LEAK SUMMARY:
==21219== definitely lost: 0 bytes in 0 blocks
==21219== indirectly lost: 0 bytes in 0 blocks
==21219== possibly lost: 757 bytes in 4 blocks
==21219== still reachable: 1,435,874 bytes in 126 blocks
==21219== suppressed: 0 bytes in 0 blocks
==21219== Reachable blocks (those to which a pointer was found) are not shown.
==21219== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==21219==
==21219== For counts of detected and suppressed errors, rerun with: -v
==21219== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Re-running the same command with the patch applied will not report any
losses any more:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==22124== Memcheck, a memory error detector
==22124== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22124== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==22124== Command: ./haproxy -c -f ./example.cfg
==22124==
[WARNING] 165/001503 (22124) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==22124==
==22124== HEAP SUMMARY:
==22124== in use at exit: 313,864 bytes in 82 blocks
==22124== total heap usage: 153 allocs, 71 frees, 1,447,758 bytes allocated
==22124==
==22124== LEAK SUMMARY:
==22124== definitely lost: 0 bytes in 0 blocks
==22124== indirectly lost: 0 bytes in 0 blocks
==22124== possibly lost: 0 bytes in 0 blocks
==22124== still reachable: 313,864 bytes in 82 blocks
==22124== suppressed: 0 bytes in 0 blocks
==22124== Reachable blocks (those to which a pointer was found) are not shown.
==22124== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==22124==
==22124== For counts of detected and suppressed errors, rerun with: -v
==22124== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
It might be worth investigating what exactly HAProxy does to lose pointers
to the start of those 4 memory areas and then to be able to still free them
during deinit(). If HAProxy is able to free them, they ideally should be
"still reachable" and not "possibly lost".
2020-06-13 18:37:42 -04:00
deinit_and_exit ( 0 ) ;
2006-06-25 20:48:02 -04:00
}
/*
* Local variables :
* c - indent - level : 8
* c - basic - offset : 8
* End :
*/