Commit graph

12666 commits

Author SHA1 Message Date
Ondřej Kuzník
0abf3f5bc9 Flush cache before calling dispose()
This needs to be confirmed:
Location based atomics do not imply a full fence of the same level. So
to get the code in dispose() to read the actual data, it seems we need to
initiate a fence.
2020-11-17 17:58:15 +00:00
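
The concern raised in 0abf3f5bc9 can be illustrated with C11 atomics. The following is a minimal sketch, not the lloadd code: `struct object`, `object_new`, `object_ref`, `object_unref` and `disposed_payload` are hypothetical names invented for the illustration. It shows one common pattern, a release decrement followed by an acquire fence before the destructor runs. Whether that is the fence the commit actually introduces is not stated in the log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object; names are illustrative, not lloadd's. */
struct object {
    atomic_int refcnt;
    int payload;                        /* plain (non-atomic) data */
    void (*dispose)( struct object * );
};

static int disposed_payload = -1;       /* records what dispose() observed */

static void
object_dispose( struct object *obj )
{
    disposed_payload = obj->payload;
    free( obj );
}

static struct object *
object_new( int payload )
{
    struct object *obj = malloc( sizeof(*obj) );
    assert( obj != NULL );
    atomic_init( &obj->refcnt, 1 );
    obj->payload = payload;
    obj->dispose = object_dispose;
    return obj;
}

static void
object_ref( struct object *obj )
{
    atomic_fetch_add_explicit( &obj->refcnt, 1, memory_order_relaxed );
}

/* Returns 1 if this call dropped the last reference and ran dispose(). */
static int
object_unref( struct object *obj )
{
    /* Release: our earlier writes to *obj are published with the decrement. */
    if ( atomic_fetch_sub_explicit( &obj->refcnt, 1,
                 memory_order_release ) != 1 ) {
        return 0;
    }
    /* Acquire fence: pairs with the release decrements of other threads so
     * that dispose() sees their writes to the non-atomic fields. */
    atomic_thread_fence( memory_order_acquire );
    obj->dispose( obj );
    return 1;
}
```

The atomic decrement alone only orders accesses to `refcnt` itself. The explicit fence is what extends that ordering to the plain fields read inside `dispose()`.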
Ondřej Kuzník
dfbed44b3e Do not accept requests with msgid == 0
It is used internally to identify pinned operations and should not be
encountered over the wire.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
dfbf25d579 Honour keepalive settings for upstreams 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
25fff30e39 Let the last thread dispose of pending references
If we're idle, there might be objects pending cleanup for the last two
epochs. Unless another thread comes in and checks into a new epoch or we
shut down, they will linger forever.

If one of the objects was a connection, it wouldn't get closed and be
stuck in CLOSE_WAIT state, potentially refusing another legitimate
connection if its socket address were to match the one we're yet to
close.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
41a74b4689 Introduce the notion of experimental features 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
1f6d8611a3 Implement read throttling when writes backlog
Reject operations in such a case with LDAP_BUSY. If read_event feature
is on, just stop reading from the connection. However this could still
result in deadlocks in reasonable situations. Need to figure out better
ways to make it safe and still protect ourselves.
2020-11-17 17:58:15 +00:00
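
The decision described in 1f6d8611a3 can be sketched roughly as follows. This is an assumption-laden illustration, not the lloadd implementation: `struct conn_sketch`, its fields, `throttle_decide` and the byte-based backlog limit are all hypothetical. Only the result code value is taken from RFC 4511, where busy is 51.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LDAP_BUSY 0x33   /* RFC 4511 resultCode busy (51) */

/* Hypothetical per-connection state; field names are illustrative only. */
struct conn_sketch {
    size_t pending_write_bytes;  /* data queued but not yet written out */
    size_t write_backlog_limit;  /* assumed threshold for "writes backlog" */
    int read_event_feature;      /* nonzero if the read_event feature is on */
    int reading_paused;          /* set while we have stopped reading */
};

enum throttle_action {
    THROTTLE_FORWARD,       /* no backlog, process the operation normally */
    THROTTLE_REJECT_BUSY,   /* reject the operation with busy */
    THROTTLE_PAUSE_READ     /* stop reading from the connection */
};

static enum throttle_action
throttle_decide( struct conn_sketch *c )
{
    assert( c != NULL );
    if ( c->pending_write_bytes <= c->write_backlog_limit ) {
        c->reading_paused = 0;
        return THROTTLE_FORWARD;
    }
    if ( c->read_event_feature ) {
        /* Feature on: apply back-pressure instead of failing operations. */
        c->reading_paused = 1;
        return THROTTLE_PAUSE_READ;
    }
    return THROTTLE_REJECT_BUSY;
}

/* Result code to send to the client for a given action, 0 if none. */
static int
throttle_result_code( enum throttle_action action )
{
    return action == THROTTLE_REJECT_BUSY ? SKETCH_LDAP_BUSY : 0;
}
```

Pausing reads can leave both peers waiting on each other, which is the deadlock risk the commit message flags. The sketch does nothing to avoid it.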
Ondřej Kuzník
68b163fca9 Introduce mutex checks
Switched off unless thread debugging is on, but still useful for static
analysis.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
62a806b243 Thread error checking 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
1328777a85 Fix a SASL channel-binding leak 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
58d66a3946 Fix race between unlinking a client and processing incoming data 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
959ff07911 Make sure read event is not enabled while upstream_bind is scheduled 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
b2e57148fa Shorten to one epoch per PDU
A full read cycle can take a very long time if the limits are set too
high.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
b49f51879f Implement client pending operation limits 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
f832024e90 Straighten up client pending op tracking 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
dc1961cb15 Epoch based memory reclamation
Similar to the algorithm presented in
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf

Not completely lock-free at the moment. Also the problems with epoch
based memory reclamation are still present - a thread actively observing
an epoch getting stuck will prevent LloadConnections and LloadOperations
being freed, potentially running out of memory.
2020-11-17 17:58:15 +00:00
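
The bookkeeping behind epoch based reclamation, as referenced in dc1961cb15, can be modelled in a few lines. The sketch below is a deliberately single-threaded model with three epochs. All names (`epoch_join`, `epoch_leave`, `epoch_retire`, `mark_disposed`) are hypothetical and it is not the lloadd implementation. A real version needs atomics or locks around every shared variable. It only shows the rule that an object retired in epoch g is disposed of once the global epoch has advanced twice.

```c
#include <assert.h>
#include <stdlib.h>

#define EPOCH_COUNT 3

typedef void (dispose_fn)( void *obj );

struct pending {
    struct pending *next;
    void *obj;
    dispose_fn *dispose;
};

/* Single-threaded model: a real implementation must protect all of these. */
static unsigned current_epoch;               /* always in [0, EPOCH_COUNT) */
static unsigned active[EPOCH_COUNT];         /* readers checked into each epoch */
static struct pending *retired[EPOCH_COUNT]; /* objects retired in each epoch */

static void
reclaim( struct pending **list )
{
    while ( *list != NULL ) {
        struct pending *p = *list;
        *list = p->next;
        p->dispose( p->obj );
        free( p );
    }
}

/* Advance only when nobody is still checked into the previous epoch. */
static void
try_advance( void )
{
    unsigned prev = ( current_epoch + EPOCH_COUNT - 1 ) % EPOCH_COUNT;
    if ( active[prev] != 0 ) return;
    current_epoch = ( current_epoch + 1 ) % EPOCH_COUNT;
    /* The slot about to be reused holds garbage from two epochs ago. */
    reclaim( &retired[( current_epoch + 1 ) % EPOCH_COUNT] );
}

static unsigned
epoch_join( void )
{
    try_advance();
    active[current_epoch]++;
    return current_epoch;
}

static void
epoch_leave( unsigned epoch )
{
    assert( epoch < EPOCH_COUNT && active[epoch] > 0 );
    active[epoch]--;
}

/* Defer disposal of an object that has already been unlinked. */
static void
epoch_retire( void *obj, dispose_fn *dispose )
{
    struct pending *p = malloc( sizeof(*p) );
    assert( p != NULL );
    p->obj = obj;
    p->dispose = dispose;
    p->next = retired[current_epoch];
    retired[current_epoch] = p;
}

/* Example disposer for experiments: flags an int as disposed. */
static void
mark_disposed( void *obj )
{
    *(int *)obj = 1;
}
```

In this model a reader that never calls `epoch_leave()` blocks every later advance, so retired objects accumulate, which mirrors the out-of-memory risk noted in the commit message. Because the sketch only advances the epoch inside `epoch_join()`, retired objects also linger while nothing joins.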
Ondřej Kuzník
aab6af1c4e Switch to LDAP_OTHER when handling a lost upstream.
LDAP_UNAVAILABLE signals "the server is shutting down or a subsystem
necessary to complete the operation is offline", so intelligent clients
tend to infer the connection will not be usable any more, which is not
the case here.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
81ead4a5f4 Fix races with backend_retry 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
78f25a3c91 A failed cn=config ADD needs to be handled 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
4b3d21146b Introduce SASL support for upstream connections 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
05e0906f8b Fix backend starttls= setting being ignored 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
9444dfc991 Simplify pause handling
Gets rid of a race where unpause+pause fired in quick succession would
miss the event_base_loopbreak() call.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
25a4d684fc Permit lloadd to share slapd TLS context 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
63efcd63eb Reuse connection walking in monitor for upstreams too 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
3bd2d7483e Reuse connection_walk for client matters 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
b4f43ed8e1 Refactor backend reset
Reuse the connection walking facility in timeout management.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
638f8a2cbc Tighten checks on retry management 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
2a813cb06d Clean up backend_retry and its callers. 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
241f65b9e0 Fix a race in managing b_dns_req 2020-11-17 17:58:15 +00:00
Nadezhda Ivanova
f4a2fdd400 Fix a new backend not being operational if added via cn=config 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
00806dd32a libevent 2.0 support 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
f1ea9da3a0 Reorganise listener support in cn=config and module startup 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
bd7a6f67de Introduce lload_open_new_listener 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
757c8beda7 Switch to ldap_parse_url_ext
This simplifies port parsing in the end. Also pass the url to
ldap_open_listener in anticipation of incremental listener config.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
93d20459f1 Make io-threads modification startup-only 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
db3961f489 Record connect task to allow canceling it 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
6b10c2988e Record pending DNS resolution to be able to cancel 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
b039e7c1b0 Keep a reference around for the bind task 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
0314f95d7f Work around libevent base not waking up on shutdown 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
db939eeb86 Protect operation when abandoning 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
07401e5829 Implement runtime monitor (un)registration
Unregistration is a hack and we should either make the subsystems into
an entry (if monitor allows subentry generation) or implement subsystem
unregistration in back-monitor.
2020-11-17 17:58:15 +00:00
Ondřej Kuzník
1ea5ee1f01 Do not unlock upstream without referencing its dying ops 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
b1c098ad76 Module shutdown support 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
05d6aae40f Rework lloadd startup 2020-11-17 17:58:15 +00:00
Ondřej Kuzník
362f16479a Deal with no backends being configured 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
4c355deb3d Record the backend name 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
3a6b399580 Reflect backend URI change in cn=monitor 2020-11-17 17:58:14 +00:00
Nadezhda Ivanova
bace795984 Enable dynamic configuration 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
70ae4af60a Fix interaction of graceful connection closing and SASL bind support 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
d954216f93 Change log level for unsolicited response 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
edfb3d73d6 Fix operation status tracking.
An operation is rejected iff it has to be dropped before we can find an
upstream for it (unless we handle it ourselves, that is). At that point
it is failed unless completed successfully.

This makes a difference for multi-stage binds which alternate between
'failed' (we are waiting on a server response) and 'completed' (server
did what we asked them to, waiting on client to continue).
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
cfe9065824 Introduce infra to handle config changes 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
a7f8f58a63 expose task functions for invalidation 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
bf9f99dd88 Split backend destruction from resetting it 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
7f22bac4ac Introduce a new connection status - gentle shutdown 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
ca646cd02d Fix operation counts
Trying to abandon an operation does not automatically make it completed,
it might have failed already but we're just racing to reach the client
to record that.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
bea9bfb33d Move op counting to operation_init 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
0011684760 Cleanup sasl_bind_mech resets 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
9bd90a741c Fix a race on bind response processing.
During response processing, an upstream connection could be marked ready
after a different bind had already been allocated to it, thus allowing
two binds to be in progress on the same connection.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
485a169758 Implement pause handlers 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
db5966f60d More meaningful connection type reporting 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
22818e8583 Module shutdown 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
dab9054794 Rework monitor startup
Takes care of dealing with monitor not present/not configured and fixes a
monitor startup issue.
2020-11-17 17:58:14 +00:00
Nadezhda Ivanova
678fa100f7 Convert the load balancer into a backend 2020-11-17 17:58:14 +00:00
Nadezhda Ivanova
7771606984 Use slapd's config.h 2020-11-17 17:58:14 +00:00
Nadezhda Ivanova
2d33032504 Lload cn=monitor initial implementation 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
7a69017f6f Resolve authzid after a successful auth 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
c957bb9199 Add SASL documentation on SASL handling 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
205db0bf94 Reset pin on simple bind 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
cbc0ec04c0 Fix pinned operation forwarding 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
2ba833680f Operation abandon related fixes 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
5c1245de06 Manage c_sasl_bind_mech on upstream 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
c52328f63d Clear c_auth on every bind request
For a new bind request, this is obvious. For SASL bind requests, we do
not know the final identity until we have finished handling it, so make
sure it stays empty until then.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
72ca711271 Do not compare c_auth when NULL 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
ee893ae147 Handle EXTERNAL mechanism
Will only try to extract the TLS client certificate name if used during
the last handshake.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
003a35c62f SASL bind support
Introduces pinned operations. When SASL bind finishes, we might still
have to maintain a link between the client and an upstream for future
bind operations if we got a SASL Bind in Progress result code. We zero
out the msgids and remember a server-unique identifier on the client and
the relevant operation that lets us retrieve that link again. This
operation is reclaimed just like anything else when connections drop.

Hopefully, this should work for LDAP TXN and VC Exop support with SASL
later as well since it allows for many-to-many links to exist.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
21a22d1bf1 Refactor request parsing and sending.
We have to do most of our processing before we send the request over to
the upstream. If we don't, we might be too late and the response might
have arrived already.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
ddd1acc327 Passing the client directly will allow clearing it from op 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
1fd7249f8e RFC4511 says Binds do not abandon, send a "reset" bind instead 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
66f06f3fa9 Initial extension to upstream selection 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
c91d61cf19 Do not copy files from slapd, just link them 2020-11-17 17:58:14 +00:00
Nadezhda Ivanova
37cd5f21d5 Enable compilation of the load balancer as a module
To compile the balancer as a slapd module, pass --enable-balancer=mod to ./configure
Use --enable-balancer(=yes) to compile as standalone server.
2020-11-17 17:58:14 +00:00
Nadezhda Ivanova
8bc7650a7c Clean ups and renames to coexist with slapd 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
ea83627929 request_abandon RFC4511 conformance 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
5cbd30ded9 Log timed out connections more clearly 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
c386d527ca Protect currently impossible branch 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
aecc62c08e Introduce operation timeout machinery 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
8ba44630ef Factor out abandon message preparation 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
1790018488 Record operation activity times 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
a0ec50b33d Upstream queues ordered by c_connid
In preparation for operation timeout events.
2020-11-17 17:58:14 +00:00
Ondřej Kuzník
0cfd4fca4d Make timeouts common and redo connection read timeouts 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
b4d7e8af8d We should just be able to call backend_retry 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
f87127dfa2 Set up TLS context for backends 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
1b46f86627 Client TLS support 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
a0cd41ecd2 Upstream TLS support 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
063981a06d Respond to timeout events properly 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
ccf75c96c4 Update write timeout to timeval 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
5ee4b67673 Move bind handling to bind.c 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
abab7e46ad Move client related functions to client.c 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
f27517af95 Rename bind handlers 2020-11-17 17:58:14 +00:00
Ondřej Kuzník
b801ca17cb Rename macros and symbols to lloadd 2020-11-17 17:58:14 +00:00