Commit graph

206 commits

Author SHA1 Message Date
Brad Davidson
a8f0acbe52 Add CLI flag and config file for s3 bucket lookup type
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-05-07 11:50:22 -07:00
Brad Davidson
f90334e207 Fix etcd socket option config
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-11 13:39:44 -07:00
Brad Davidson
9deef77eef Add ReusePort/ReuseAddr flags to etcd config
Addresses flakes in etcd CI due to the port still being in TIME_WAIT after the server is shut down between tests

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-08 15:27:19 -07:00
Brad Davidson
a897f6875e Fix flakey etcd startup tests
Increase etcd shutdown delay to avoid "bind: address already in use" errors seen in CI. Also uses test TmpDir to ensure dir is cleaned up between tests.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-02 09:01:26 -07:00
Brad Davidson
0eeac6a622 Rework mock executor using gomock for call validation
Generate the mock executor with mockgen and convert existing uses of the mock executor to set it up properly.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-31 17:09:43 -07:00
Brad Davidson
d45006be66 Move etcd ready channel into executor
This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
72bbd676f1 Fix etcd tests to use mock executor
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
a8bc412422 Move container runtime ready channel into executor
Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson
d694dd1db9 Add periodic background snapshot reconcile
Interval is configurable with new etcd-snapshot-reconcile-interval flag

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 12:18:19 -08:00
Brad Davidson
bed1f66880 Avoid use of github.com/pkg/errors functions that capture stack
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 00:41:38 -08:00
Brad Davidson
f940368747 Use etcd proxy to bootstrap control-plane-only nodes, if possible
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson
5894af30ff Move CR APIs to k3s-io/api
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-24 11:17:27 -08:00
Brad Davidson
6199b79f4b Add etcd snapshot metrics
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-18 11:09:42 -08:00
Brad Davidson
4cacf6e1c0 Make etcd test linux-only
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-07 07:46:19 -08:00
Brad Davidson
0d028a2283 Add support for AWS shared credentials file
Also adds a CLI flag and fields for session token, which must be passed
alongside the access key and secret when using temporary credentials.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-29 00:45:56 -08:00
Brad Davidson
fd8348324d Disable s3 transport transparent compression/decompression
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-27 11:00:01 -08:00
Brad Davidson
8f8cfb56b5 Move core/v1 mock into tests package for reuse
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-09 00:51:19 -08:00
Brad Davidson
f8271d8506 Add test for join existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-09 00:51:19 -08:00
Brad Davidson
365372441b Handle cluster join as create if we're the only member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-09 00:51:19 -08:00
Brad Davidson
6381ae93e7 Switch to using kubelet config files instead of CLI args
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-20 14:41:40 -08:00
Hussein Galal
763188d642
V1.32.0+k3s1 (#11478)
* Update libraries and codegen for k8s 1.32

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Fixes for 1.32

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Disable tests with down-rev agents

These are broken by AuthorizeNodeWithSelectors being on by default. All
agents must be upgraded to v1.32 or newer to work properly, until we
backport RBAC changes to older branches.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-20 23:17:14 +02:00
Brad Davidson
911ee19a93 Refactor load balancer server list and health checking
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson
67fd5fa9e5 Separate persistent config struct from LoadBalancer and make fields private
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson
f2f57b4a4b Remove unused code from etcdproxy
None of these fields or functions are used in k3s or rke2

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson
a39e191906 Add tests for ETCD.Test()
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-30 12:18:48 -07:00
Brad Davidson
095e34d816 Fix issues with defragment and alarm clear on etcd startup
* Use clientv3.NewCtxClient instead of New to avoid automatic retry of all RPCs
* Only timeout status requests; allow defrag and alarm clear requests to run to completion.
* Only clear alarms on the local cluster member, not ALL cluster members

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-30 12:18:48 -07:00
Brad Davidson
0826ebc142 Fix race condition when multiple nodes reconcile S3 snapshots
Don't delete s3 etcdsnapshotfiles if they are missing from s3 but less than a minute old, its possible the other node just finished uploading it and the object key has not yet become visible.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-07 11:11:58 -07:00
Brad Davidson
0942e6a0c5 Fix sqlite endpoint when migrating from sqlite to etcd
Support for 'sqlite' as the endpoint was removed in
https://github.com/k3s-io/kine/pull/320 and the constant removed in
https://github.com/k3s-io/kine/pull/325

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-03 10:54:03 -07:00
Will
e4f3cc7b54 remove deprecated use of wait functions
Signed-off-by: Will <will7989@hotmail.com>
2024-07-29 16:23:17 -07:00
Brad Davidson
c2216a62ad Use pagination when retrieving etcd snapshot list
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-24 12:44:29 -07:00
Brad Davidson
c36db53e54 Add etcd s3 config secret implementation
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
  changed at runtime by modifying the secret, and don't want to have to
  create a new minio client every time we read config.
* Add tests for pkg/etcd/s3

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-10 13:13:55 -07:00
Brad Davidson
aa4794b372 Replace 1-weight semaphore on snapshots with simple mutex
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-19 09:47:58 -07:00
Vitor Savian
d9b8ba8d71
Add snapshot retention etcd-s3-folder fix
* Add snapshot retention folder fix

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Add snapshot retention E2E test

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

---------

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-06-06 17:31:01 -03:00
Brad Davidson
307f07bd61 Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-30 11:47:23 -07:00
Brad Davidson
ed23a2bb48 Fix netpol crash when node remains tained unintialized
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 23:34:44 -07:00
Brad Davidson
f8e0648304 Convert remaining http handlers over to use util.SendError
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson
3d14092f76 Fix issue with k3s-etcd informers not starting
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 15:48:15 -07:00
Hussein Galal
144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson
94e29e2ef5 Make /db/info available anonymously from localhost
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-22 19:34:43 -07:00
Brad Davidson
5b431ca531 Fix on-demand snapshots not honoring folder
Also fix etcd s3 tests to actually check that the files are saved to s3 🙃

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Brad Davidson
7d9abc9f07 Improve etcd load-balancer startup behavior
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson
fe465cc832 Move etcd snapshot management CLI to request/response
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:21:26 -07:00
Derek Nola
14f54d0b26
Transition from deprecated pointer library to ptr (#9801)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:07:02 -07:00
Vitor Savian
5d69d6e782 Add tls for kine
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Bump kine

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Add integration tests for kine with tls

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-28 11:12:07 -03:00
Brad Davidson
c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson
edb0440017 Fix etcd snapshot reconcile for agentless nodes
Disable cleanup of orphaned snapshots and patching of node annotations if running agentless

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:44:36 -07:00
Brad Davidson
d7cdbb7d4d Send error response if member list cannot be retrieved
Prevents joining nodes from being stuck with bad initial member list if there is a transient failure, or if they try to join themselves

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 15:17:15 -07:00
Brad Davidson
3576ed4327 Clean up snapshotDir create/exists logic
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-04 12:09:29 -08:00
Brad Davidson
82432a2df7 Fix issue with etcd node name missing hostname
* Set ServerNodeName in snapshot CLI setup
* Raise errer if ServerNodeName ends up empty some other way
* Fix status controller to use etcd node name annotation instead of prefix checking

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 13:52:53 -08:00
Derek Nola
fae41a8b2a Rename AgentReady to ContainerRuntimeReady for better clarity
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-02-21 12:21:19 -08:00