Addresses flakes in etcd CI due to the port still being in TIME_WAIT after the server is shut down between tests
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Increase etcd shutdown delay to avoid "bind: address already in use" errors seen in CI. Also uses test TmpDir to ensure dir is cleaned up between tests.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Generate the mock executor with mockgen and convert existing uses of the mock executor to set it up properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also adds a CLI flag and fields for session token, which must be passed
alongside the access key and secret when using temporary credentials.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Update libraries and codegen for k8s 1.32
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Fixes for 1.32
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Disable tests with down-rev agents
These are broken by AuthorizeNodeWithSelectors being on by default. All
agents must be upgraded to v1.32 or newer to work properly, until we
backport RBAC changes to older branches.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
---------
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
* Use clientv3.NewCtxClient instead of New to avoid automatic retry of all RPCs
* Only timeout status requests; allow defrag and alarm clear requests to run to completion.
* Only clear alarms on the local cluster member, not ALL cluster members
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't delete s3 etcdsnapshotfiles if they are missing from s3 but less than a minute old, its possible the other node just finished uploading it and the object key has not yet become visible.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
changed at runtime by modifying the secret, and don't want to have to
create a new minio client every time we read config.
* Add tests for pkg/etcd/s3
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* kubernetes 1.30.0-k3s1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Update go version to v1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener and helm-controller
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in go.mod to 1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in Dockerfiles
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update cri-dockerd
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Add proctitle package with linux and windows constraints
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go mod tidy
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fixing setproctitle function
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener to v0.6.0-rc1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
---------
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Adds support for health-checking loadbalancer servers. If a
health-check fails when dialing, all existing connections to the
server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
check for etcd connections.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Prevents joining nodes from being stuck with bad initial member list if there is a transient failure, or if they try to join themselves
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Set ServerNodeName in snapshot CLI setup
* Raise errer if ServerNodeName ends up empty some other way
* Fix status controller to use etcd node name annotation instead of prefix checking
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>