Move the `ipv4` and `ipv6` constants to their own constant
declaration. This ensures that the `iota` expression for the `ipv4`
constant evaluates to 0, not some arbitrary value. (`iota` evaluates
to N for the Nth constant in the constant declaration; see
<https://go.dev/ref/spec#Iota>.) This is also more idiomatic, which
improves readability.
Also switch from incremental integers to bit flags, and use bitwise
operators for checking. This is more idiomatic (the integer is
treated like a set of booleans), it avoids some code duplication, and
it is necessary to avoid ambiguity. Consider the following:
const (
ipv4 = iota
ipv6
)
In the above, `ipv4` would have the value 0 and `ipv6` would have the
value 1. This would make it impossible to distinguish an IPv6-only
stack from a dual-stack configuration because `ipv6` would equal
`ipv4 + ipv6`. With bit flags this problem doesn't exist.
And put the integer holding the bit flags in a custom type with
convenience methods to improve readability.
Signed-off-by: Richard Hansen <rhansen@rhansen.org>
Normally K3s will import all tarballs in the image dir on startup, and
re-import any tarballs that change while it is running.
This change allows users to opt into only importing tarballs that have
changed since they were last imported, even across restarts.
This behavior is opted into by touching a `.cache.json` file in the
images dir. This file is used to track the size and mtime of the image
files when they are imported.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Panic gets rescued by the http server, and was only visible when running in debug mode, but should be handled properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Avoids infinite recursion when the chain includes an agentBootstrapper with a server address that points back at this node (via join address loop or external LB)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Adds maximum in-flight request limits to agent join and p2p peer info
request request handlers.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
libp2p may make a large number of bootstrap calls during startup; serve nodes from cache to avoid excessive CPU usage.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
I assume this was a mistake when copying and the kubelet key should have been used here.
This bug was introduced in #11471.
Signed-off-by: Aaron Dewes <aaron@nirvati.org>
* chore: Bump Klipper Helm and Helm Controller version
Made with ❤️️ by updatecli
* chore: Bump Klipper Helm and Helm Controller version
Made with ❤️️ by updatecli
* Fix build
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Derek Nola <derek.nola@suse.com>
Addresses flakes in etcd CI due to the port still being in TIME_WAIT after the server is shut down between tests
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
- Add testlet for new provider switch
- Handle migration between providers
- Add exception for criticalcontrolargs
Signed-off-by: Derek Nola <derek.nola@suse.com>
Increase etcd shutdown delay to avoid "bind: address already in use" errors seen in CI. Also uses test TmpDir to ensure dir is cleaned up between tests.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Move arg-parsing helper functions into util, and use them to see if the user has set an authorization-config flag - and do not set authorization-mode if so.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Generate the mock executor with mockgen and convert existing uses of the mock executor to set it up properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Splits server startup into prepare/start phases. Server's agent is now
started after server is prepared, but before it is started. This allows
us to properly bootstrap the executor before starting server components,
and use the executor to provide a shared channel to wait on apiserver
readiness.
This allows us to replace four separate callers of WaitForAPIServerReady
with reads from a common ready channel.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Increases log verbosity but decreases polling frequency to avoid
spamming the console. It usually takes a couple seconds for the
apiserver to come up anyway.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Remove the AddOn last, after any resources it created in the cluster
have been deleted and the AddOn file is removed.
Signed-off-by: Robert Rose <robert.rose@mailbox.org>
* Bump rootlesskit tov 1.1.1, last of the v1 line
* Migrate to urfavecli v2
* Disable StringSlice seperattion
Signed-off-by: Derek Nola <derek.nola@suse.com>
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes issue where CA rotation would fail on servers with join URL set due to using old data from disk on other server
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
RKE2 on Windows sets CNI bin dirs in node config even though embedded flannel is disabled (NoFlannel=true). We need to gate rendering this config on the vars being, set NOT on NoFlannel being false.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Ref: https://github.com/containerd/containerd/blob/release/2.0/docs/cri/config.md
Since this is a breaking change, add support for a new v3 template file. If no v3 template is present, fall back to checking for the legacy v2 template and render the old structure.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Netpol startup is skipped with a warning on linux if ipset support is missing, we should do the same on windows
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also adds a CLI flag and fields for session token, which must be passed
alongside the access key and secret when using temporary credentials.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* chore: Bump Local Path Provisioner version
Made with ❤️️ by updatecli
* chore: Bump Local Path Provisioner version
Made with ❤️️ by updatecli
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Restricting deferred node password validation to only requests from the local node is not possible without breaking split-role cluster cold start. There are too many cases where node password secrets may not yet be available due to the apiserver not being up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump klipper-lb to v0.4.10
Bump klipper-helm to v0.9.4
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Bump helm-controller
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Derek Nola <derek.nola@suse.com>
The effective snapshot dir is "${data-dir}/server/db/snapshots". The
server segment is missing in the CLI-reported default path, potentially
misleading the user about the actual default snapshot destination.
Signed-off-by: Maja Bojarska <majabojarska98@gmail.com>
Only wait for k3s-controller RBAC when AuthorizeNodeWithSelectors blocks kubelet from listing nodes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Make this field an interface instead of pointer to allow mocking. Not sure why wrangler has a type that returns an interface instead of just making it an interface itself. Wrangler in general is hard to mock for testing.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Clients now generate keys client-side and send CSRs. If the server is down-level and sends a cert+key instead of just responding with a cert signed with the client's public key, we use the key from the server instead.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The servers package, and router.go in particular, had become quite
large. Address this by moving some things out to separate packages:
* http request handlers all move to pkg/server/handlers.
* node password bootstrap auth handler goes into pkg/nodepassword with
the other nodepassword code.
While we're at it, also be more consistent about calling variables that
hold a config.Control struct or reference `control` instead of `config` or `server`.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add "k3s certificate check" clause for better test coverage
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Add table support to cert check
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Update libraries and codegen for k8s 1.32
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Fixes for 1.32
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Disable tests with down-rev agents
These are broken by AuthorizeNodeWithSelectors being on by default. All
agents must be upgraded to v1.32 or newer to work properly, until we
backport RBAC changes to older branches.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
---------
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
Add flags supervisor and apiserver ports and bind address so that we can add an e2e to cover supervisor and apiserver on separate ports, as used by rke2
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add missing default OS for split server test
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Launch go routine and return for k3s secrets-encrypt reencrypt
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Derek Nola <derek.nola@suse.com>
The loadbalancer should only fail over to the default server if all other server have failed, and it should force fail-back to a preferred server as soon as one passes health checks.
The loadbalancer tests have been improved to ensure that this occurs.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This fixes: 'error: no Auth Provider found for name "oidc"' when trying to run any subcommands in kubectl that require a valid server login.
Signed-off-by: Ludo Stellingwerff <ludo.stellingwerff@gmail.com>
External CLI actions cannot short-circuit on --help or --version, so we
cannot skip loading the config file if these flags are present when
running these wrapped commands. The behavior of just returning the
override flag name instead of the requested flag value was breaking
data-dir lookup when running wrapped commands.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Add new flag that is passed through to the device_ownership_from_security_context parameter in the containerd CRI config. This is not possible to change without providing a complete custom containerd.toml template so we should add a flag for it.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Use clientv3.NewCtxClient instead of New to avoid automatic retry of all RPCs
* Only timeout status requests; allow defrag and alarm clear requests to run to completion.
* Only clear alarms on the local cluster member, not ALL cluster members
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Fix vagrant/libvirt composite action for ubuntu-24.04
* Don't ignore changes to internal actions
* Fix unit tests for ubuntu 24.04, new lsof version
* Pin os version for unit and E2E workflows
Signed-off-by: Derek Nola <derek.nola@suse.com>
Don't delete s3 etcdsnapshotfiles if they are missing from s3 but less than a minute old, its possible the other node just finished uploading it and the object key has not yet become visible.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Used https://github.com/coredns/corefile-migration to
migrate the corefile. There are no changes for the
default file from 1.10.1 to 1.11.3.
Notable plugin changes include the k8s_external with fallthrough option
and rewrite with cname_target option.
These changes are not part of the default config that ships
with k3s. Customers using these two plugins can start using the new options
Metrics does not have any new features other than build tooling updates.
Requires https://github.com/rancher/image-mirror/pull/704
Signed-off-by: Harsimran Singh Maan <maan.harry@gmail.com>
Also silences warnings about bootstrap fields that are not intended to be handled by CA rotation
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
FindString would silently skip parsing dropins if the main config file
didn't exist. If a custom config file path was passed it would raise an
error, but if we were parsing the default config file and it didn't
exist it would just silently fail to load the dropins.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Allow pprof to run on server with `--disable-agent`
* Allow supervisor metrics to run on server with `--disable-agent`
Signed-off-by: Derek Nola <derek.nola@suse.com>
Fixes an issue where running etcd-snapshot commands on a node that has a server address set in the config will manage snapshots on that server, instead of on the local node as intended.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This should give us more detail on how long dials take before failing, so that we can perhaps better tune the retry loop in the future.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
I should have caught `[]string{cfg.NodeIP}[0]` and `[]string{envInfo.NodeIP.String()}[0]` in code review...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We shouldn't be replacing the configured server address on agents. Doing
so breaks the agent's ability to fall back to the fixed registration
endpoint when all servers are down, since we replaced it with the first
discovered apiserver address. The fixed registration endpoint will be
restored as default when the service is restarted, but this is not the
correct behavior. This should have only been done on etcd-only nodes
that start up using their local supervisor, but need to switch to a
control-plane node as soon as one is available.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
changed at runtime by modifying the secret, and don't want to have to
create a new minio client every time we read config.
* Add tests for pkg/etcd/s3
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* chore: Bump Local Path Provisioner version
Made with ❤️️ by updatecli
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add write-kubeconfig-group flag to server
* update kubectl unable to read config message for kubeconfig mode/group
Signed-off-by: Katherine Pata <me@kitty.sh>
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Refactor agent supervisor listener startup and authn/authz to use upstream
auth delegators to perform for SubjectAccessReview for access to
metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
to the pprof endpoint now requires client cert auth, similar to the
spegel registry api endpoint.
* Add prometheus metrics handler.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* kubernetes 1.30.0-k3s1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Update go version to v1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener and helm-controller
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in go.mod to 1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in Dockerfiles
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update cri-dockerd
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Add proctitle package with linux and windows constraints
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go mod tidy
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fixing setproctitle function
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener to v0.6.0-rc1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
---------
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>