Add unit tests for handwritten and declarative validation, controller
logic, metrics, table printer output, controller-manager registration,
etcd storage round-trip, and an integration test for the full RPSR
lifecycle. Also add an e2e test exercising the DRA test driver with
RPSR and the example manifest.
Implement the RPSR controller that watches ResourcePoolStatusRequest
objects and aggregates pool status from DRA drivers. Add the API server
registry (strategy, storage), handwritten validation, RBAC bootstrap
policy for the controller, kube-controller-manager wiring, table
printer columns, and storage factory registration.
This is the last step in the process: it simply links ReleaseOnCancel to
the ControllerManagerReleaseLeaderElectionLockOnExit feature gate.
When the leader election release-on-exit feature gate is disabled, the
original logic does not catch signals and exits immediately when the LE
lock is lost. That logic is being put back into place so that the new
behavior can be tested without affecting the former approach at all.
The fields become beta, enabled by default. DeviceTaintRule gets
added to the v1beta2 API, but support for it must remain off by default
because that API group is also off by default.
The v1beta1 API is left unchanged. No one should be using it
anymore (deprecated in 1.33; it could be removed now if it weren't for
reading old objects and version emulation).
To achieve consistent validation, declarative validation must also be
enabled for v1alpha3 (it was already enabled for other versions). Otherwise,
TestVersionedValidationByFuzzing fails:
    --- FAIL: TestVersionedValidationByFuzzing (0.09s)
        --- FAIL: TestVersionedValidationByFuzzing/resource.k8s.io/v1beta2,_Kind=DeviceTaintRule (0.00s)
            validation_test.go:109: different error count (0 vs. 1)
                resource.k8s.io/v1alpha3: <no errors>
                resource.k8s.io/v1beta2: "spec.taint.effect: Unsupported value: \"幤HxÒQP¹¬永唂ȳ垞ş]嘨鶊\": supported values: \"NoExecute\", \"NoSchedule\", \"None\""
    ...
FSWatcher.Run() spawned a goroutine with no exit mechanism, causing a
goroutine leak. Add a ctx context.Context parameter to Run() so the
goroutine can exit cleanly when the context is canceled, and
defer-close the underlying fsnotify watcher on exit.
For kube-proxy, the existing ctx from runLoop() is passed directly.
For the flexvolume prober, ctx is stored in flexVolumeProber at
construction time via GetDynamicPluginProber(), representing the
component lifetime (kubelet/controller-manager), which is the
appropriate scope for this long-running watcher.
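
The pattern described above can be sketched roughly as follows. This is a
minimal illustration, not the actual FSWatcher code: the watcher type, its
events channel, and runAndCancel are hypothetical stand-ins, and no real
fsnotify calls are used. It only shows a Run(ctx) whose goroutine exits on
context cancellation and closes the watcher via defer.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// watcher is a hypothetical stand-in for a file system watcher. It is not
// the real FSWatcher or fsnotify API.
type watcher struct {
	events chan string
	closed chan struct{}
}

func newWatcher() *watcher {
	return &watcher{events: make(chan string), closed: make(chan struct{})}
}

// Close releases the watcher's resources (here: just signals closure).
func (w *watcher) Close() { close(w.closed) }

// Run starts the event loop in a goroutine. The goroutine returns once ctx
// is canceled and closes the watcher on its way out.
func (w *watcher) Run(ctx context.Context, handle func(string)) {
	go func() {
		defer w.Close()
		for {
			select {
			case ev := <-w.events:
				handle(ev)
			case <-ctx.Done():
				return
			}
		}
	}()
}

// runAndCancel delivers one event, cancels the context, and reports
// whether the goroutine shut the watcher down within one second.
func runAndCancel() (string, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	w := newWatcher()
	got := make(chan string, 1)
	w.Run(ctx, func(ev string) { got <- ev })

	w.events <- "create /tmp/plugin"
	ev := <-got

	cancel()
	select {
	case <-w.closed:
		return ev, true
	case <-time.After(time.Second):
		return ev, false
	}
}

func main() {
	ev, stopped := runAndCancel()
	fmt.Println(ev, stopped)
}
```

Without the ctx.Done() case, the goroutine would block on the events
channel forever, which is the leak this change removes.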
* DRA resource claim controller: configurable number of workers
It might never be necessary to change the default, but it is hard to be sure.
It's better to have the option, just in case.
* generate files
* resourceclaimcontroller: normalize validation error message
* Update cmd/kube-controller-manager/app/options/resourceclaimcontroller.go
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
---------
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
kube-controller-manager and kube-scheduler do not use the
configured loopback clients. Drop them.
Fix up scheduler test server to not depend on
the loopback client.
This change refactors the cloud-specific versions of the node lifecycle
and node IPAM controllers to use a context.Context for cancellation and
contextual logging, replacing the legacy stopCh pattern.
This is a follow-up to PR #133985, where these controllers were
separated out due to their use in the legacy Cloud Controller Manager
(CCM).
It is a known issue that the CCM's startup logic does not pass the
controller name via the context. This change proceeds with the
refactoring to unify the cancellation logic across controllers, while
acknowledging that contextual logs will be less detailed when these
controllers are run in the CCM.
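
The shape of this refactoring can be sketched as below, under loud
assumptions: withControllerName, logPrefix, run, and demo are hypothetical
helpers, the name "node-lifecycle" is only an example, and a plain context
value stands in for the real contextual logging machinery. The sketch shows
a Run-style function that blocks on ctx.Done() instead of a stop channel,
and how log output degrades when the caller never put a controller name
into the context.

```go
package main

import (
	"context"
	"fmt"
)

type nameKey struct{}

// withControllerName is a hypothetical helper that stores a controller
// name in the context.
func withControllerName(ctx context.Context, name string) context.Context {
	return context.WithValue(ctx, nameKey{}, name)
}

// logPrefix returns the controller name from ctx, or a fallback when the
// caller did not provide one.
func logPrefix(ctx context.Context) string {
	if name, ok := ctx.Value(nameKey{}).(string); ok {
		return name
	}
	return "<unnamed>"
}

// run replaces a Run(stopCh <-chan struct{}) style entry point. It blocks
// until ctx is canceled and emits two log lines on out.
func run(ctx context.Context, out chan<- string) {
	out <- logPrefix(ctx) + ": starting"
	<-ctx.Done()
	out <- logPrefix(ctx) + ": stopping"
}

// demo runs the controller once, with or without a name in the context,
// and returns the log lines it produced.
func demo(named bool) []string {
	ctx, cancel := context.WithCancel(context.Background())
	if named {
		ctx = withControllerName(ctx, "node-lifecycle")
	}
	out := make(chan string, 2)
	done := make(chan struct{})
	go func() {
		defer close(done)
		run(ctx, out)
	}()
	cancel()
	<-done
	close(out)

	var lines []string
	for l := range out {
		lines = append(lines, l)
	}
	return lines
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```

Cancellation works identically in both cases. Only the log prefix differs,
which mirrors the less detailed logs expected when these controllers run
in the CCM.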
Signed-off-by: Aditi Gupta <aditigpta@google.com>