Refactored deployDevicePlugin, added undeployDevicePlugin
- Add skipCleanup parameter to control cleanup behavior
- Wait for device plugin resources to become allocatable on deploy
- Add undeployDevicePlugin function to properly cleanup and wait for
resources to be removed from node allocatable capacity
- Update tests to use refactored functions and remove duplicate code
These changes reduce test flakiness by explicitly waiting for expected
node states before proceeding with test operations.
- Moved sample device plugin constants and helper code to the
test/e2e/node/framework, so that both deviceplugin and DRA tests can
use it without creating e2e -> e2e_node dependency.
- Moved SampleDevsAmount constant from the
test/e2e_node/device_plugin_test.go
kubelet: defer the configurations flags (and the related fallback behavior) deprecation removal timeline from 1.36 to 1.37 to align with containerd v1.7 support
Passing a context to StartWithContext enables context-aware reflector
logging. This is the main remaining source of log spam (output to stderr
instead of per-test logger) in controller unit tests.
WaitForCacheSynceWithContext takes advantage of the new cache.WaitFor +
NamedHasSynced functionality to finish "immediately" (= no virtual time
passed) in a synctest bubble. While at it, the return type gets improved so
that a failure is easier to handle.
The main advantage is that waiting on channels creates a causal relationship
between goroutines which is visible to synctest. When a controller in a
synctest bubble does a WaitFor in a test's background goroutine for the
controller, the test can use synctest.Wait to wait for completion of cache
sync, without requiring any test specific "has controller synced" API. Without
this, the test had to poll or otherwise wait for the controller.
The polling in WaitForCacheSync moved the virtual clock forward by a random
amount, depending on how often it had to check in wait.Poll. Now tests can be
written such that all events during a test happen at a predictable time. This
will be demonstrated in a separate commit for the
pkg/controller/devicetainteviction unit test.
The benefit for normal production is immediate continuation when the last
informer is synced (not really a problem, but still...) and more important,
nicer logging thanks to the names associated with the thing that is being
waited for. The caller decides whether logging is enabled or disabled and
describes what is being waited for (typically informer caches, but maybe also
event handlers or even something else entirely as long as it implements the
DoneChecker interface).
Before:
Waiting for caches to sync
Caches are synced
After:
Waiting for="cache and event handler sync"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.Pod"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceClaim"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceSlice"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.DeviceClass"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1alpha3.DeviceTaintRule"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceClaim + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.Pod + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1alpha3.DeviceTaintRule + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceSlice + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
The "SharedIndexInformer *v1.Pod" is also how this appears in metrics.
This improves logging and enables more informative waiting for cache sync in a
following commit. It addresses one klog.TODO in the Reflector.
The RealFIFOOptions and InformerOptions structs get extended the same way as
DeltaFIFOOptions before: a logger may be set, but it's not required. This is
not an API break.
That the name has to be passed separately is a bit annoying at first glance
because it could also be set directly on the logger through WithName, but
keeping it separate is better:
- name can be set without providing a logger
- name can be defaulted
- less code in the caller when passing through a logger and adding
the name only in the field
- last but not least, extracting the name is not supported in a portable
manner by logr
All in-tree references in production code get updated.
While at it, logging in the fifos gets updated to follow best practices: if
some code encounters an abnormal situation and then continues, it should use
utilruntime.HandleErrorWithLogger instead of normal error logging.
Existing "logger" fields get moved to the top because that is a more common
place for such a read-only field.
The test need to consider the time for the Delete operation to populate
the ipallocator informer, otherwise, it can happen the allocator fails
with a range full failing the test.
Co-authored-by: hiirrxnn <hiren2004sharma@gmail.com>