Commit graph

7174 commits

Author SHA1 Message Date
Kubernetes Prow Robot
2bbb175707
Merge pull request #137461 from ahmedharabi/fix/statefulset-error-wrapping
statefulset: wrap errors with %w in StatefulPodControl
2026-03-07 00:08:25 +05:30
Jordan Liggitt
45900a1deb
Fix vet error 2026-03-05 18:11:02 -05:00
ahmedharabi
a0dee17c1d statefulset: wrap errors with %w in StatefulPodControl
Signed-off-by: ahmedharabi <harabiahmed88@gmail.com>
2026-03-05 23:02:16 +01:00
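For readers unfamiliar with the `%w` verb named in the commit above, here is a minimal sketch of the difference it makes. It is not the StatefulPodControl code; `errPodGone`, `deletePod`, `wrapWithW`, `wrapWithV` and `isPodGone` are hypothetical names made up for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errPodGone is a hypothetical sentinel error used only in this sketch.
var errPodGone = errors.New("pod not found")

// deletePod is a hypothetical stand-in for an API call that always fails.
func deletePod(name string) error {
	return errPodGone
}

// wrapWithW adds context with %w, which keeps the cause in the error chain.
func wrapWithW(name string) error {
	if err := deletePod(name); err != nil {
		return fmt.Errorf("failed to delete pod %s: %w", name, err)
	}
	return nil
}

// wrapWithV adds context with %v, which only copies the cause's text.
func wrapWithV(name string) error {
	if err := deletePod(name); err != nil {
		return fmt.Errorf("failed to delete pod %s: %v", name, err)
	}
	return nil
}

// isPodGone reports whether errPodGone is somewhere in err's chain.
func isPodGone(err error) bool {
	return errors.Is(err, errPodGone)
}

func main() {
	fmt.Println(isPodGone(wrapWithW("web-0"))) // true
	fmt.Println(isPodGone(wrapWithV("web-0"))) // false
}
```

Callers that use `errors.Is` or `errors.As` can only see through the wrapper when `%w` was used.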
Kubernetes Prow Robot
c6f70e3a38
Merge pull request #136399 from tico88612/feat/storage-metric-beta
Rename metric `volume_operation_total_errors` to `volume_operation_errors_total`
2026-03-06 00:46:18 +05:30
Omer Aplatony
3799fc9942
Add unit tests for HPA metrics (#136670)
* Add unit tests for HPA metrics

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* removed mock monitor

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* fmt

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* spelling

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* lint

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* lint

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

---------

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2026-03-05 19:10:26 +05:30
Kubernetes Prow Robot
8bd1505fc0
Merge pull request #137108 from pohly/logtools-update
golangci-lint: bump to logtools v0.10.1
2026-03-05 10:14:16 +05:30
Kubernetes Prow Robot
8275484dcf
Merge pull request #137297 from atombrella/feature/pkg_forvar_modernize
Remove redundant variable re-assignment in for-loops under pkg
2026-03-05 00:28:20 +05:30
xigang
9d10b1f799 refactor: remove unused desiredStateOfWorld parameter from DetermineVolumeAction
Signed-off-by: xigang <wangxigang2014@gmail.com>
2026-03-04 22:01:43 +08:00
Kubernetes Prow Robot
9d7dda7186
Merge pull request #137245 from atombrella/feature/slices_contains_pkg_controller
Update `pkg/controller` to use slices.Contains
2026-03-04 18:04:20 +05:30
Patrick Ohly
b895ce734f golangci-lint: bump to logtools v0.10.1
This fixes a bug that caused log calls involving `klog.Logger` to not be
checked.

As a result we have to fix some code that is now considered faulty:

    ERROR: pkg/controller/serviceaccount/tokens_controller.go:382:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (e *TokensController) generateTokenIfNeeded(ctx context.Context, logger klog.Logger, serviceAccount *v1.ServiceAccount, cachedSecret *v1.Secret) ( /* retry */ bool, error) {
    ERROR: ^
    ERROR: pkg/controller/storageversionmigrator/storageversionmigrator.go:299:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (svmc *SVMController) runMigration(ctx context.Context, logger klog.Logger, gvr schema.GroupVersionResource, resourceMonitor *garbagecollector.Monitor, toBeProcessedSVM *svmv1beta1.StorageVersionMigration, listResourceVersion string) (err error, failed bool) {
    ERROR: ^
    ERROR: pkg/proxy/node.go:121:3: logging function "Error" should not use format specifier "%q" (logcheck)
    ERROR: 		klog.FromContext(ctx).Error(nil, "Timed out waiting for node %q to exist", nodeName)
    ERROR: 		^
    ERROR: pkg/proxy/node.go:123:3: logging function "Error" should not use format specifier "%q" (logcheck)
    ERROR: 		klog.FromContext(ctx).Error(nil, "Timed out waiting for node %q to be assigned IPs", nodeName)
    ERROR: 		^
    ERROR: pkg/scheduler/backend/queue/scheduling_queue.go:610:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (p *PriorityQueue) runPreEnqueuePlugin(ctx context.Context, logger klog.Logger, pl fwk.PreEnqueuePlugin, pInfo *framework.QueuedPodInfo, shouldRecordMetric bool) *fwk.Status {
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:286:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) deleteClaim(ctx context.Context, claim *resourceapi.ResourceClaim, logger klog.Logger) error {
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:499:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) waitForExtendedClaimInAssumeCache(
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:528:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) createExtendedResourceClaimInAPI(
    ERROR: ^
    ERROR: pkg/scheduler/framework/plugins/dynamicresources/extendeddynamicresources.go:592:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (pl *DynamicResources) unreserveExtendedResourceClaim(ctx context.Context, logger klog.Logger, pod *v1.Pod, state *stateData) {
    ERROR: ^
    ERROR: pkg/scheduler/framework/runtime/batch.go:171:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (b *OpportunisticBatch) batchStateCompatible(ctx context.Context, logger klog.Logger, pod *v1.Pod, signature fwk.PodSignature, cycleCount int64, state fwk.CycleState, nodeInfos fwk.NodeInfoLister) bool {
    ERROR: ^
    ERROR: staging/src/k8s.io/component-base/featuregate/feature_gate.go:890:4: Additional arguments to Info should always be Key Value pairs. Please check if there is any key or value missing. (logcheck)
    ERROR: 			logger.Info("Warning: SetEmulationVersionAndMinCompatibilityVersion will change already queried feature", "featureGate", feature, "oldValue", oldVal, newVal)
    ERROR: 			^
    ERROR: test/images/sample-device-plugin/sampledeviceplugin.go:108:2: logging function "Info" should not use format specifier "%s" (logcheck)
    ERROR: 	logger.Info("pluginSocksDir: %s", pluginSocksDir)
    ERROR: 	^
    ERROR: test/images/sample-device-plugin/sampledeviceplugin.go:123:2: logging function "Info" should not use format specifier "%s" (logcheck)
    ERROR: 	logger.Info("CDI_ENABLED: %s", cdiEnabled)
    ERROR: 	^

While waiting for this to merge, another call was added which also doesn't
follow conventions:

    ERROR: pkg/kubelet/kubelet.go:2454:1: A function should accept either a context or a logger, but not both. Having both makes calling the function harder because it must be defined whether the context must contain the logger and callers have to follow that. (logcheck)
    ERROR: func (kl *Kubelet) deletePod(ctx context.Context, logger klog.Logger, pod *v1.Pod) error {
    ERROR: ^

Contextual logging has been beta and enabled by default for several releases
now. It's mostly just a matter of wrapping up and declaring it GA. Therefore
the calls which directly call WithName or WithValues (which always have an
effect) are left as-is instead of being converted to use the klog wrappers
(which support disabling the effect). To allow that, the linter gets
reconfigured to not complain about this anymore, anywhere.

The calls which would have to be fixed otherwise are:

    ERROR: pkg/kubelet/cm/dra/claiminfo.go:170:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-claiminfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:45:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:89:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/healthinfo.go:157:11: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger = logger.WithName("dra-healthinfo")
    ERROR: 	         ^
    ERROR: pkg/kubelet/cm/dra/manager.go:175:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:239:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:593:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:781:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(context.Background()).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager.go:898:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-manager")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/manager_test.go:1638:15: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 				logger := klog.FromContext(streamCtx).WithName(st.Name())
    ERROR: 				          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:77:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:108:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: pkg/kubelet/cm/dra/plugin/dra_plugin.go:161:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	logger := klog.FromContext(ctx).WithName("dra-plugin")
    ERROR: 	          ^
    ERROR: staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker.go:695:14: function "WithValues" should be called through klogr.LoggerWithValues (logcheck)
    ERROR: 			logger := logger.WithValues("device", deviceID)
    ERROR: 			          ^
    ERROR: test/integration/apiserver/watchcache_test.go:42:54: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	etcd0URL, stopEtcd0, err := framework.RunCustomEtcd(klog.FromContext(ctx).WithName("etcd0"), "etcd_watchcache0", etcdArgs)
    ERROR: 	                                                    ^
    ERROR: test/integration/apiserver/watchcache_test.go:47:54: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 	etcd1URL, stopEtcd1, err := framework.RunCustomEtcd(klog.FromContext(ctx).WithName("etcd1"), "etcd_watchcache1", etcdArgs)
    ERROR: 	                                                    ^
    ERROR: test/integration/scheduler_perf/scheduler_perf.go:1149:12: function "WithName" should be called through klogr.LoggerWithName (logcheck)
    ERROR: 		logger = logger.WithName(tCtx.Name())
    ERROR: 		         ^
2026-03-04 12:08:18 +01:00
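Several of the logcheck findings above concern format specifiers in structured log calls. The following sketch shows the key/value style the linter asks for. It deliberately uses the standard library's `log/slog` as a stand-in so that it runs without dependencies; it is not klog, and `logTimeout` and `hasPair` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// logTimeout emits one structured error record and returns its rendered text.
func logTimeout(nodeName string) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	// The message stays constant; variable data goes into a key/value pair
	// instead of a %q or %s format specifier.
	logger.Error("Timed out waiting for node to exist", "node", nodeName)
	return buf.String()
}

// hasPair reports whether the rendered record contains key=value.
func hasPair(record, key, value string) bool {
	return strings.Contains(record, key+"="+value)
}

func main() {
	record := logTimeout("worker-1")
	fmt.Print(record)
	fmt.Println(hasPair(record, "node", "worker-1"))
}
```

A constant message with separate key/value pairs keeps records searchable by key, which a formatted string does not.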
Kubernetes Prow Robot
5941fed3d6
Merge pull request #136912 from dfajmon/selinux-ga
Promote SELinuxChangePolicy & SELinuxMountReadWriteOncePod to GA
2026-03-03 22:07:29 +05:30
Kubernetes Prow Robot
11c10dc5a0
Merge pull request #136939 from pohly/dra-device-taints-unit-test-improvements
DRA device taints: update unit tests
2026-03-03 02:48:54 +05:30
Mads Jensen
f11bb48738 Remove redundant re-assignment in for-loops under pkg
This is the forvar rule from modernize. The semantics of the for-loop
changed from Go 1.22 to make this pattern obsolete.
2026-03-02 08:47:43 +01:00
ChengHao Yang
5c88906dca
Rename volume_operation_total_errors to volume_operation_errors_total
Rename this because of a lint error: counter metrics should have a
"_total" suffix. Add a test for `volume_operation_errors_total` and
mark `volume_operation_total_errors` as deprecated.

Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
2026-02-28 20:08:07 +08:00
Kubernetes Prow Robot
330950ca52
Merge pull request #137254 from michaelasp/statefulConsistency
Add the ability for the statefulset controller to read its own writes
2026-02-28 01:39:30 +05:30
Michael Aspinwall
c8e8bd5085 Add the ability for the statefulset controller to read its own writes 2026-02-27 18:21:30 +00:00
Daniel Fajmon
b0919d81a0 Promote SELinuxChangePolicy & SELinuxMountReadWriteOncePod to GA 2026-02-27 14:58:14 +01:00
Patrick Ohly
29e92367db DRA device taints: avoid unnecessary Pod lookup
When rapidly processing informer events it can happen that a pod gets scheduled
twice (seen only in the TestEviction/update unit test):

- Claim update observed, pod from informer cache with NodeName from update -> queue pod for eviction.
- Pod update observed, claim from informer cache -> queue pod again.

The effect is one additional Get call to the apiserver. We can avoid it by
maintaining an LRU cache with the UIDs of the pods which we have evicted and
thus don't need to do anything for.
2026-02-27 14:38:30 +01:00
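The idea of remembering recently evicted pod UIDs in an LRU cache can be sketched as follows. This is not the controller's implementation; `uidLRU` and its methods are hypothetical, and the capacity of 2 is chosen only to make eviction visible.

```go
package main

import (
	"container/list"
	"fmt"
)

// uidLRU remembers the most recently used UIDs up to a fixed capacity.
type uidLRU struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // UID -> node in order
}

func newUIDLRU(capacity int) *uidLRU {
	return &uidLRU{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

// Add records uid as most recently used and drops the oldest entry when full.
func (c *uidLRU) Add(uid string) {
	if el, ok := c.entries[uid]; ok {
		c.order.MoveToFront(el)
		return
	}
	c.entries[uid] = c.order.PushFront(uid)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(string))
	}
}

// Contains reports whether uid is remembered and refreshes its recency.
func (c *uidLRU) Contains(uid string) bool {
	el, ok := c.entries[uid]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

func main() {
	evicted := newUIDLRU(2)
	evicted.Add("uid-a")
	evicted.Add("uid-b")
	fmt.Println(evicted.Contains("uid-a")) // true: a second event can be skipped
	evicted.Add("uid-c")                   // drops uid-b, the least recently used
	fmt.Println(evicted.Contains("uid-b")) // false
}
```

A bounded cache trades a rare redundant lookup (when an entry has already been dropped) for a fixed memory footprint.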
Patrick Ohly
017a53a1a9 DRA device taints: simplify more tests with synctest
In these cases it's certain that no time needs to pass, so Wait can
replace polling with Eventually. This also means that locking is
not necessary to prevent data races.
2026-02-27 07:47:28 +01:00
Patrick Ohly
4521c34276 DRA device taints: remove usage of testify for unit test
In particular with the builtin tCtx.Assert/Expect, assertions written with
gomega are just as short and often more readable (no more confusion in Equal
about which value is the expected one and which the actual one).
2026-02-27 07:47:28 +01:00
Patrick Ohly
fb94a99d2f DRA device taints: artificially delay pod deletion during test
We can observe the delay in the metric histogram. Because we run in a synctest
bubble, the delay is 100% predictable.

Unfortunately we cannot use the reactor mechanism of the fake client: that
delays while holding the fake's mutex. When some other goroutine (in this case,
the event recorder) calls the client, it gets blocked without being considered
durably blocked by synctest, so time does not advance and the test gets stuck.
2026-02-27 07:47:28 +01:00
Patrick Ohly
7d7b4c3dcb DRA device taint tests: remove List+Watch workaround
This was fixed in client-go itself, no workaround needed anymore.
2026-02-27 07:46:33 +01:00
Patrick Ohly
75626bcf3f DRA device taints: update unit tests
Thanks to waiting for cache sync via channels, the random delays caused by
polling are gone, making the initial setup including cache sync happen
"immediately" when a test starts (= same virtual time). This makes the tests
more predictable and simplifies making further assertions about when something
happens or how long it takes.

While at it, restore previous performance by setting feature gates once and
running tests in parallel again.
2026-02-27 07:46:19 +01:00
Mads Jensen
d11d54dc50 Update pkg/controller to use slices.Contains 2026-02-26 10:17:13 +01:00
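A minimal sketch of the kind of change this commit describes: a hand-written membership loop replaced by `slices.Contains` from the standard library (Go 1.21+). The function names and sample values are hypothetical and not taken from `pkg/controller`.

```go
package main

import (
	"fmt"
	"slices"
)

// containsLoop is the hand-written form that slices.Contains can replace.
func containsLoop(items []string, target string) bool {
	for _, item := range items {
		if item == target {
			return true
		}
	}
	return false
}

// containsStd does the same check with the standard library helper.
func containsStd(items []string, target string) bool {
	return slices.Contains(items, target)
}

func main() {
	finalizers := []string{"foregroundDeletion", "orphan"}
	fmt.Println(containsLoop(finalizers, "orphan"), containsStd(finalizers, "orphan"))   // true true
	fmt.Println(containsLoop(finalizers, "missing"), containsStd(finalizers, "missing")) // false false
}
```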
Karthik Bhat
43bfd8615d Refactor NewTestContext to return Context instead of TContext 2026-02-26 11:27:26 +05:30
Kubernetes Prow Robot
7ad86d14df
Merge pull request #137243 from michaelasp/fixJobClear
Fix clearing job consistency store for all deletes
2026-02-26 01:12:23 +05:30
Michael Aspinwall
f18f0df7fe Add the ability for the replicaset controller to read its own writes 2026-02-25 17:15:53 +00:00
Michael Aspinwall
008b92e0f6 Fix clearing job consistency store for all deletes 2026-02-25 17:13:50 +00:00
Kubernetes Prow Robot
c6d1649721
Merge pull request #137226 from tchap/selinuxwarning-reverse-index
controller/selinuxwarning/cache: Add reverse index to speed up DeletePod
2026-02-25 21:16:34 +05:30
Kubernetes Prow Robot
9f65538a35
Merge pull request #137224 from tchap/conflicts-parsed
controller/selinuxwarning: Pre-parse SELinux label
2026-02-25 16:27:50 +05:30
Ondra Kupka
911a61d050 controller/selinuxwarning/cache: Add reverse index
Added podToVolumes reverse index to optimize DeletePod.
Currently we simply iterate through all the volumes and remove the pod
being deleted from there. This is inefficient and takes longer the
longer the volume list becomes.

Keeping a map pod -> volumes makes removing a pod fast. We can just jump
to the relevant volumes directly and remove the pod from there.
2026-02-25 11:38:50 +01:00
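The reverse-index idea described above can be sketched like this. Only the name `podToVolumes` comes from the commit message; the `volumeCache` type, its other field and its methods are hypothetical simplifications, not the selinuxwarning cache itself.

```go
package main

import "fmt"

// volumeCache tracks which pods use which volumes, in both directions.
type volumeCache struct {
	volumeToPods map[string]map[string]struct{} // forward index
	podToVolumes map[string]map[string]struct{} // reverse index
}

func newVolumeCache() *volumeCache {
	return &volumeCache{
		volumeToPods: make(map[string]map[string]struct{}),
		podToVolumes: make(map[string]map[string]struct{}),
	}
}

// AddPod records that pod uses volume in both indexes.
func (c *volumeCache) AddPod(pod, volume string) {
	if c.volumeToPods[volume] == nil {
		c.volumeToPods[volume] = make(map[string]struct{})
	}
	c.volumeToPods[volume][pod] = struct{}{}
	if c.podToVolumes[pod] == nil {
		c.podToVolumes[pod] = make(map[string]struct{})
	}
	c.podToVolumes[pod][volume] = struct{}{}
}

// DeletePod visits only the volumes the pod used instead of scanning all volumes.
func (c *volumeCache) DeletePod(pod string) {
	for volume := range c.podToVolumes[pod] {
		pods := c.volumeToPods[volume]
		delete(pods, pod)
		if len(pods) == 0 {
			delete(c.volumeToPods, volume)
		}
	}
	delete(c.podToVolumes, pod)
}

// PodsUsing returns how many pods currently use volume.
func (c *volumeCache) PodsUsing(volume string) int {
	return len(c.volumeToPods[volume])
}

func main() {
	c := newVolumeCache()
	c.AddPod("pod-a", "vol-1")
	c.AddPod("pod-b", "vol-1")
	c.AddPod("pod-a", "vol-2")
	c.DeletePod("pod-a")
	fmt.Println(c.PodsUsing("vol-1"), c.PodsUsing("vol-2")) // 1 0
}
```

The cost of deletion then depends on how many volumes the pod used, not on the total number of volumes, at the price of keeping two maps in sync.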
Michael Aspinwall
61d0dd30fb Add the ability for the job controller to read its own writes 2026-02-25 01:19:48 +00:00
Ondra Kupka
a34456319d controller/selinuxwarning: Pre-parse SELinux label
When calling ControllerSELinuxTranslator.Conflicts(), the SELinux label
is repeatedly split into []string to detect conflicts. This causes a huge
number of allocations when there are many comparisons.

This is now made more efficient by pre-parsing the SELinux label and
storing it in podInfo as [4]string for fast comparison when needed.
2026-02-24 18:08:36 +01:00
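A rough sketch of the pre-parsing idea: split the label once into a fixed-size array and compare arrays afterwards. The names `parsedLabel`, `parseLabel` and `sameLabel` are hypothetical, and plain equality is a simplification; the actual conflict rules in `Conflicts()` are not shown in the commit message and are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedLabel holds the user, role, type and level parts of an SELinux label.
type parsedLabel [4]string

// parseLabel splits "user:role:type:level" once. SplitN with a limit of 4
// keeps colons inside the level (for example "s0:c1,c2") in the last part.
// Missing parts are left as empty strings.
func parseLabel(label string) parsedLabel {
	var out parsedLabel
	copy(out[:], strings.SplitN(label, ":", 4))
	return out
}

// sameLabel compares two pre-parsed labels without splitting them again.
func sameLabel(a, b parsedLabel) bool {
	return a == b
}

func main() {
	a := parseLabel("system_u:object_r:container_file_t:s0:c1,c2")
	b := parseLabel("system_u:object_r:container_file_t:s0:c3,c4")
	fmt.Println(a[2], a[3])      // container_file_t s0:c1,c2
	fmt.Println(sameLabel(a, a)) // true
	fmt.Println(sameLabel(a, b)) // false
}
```

Because arrays are comparable values in Go, repeated comparisons need no further splitting or slice allocation once the label has been parsed.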
Kubernetes Prow Robot
8812ec563c
Merge pull request #134353 from skitt/drop-string-slice
Deprecate obsolete slice utility functions
2026-02-20 00:57:41 +05:30
Michael Aspinwall
65eb0e94c2 Daemonset Consistency
Add the ability for the daemonset controller to figure out whether it has read its own writes for pods and daemonset objects.
2026-02-19 16:53:19 +00:00
Stephen Kitt
d42d1e3d1f
Deprecate obsolete slice utility functions
... and update users to use standard library functions.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2026-02-16 10:04:33 +01:00
Kubernetes Prow Robot
d7f6f91dae
Merge pull request #135820 from pohly/dra-sharing-claim-sequentially-test
DRA: sharing claim sequentially test
2026-02-13 01:50:09 +05:30
Kubernetes Prow Robot
98dd4d8e60
Merge pull request #136812 from rpb-ant/rpb/sts-not-found
Add 404 handling for the statefulset controller pod deletion codepath
2026-02-13 00:18:00 +05:30
Kubernetes Prow Robot
5b63a8c68e
Merge pull request #136921 from dims/dump-from-utils
Move dump package from apimachinery to k8s.io/utils
2026-02-12 22:28:10 +05:30
Ryan Brewster
11c6f8c7c8
Clean up redundant IsNotFound checks in stateful_set_control
🏠 Remote-Dev: homespace
2026-02-12 14:35:10 +00:00
Davanum Srinivas
550cc8645b
Move dump package from apimachinery to k8s.io/utils
Replace all imports of k8s.io/apimachinery/pkg/util/dump with
k8s.io/utils/dump across the repo. The apimachinery dump package
now contains deprecated wrapper functions that delegate to
k8s.io/utils/dump for backwards compatibility.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-02-12 07:34:19 -05:00
Patrick Ohly
bff684d951 DRA ResourceClaim controller: update logging
This provides a bit more information when the controller touches a
ResourceClaim.
2026-02-12 12:33:22 +01:00
Kubernetes Prow Robot
1956f4e90d
Merge pull request #136701 from Jefftree/fix-tombstone
Add tombstone handling for serviceaccount and attachdetach controllers
2026-02-12 08:24:09 +05:30
Jefftree
334fa1cef8 Add tombstone handling for serviceaccount and attachdetach controllers 2026-02-11 16:06:29 -05:00
Kubernetes Prow Robot
fce5bc2854
Merge pull request #134316 from xigang/node_controller_pod
node_lifecycle_controller: fix processing deleted pod events, which are currently missed
2026-02-11 09:26:00 +05:30
Kubernetes Prow Robot
f693c45c4e
Merge pull request #136775 from atombrella/feature/activate_modernize_slicessort
Enable modernize/slicessort rule
2026-02-10 05:43:57 +05:30
Ryan Brewster
efe3667b6b
Add 404 handling for the statefulset controller pod deletion codepath
The daemonset controller already has handling for NotFound errors.

Right now if the statefulset controller is attempting to scale down a
statefulset and its informer cache is stale, it can get hard-blocked on a
missing pod.

This issue will eventually self-resolve once the informer cache "catches
up", but in the process of exploring this issue I realized that
404s during pod deletions don't strictly need to abort the entire sync;
we can continue.

This is especially impactful for large statefulsets with
podManagementStrategy: Parallel, where a single "phantom" pod (actually
missing, but still present in the informer cache) can block thousands of
other pods from being cleaned up.
2026-02-06 22:45:57 +00:00
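The behaviour described above, treating a missing pod as already deleted and continuing with the rest, might look roughly like this. The sketch uses a hypothetical sentinel `errNotFound` in place of a real API "not found" check, and `deletePods` and `fakeDelete` are invented for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for an API "404 Not Found" error.
var errNotFound = errors.New("not found")

// deletePods tries every pod. A missing pod counts as already deleted, so it
// does not abort the loop; other errors are collected and returned together.
func deletePods(pods []string, del func(string) error) (deleted int, err error) {
	var errs []error
	for _, pod := range pods {
		if e := del(pod); e != nil && !errors.Is(e, errNotFound) {
			errs = append(errs, fmt.Errorf("deleting pod %s: %w", pod, e))
			continue
		}
		deleted++
	}
	return deleted, errors.Join(errs...)
}

// fakeDelete returns a delete function that reports one pod as missing.
func fakeDelete(missing string) func(string) error {
	return func(pod string) error {
		if pod == missing {
			return fmt.Errorf("pod %s: %w", pod, errNotFound)
		}
		return nil
	}
}

func main() {
	n, err := deletePods([]string{"web-0", "web-1", "web-2"}, fakeDelete("web-1"))
	fmt.Println(n, err) // 3 <nil>
}
```

In this sketch a single "phantom" pod no longer prevents the remaining pods from being processed.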
Mads Jensen
95616cecda Use slices.Sort instead of sort.Slice.
There were only two instances of this in the entire code-base. Hence,
I have enabled the modernize rule/linter in golangci-lint.
2026-02-06 22:46:08 +01:00
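For context, a small sketch of the replacement named in the commit above: `sort.Slice` with a comparison closure versus `slices.Sort` (Go 1.21+) for an ordered element type. The helper names and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
	"sort"
)

// sortOld sorts a copy of names with sort.Slice and a comparison closure.
func sortOld(names []string) []string {
	out := append([]string(nil), names...)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// sortNew sorts a copy of names with slices.Sort, which needs no closure.
func sortNew(names []string) []string {
	out := slices.Clone(names)
	slices.Sort(out)
	return out
}

// sameOrder reports whether both slices hold the same elements in the same order.
func sameOrder(a, b []string) bool {
	return slices.Equal(a, b)
}

func main() {
	names := []string{"web-2", "web-0", "web-1"}
	fmt.Println(sortNew(names))                            // [web-0 web-1 web-2]
	fmt.Println(sameOrder(sortOld(names), sortNew(names))) // true
}
```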
carlory
4bc5464553
Remove feature gate HonorPVReclaimPolicy
Signed-off-by: carlory <baofa.fan@daocloud.io>
2026-02-06 13:31:16 +08:00
Kubernetes Prow Robot
eba75de156
Merge pull request #136341 from Karthik-K-N/remove-deprecated-methods
Remove usage of deprecated functions from ktesting package
2026-02-02 19:28:31 +05:30