kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-06-10 09:22:55 -04:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	59d65dad34	Merge pull request #134945 from tchap/kcm-controllers-check-threads pkg/controller: Improve goroutine management (part 2)	2025-11-06 00:43:01 -08:00
Kubernetes Prow Robot	50b4bcbab5	Merge pull request #134210 from yliaog/admit_quota DRA extended resource quota	2025-11-06 00:42:53 -08:00
Kubernetes Prow Robot	6723beac00	Merge pull request #135154 from kubernetes/revert-134840-ahmet/mini-cleanup Revert "controller: duplicate utility method cleanup"	2025-11-05 22:49:04 -08:00
Kubernetes Prow Robot	ca03752ee7	Merge pull request #135104 from mimowo/mutable-job-directives Allow mutable job scheduling directives on suspended Jobs	2025-11-05 21:57:11 -08:00
Kubernetes Prow Robot	f025bcace9	Merge pull request #135068 from pohly/dra-device-taints-1.35-full DRA device taint eviction: several improvements	2025-11-05 18:52:58 -08:00
yliao	870062df4f	adjusts DRA extended resource quota to include devices usages from regular resource claims	2025-11-05 23:24:24 +00:00
Maciej Szulik	499bff4ca4	Revert "controller: duplicate utility method cleanup"	2025-11-05 21:06:09 +01:00
Michał Woźniak	5a7c90fb76	Allow mutable scheduling directives for suspended Jobs	2025-11-05 19:37:33 +00:00
Patrick Ohly	60744fc8b9	DRA device taint eviction: track evicting rules This avoids having to call the rule lister (which can fail in theory, but not in practice) and having to iterate over rules which can be ignored (might be a small performance boost).	2025-11-05 20:03:17 +01:00
Patrick Ohly	9527987293	DRA device taint eviction: use NOP queue during simulation It's slightly more efficient and a bit cleaner.	2025-11-05 20:03:17 +01:00
Patrick Ohly	eaee6b6bce	DRA device taints: add separate feature gate for rules Support for DeviceTaintRules depends on a significant amount of additional code: - ResourceSlice tracker is a NOP without it. - Additional informers and corresponding permissions in scheduler and controller. - Controller code for handling status. Not all users necessarily need DeviceTaintRules, so adding a second feature gate for that code makes it possible to limit the blast radius of bugs in that code without having to turn off device taints and tolerations entirely.	2025-11-05 20:03:17 +01:00
Kubernetes Prow Robot	9ef1a14d68	Merge pull request #134840 from ahmetb/ahmet/mini-cleanup controller: duplicate utility method cleanup	2025-11-05 08:06:58 -08:00
Kubernetes Prow Robot	9a192aa1c3	Merge pull request #134432 from Karthik-K-N/fix-sv-test Fix storage version test flake	2025-11-05 06:56:52 -08:00
Ayato Tokubi	320987ead3	Addressed comments	2025-11-05 10:44:50 +00:00
Ayato Tokubi	5102591a6b	Refactor resource claim metrics to use structured labels and add "source" dimension. Signed-off-by: Ayato Tokubi <atokubi@redhat.com>	2025-11-05 09:52:47 +00:00
Kubernetes Prow Robot	c1a6a3ca71	Merge pull request #134152 from pohly/dra-device-taints-1.35 DRA: device taints: new ResourceSlice API, new features	2025-11-04 15:32:07 -08:00
Ondra Kupka	024382658b	controller/volume/vacprotection: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	e08d03b1b5	controller/volume/selinuxwarning: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	1e6ad423bf	controller/volume/pvprotection: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	0caae6f704	controller/volume/pvcprotection: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	ed74779a0f	controller/volume/persistentvolume: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	8eab454e38	controller/volume/expand: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	27774052ab	controller/volume/ephemeral: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	12205df76d	controller/volume/attachdetach: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	9d4ff6ecf2	controller/tainteviction: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	d2a443db75	controller/serviceaccount: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	c641df792b	controller/resourcequota: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Ondra Kupka	d908a470a5	controller/garbagecollector: Improve goroutine mgmt Make sure all threads are terminated when Run returns.	2025-11-04 23:58:15 +01:00
Kubernetes Prow Robot	97cb47a913	Merge pull request #135080 from dejanzele/feat/promote-job-managedby-to-ga KEP-4368: Job Managed By; Promote to GA	2025-11-04 13:42:12 -08:00
Patrick Ohly	bbf8bc766e	DRA device taints: DeviceTaintRule status To update the right statuses, the controller must collect more information about why a pod is being evicted. Updating the DeviceTaintRule statuses then is handled by the same work queue as evicting pods. Both operations already share the same client instance and thus QPS+server-side throttling, so they might as well share the same work queue. Deleting pods is not necessarily more important than informing users or vice-versa, so there is no strong argument for having different queues. While at it, switch the unit tests to the same mock work queue as in staging/src/k8s.io/dynamic-resource-allocation/internal/workqueue. Because there is no time to add it properly to a staging repo, the implementation gets copied.	2025-11-04 21:57:24 +01:00
Patrick Ohly	0689b628c7	generated files	2025-11-04 21:57:24 +01:00
Patrick Ohly	f4a453389d	DRA device taint eviction: configurable number of workers It might never be necessary to change the default, but it is hard to be sure. It's better to have the option, just in case.	2025-11-04 21:57:24 +01:00
Kubernetes Prow Robot	a058cf788a	Merge pull request #134624 from yt2985/podcertificates-beta Promote Pod Certificates feature to beta	2025-11-04 11:42:12 -08:00
Dejan Zele Pejchev	3dabd4417d	KEP-4368: Job Managed By; Promote to GA Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>	2025-11-04 10:59:45 +01:00
Kubernetes Prow Robot	d6aa2db57e	Merge pull request #135027 from omerap12/remove-reactor-hpa Remove unused delete reactor	2025-11-04 01:30:10 -08:00
Kubernetes Prow Robot	48c56e04e0	Merge pull request #135017 from liggitt/stateful-set-noop-rollout Fix spurious statefulset rollout from 1.33 → 1.34	2025-11-03 19:58:11 -08:00
Kubernetes Prow Robot	41673c7198	Merge pull request #134910 from tchap/kcm-controllers-thread-mgmt pkg/controller: Improve goroutine management	2025-11-03 17:58:03 -08:00
Jordan Liggitt	979c442774	Fix spurious workload rollout due to null creationTimestamp in controller revisions	2025-11-03 17:11:06 -05:00
Jordan Liggitt	7d186d870f	Remove unused and fragile revision hash comparisons This was broken since `666a41c2ea` when the label value became non-integer encoded. The chance of one controller revision hash label being int-parsable: (7/27)^8 = 0.00002041 = ~0. The chance of both being int-parsable: 0.00002041^2 = ~0. Hash comparison locks in differences in content failing EqualRevision even when the semantic content is normalized to be equal.	2025-11-03 16:33:40 -05:00
Jordan Liggitt	94e085e15c	Add unit test detecting spurious statefulset rollout	2025-11-03 16:33:39 -05:00
Lukasz Szaszkiewicz	c832203707	pkg/controller/garbagecollector/garbagecollector_test: wrap kubeClient with a client that doesn't support WatchList semantics.	2025-11-03 10:41:49 +01:00
tinatingyu	59e075e8d3	Promote PodCertificateRequests to v1beta1	2025-11-02 05:33:44 +00:00
Omer Aplatony	264eab46db	Remove unused delete reactor Signed-off-by: Omer Aplatony <omerap12@gmail.com>	2025-11-01 06:13:40 +00:00
Patrick Ohly	c69259cb71	DRA device taints: switch to workqueue in controller The approach copied from node taint eviction was to fire off one goroutine per pod at the intended time. This leads to the "thundering herd" problem: when a single taint causes eviction of several pods and those all have no or the same toleration grace period, then they all get deleted concurrently. For node taint eviction that is limited by the number of pods per node, which is typically ~100. In an integration test, that already led to problems with watchers: cacher.go:855] cacher (pods): 100 objects queued in incoming channel. cache_watcher.go:203] Forcing pods watcher close due to unresponsiveness: key: "/pods/", labels: "", fields: "". len(c.input) = 10, len(c.result) = 10, graceful = false It also causes spikes in memory consumption (mostly the 2KB stack per goroutine plus closure) with no upper limit. Using a workqueue makes concurrency more deterministic because there is an upper limit. In the integration test, 10 workers kept the watch active. Another advantage is that failures to evict the pod get retried with exponential backoff per affected pod forever. Previously, evicting was tried a few times with a fixed rate and then the controller gave up. If the apiserver was down long enough, pods didn't get evicted.	2025-10-31 18:11:19 +01:00
Patrick Ohly	e5fcd20a26	DRA device taints: tighten controller test We know how often the controller should get a pod, let's check it. Must run before we do our own GET call.	2025-10-31 18:11:18 +01:00
Patrick Ohly	6ebd853f17	DRA: implementation of none taint effect While at it, ensure that future unknown effects are treated like the None effect.	2025-10-31 18:11:18 +01:00
Patrick Ohly	e4dda7b282	DRA device taints: fix DeviceTaintRule + missing slice case When the ResourceSlice no longer exists, the ResourceSlice tracker didn't and couldn't report the tainted devices even if they are allocated and in use. The controller must keep track of DeviceTaintRules itself and handle this scenario. In this scenario it is impossible to evaluate CEL expressions because the necessary device attributes aren't available. We could: - Copy them in the allocation result: too large, big change. - Limit usage of CEL expressions to rules with no eviction: inconsistent. - Remove the fields which cannot be supported well. The last option is chosen. The tracker is now no longer needed by the eviction controller. Reading directly from the informer means that we cannot assume that pointers are consistent. We have to track ResourceSlices by their name, not their pointer.	2025-10-31 18:11:18 +01:00
Patrick Ohly	2e543d151b	DRA device taints: convert unit test to synctest The immediate benefit is that the time required for running the package's unit test goes down from ~10 seconds (because of required real-world delays) to ~0.5 seconds (depending on the CPU performance of the host). It can also make writing tests easier because after a `Wait` there is no need for locking before accessing internal state (all background goroutines are known to be blocked waiting for the main goroutine). What somewhat ruins the perfect determinism is the polling for informer cache syncs: that can take an unknown number of loop iterations. Probably could be fixed by making the waiting block on channels (requires work in client-go). The only change required in the implementation is avoiding the sleep when deleting a pod failed for the last time in the loop (a useful, albeit minor improvement by itself): the test proceeds after having blocked that last Delete call, in which case synctest expects the background goroutine to exit without delay.	2025-10-30 17:29:58 +01:00
Kubernetes Prow Robot	808d320de1	Merge pull request #134956 from yliaog/blockowner removed BlockOwnerDeletion	2025-10-30 01:26:11 -07:00
yliao	4f647b3f3d	removed BlockOwnerDeletion	2025-10-29 22:41:10 +00:00

7061 commits