kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-04-22 22:58:08 -04:00

Author	SHA1	Message	Date
Jon Huhn	d80f384b70	Workload API: PodGroup ResourceClaims (KEP-5729)	2026-03-22 14:52:45 -05:00
Patrick Ohly	566dc7f3f3	DRA device taints: graduate to beta The fields become beta, enabled by default. DeviceTaintRule gets added to the v1beta2 API, but support for it must remain off by default because that API group is also off by default. The v1beta1 API is left unchanged. No-one should be using it anymore (deprecated in 1.33, could be removed now if it wasn't for reading old objects and version emulation). To achieve consistent validation, declarative validation must be enabled also for v1alpha3 (was already enabled for other versions). Otherwise, TestVersionedValidationByFuzzing fails: --- FAIL: TestVersionedValidationByFuzzing (0.09s) --- FAIL: TestVersionedValidationByFuzzing/resource.k8s.io/v1beta2,_Kind=DeviceTaintRule (0.00s) validation_test.go:109: different error count (0 vs. 1) resource.k8s.io/v1alpha3: <no errors> resource.k8s.io/v1beta2: "spec.taint.effect: Unsupported value: \"幤HxÒQP¹¬永唂ȳ垞ş]嘨鶊\": supported values: \"NoExecute\", \"NoSchedule\", \"None\"" ...	2026-03-12 18:26:02 +01:00
Patrick Ohly	29e92367db	DRA device taints: avoid unnecessary Pod lookup When rapidly processing informer events it can happen that a pod gets scheduled twice (seen only in the TestEviction/update unit test): - Claim update observed, pod from informer cache with NodeName from update -> queue pod for eviction. - Pod update observed, claim from informer cache -> queue pod again. The effect is one additional Get call to the apiserver. We can avoid it by maintaining an LRU cache with the UIDs of the pods which we have evicted and thus don't need to do anything for.	2026-02-27 14:38:30 +01:00
Patrick Ohly	017a53a1a9	DRA device taints: simplify more tests with synctest In these cases it's certain that no time needs to pass, so Wait can replace polling with Eventually. This also means that locking is not necessary to prevent data races.	2026-02-27 07:47:28 +01:00
Patrick Ohly	4521c34276	DRA device taints: remove usage of testify for unit test In particular with the builtin tCtx.Assert/Expect the assertions are also short when using gomega and often more readable (no more confusion in Equal which one is the expected and which the actual value).	2026-02-27 07:47:28 +01:00
Patrick Ohly	fb94a99d2f	DRA device taints: artificially delay pod deletion during test We can observe the delay in the metric histogram. Because we run in a synctest bubble, the delay is 100% predictable. Unfortunately we cannot use the reactor mechanism of the fake client: that delays while holding the fake's mutex. When some other goroutine (in this case, the event recorder) calls the client, it gets blocked without being considered durably blocked by synctest, so time does not advance and the test gets stuck.	2026-02-27 07:47:28 +01:00
Patrick Ohly	7d7b4c3dcb	DRA device taint tests: remove List+Watch workaround This was fixed in client-go itself, no workaround needed anymore.	2026-02-27 07:46:33 +01:00
Patrick Ohly	75626bcf3f	DRA device taints: update unit tests Thanks for waiting for cache sync via channels the random delays caused by polling are gone, making the initial setup including cache sync happen "immediately" when a test starts (= same virtual time). This makes the tests more predictable and simplifies making further assertions about when something happens or how long it takes. While at it, restore previous performance by setting feature gates once and running tests in parallel again.	2026-02-27 07:46:19 +01:00
Karthik Bhat	85aceb54fd	Remove fmt.Println() added for debug	2026-01-22 18:34:15 +05:30
Karthik Bhat	8962f08815	Remove deprecated test methods	2026-01-12 16:15:04 +05:30
Karthik Bhat	cd7d35fa3d	Fix flake TestDeviceTaintRule test by adjusting event hanlder status update logic Co-authored-by: Pohly <patrick.ohly@intel.com>	2026-01-06 11:00:06 +05:30
Kubernetes Prow Robot	31fb6f64ef	Merge pull request #135821 from pohly/dra-device-taints-owner DRA device taints controller: add pohly to OWNERS	2025-12-18 19:24:38 -08:00
Patrick Ohly	9194bfe75b	DRA device taints controller: add pohly to OWNERS While the code is nominally owned by SIG Scheduling, in practice I am the one who knows it best, so I should be a reviewer and should be able to merge simple changes without additional approvals (will use cautiously!).	2025-12-18 12:07:52 +01:00
Patrick Ohly	b2151b1f51	DRA device taints: fix and simplify unit tests Using `t` instead of `tCtx` is subtly wrong: the failure is attributed to the parent test, not the sub-test. Using a separate function with tCtx as parameter ensures that t is not in scope of the code and thus this mistake cannot happen. The number of lines is the same, it's just a bit more code. For TestRetry another advantage is the reduced indention. It's worth calling out that the same cannot be done for benchmarks: - They need methods (Loop) or fields (N) which are not exposed by TContext. - The `for b.Loop()` pattern only works if the for loop is written exactly like that.	2025-12-05 19:13:55 +01:00
Patrick Ohly	60744fc8b9	DRA device taint eviction: track evicting rules This avoids having to call the rule lister (which theoretically, but not in practice) fail and having to iterate over rules which can be ignored (might be a small performance boost).	2025-11-05 20:03:17 +01:00
Patrick Ohly	9527987293	DRA device taint eviction: use NOP queue during simulation It's slightly more efficient and a bit cleaner.	2025-11-05 20:03:17 +01:00
Patrick Ohly	eaee6b6bce	DRA device taints: add separate feature gate for rules Support for DeviceTaintRules depends on a significant amount of additional code: - ResourceSlice tracker is a NOP without it. - Additional informers and corresponding permissions in scheduler and controller. - Controller code for handling status. Not all users necessarily need DeviceTaintRules, so adding a second feature gate for that code makes it possible to limit the blast radius of bugs in that code without having to turn off device taints and tolerations entirely.	2025-11-05 20:03:17 +01:00
Patrick Ohly	bbf8bc766e	DRA device taints: DeviceTaintRule status To update the right statuses, the controller must collect more information about why a pod is being evicted. Updating the DeviceTaintRule statuses then is handled by the same work queue as evicting pods. Both operations already share the same client instance and thus QPS+server-side throttling, so they might as well share the same work queue. Deleting pods is not necessarily more important than informing users or vice-versa, so there is no strong argument for having different queues. While at it, switching the unit tests to usage of the same mock work queue as in staging/src/k8s.io/dynamic-resource-allocation/internal/workqueue. Because there is no time to add it properly to a staging repo, the implementation gets copied.	2025-11-04 21:57:24 +01:00
Patrick Ohly	0689b628c7	generated files	2025-11-04 21:57:24 +01:00
Patrick Ohly	f4a453389d	DRA device taint eviction: configurable number of workers It might never be necessary to change the default, but it is hard to be sure. It's better to have the option, just in case.	2025-11-04 21:57:24 +01:00
Patrick Ohly	c69259cb71	DRA device taints: switch to workqueue in controller The approach copied from node taint eviction was to fire off one goroutine per pod the intended time. This leads to the "thundering herd" problem: when a single taint causes eviction of several pods and those all have no or the same toleration grace period, then they all get deleted concurrently at the same time. For node taint eviction that is limited by the number of pods per node, which is typically ~100. In an integration test, that already led to problems with watchers: cacher.go:855] cacher (pods): 100 objects queued in incoming channel. cache_watcher.go:203] Forcing pods watcher close due to unresponsiveness: key: "/pods/", labels: "", fields: "". len(c.input) = 10, len(c.result) = 10, graceful = false It also causes spikes in memory consumption (mostly the 2KB stack per goroutine plus closure) with no upper limit. Using a workqueue makes concurrency more deterministic because there is an upper limit. In the integration test, 10 workers kept the watch active. Another advantage is that failures to evict the pod get retried with exponential backoff per affected pod forever. Previously, evicting was tried a few times with a fixed rate and then the controller gave up. If the apiserver was down long enough, pods didn't get evicted.	2025-10-31 18:11:19 +01:00
Patrick Ohly	e5fcd20a26	DRA device taints: tighten controller test We know how often the controller should get a pod, let's check it. Must run before we do our own GET call.	2025-10-31 18:11:18 +01:00
Patrick Ohly	6ebd853f17	DRA: implementation of none taint effect While at it, ensure that future unknown effects are treating like the None effect.	2025-10-31 18:11:18 +01:00
Patrick Ohly	e4dda7b282	DRA device taints: fix DeviceTaintRule + missing slice case When the ResourceSlice no longer exists, the ResourceSlice tracker didn't and couldn't report the tainted devices even if they are allocated and in use. The controller must keep track of DeviceTaintRules itself and handle this scenario. In this scenario it is impossible to evaluation CEL expressions because the necessary device attributes aren't available. We could: - Copy them in the allocation result: too large, big change. - Limit usage of CEL expressions to rules with no eviction: inconsistent. - Remove the fields which cannot be supported well. The last option is chosen. The tracker is now no longer needed by the eviction controller. Reading directly from the informer means that we cannot assume that pointers are consistent. We have to track ResourceSlices by their name, not their pointer.	2025-10-31 18:11:18 +01:00
Patrick Ohly	2e543d151b	DRA device taints: convert unit test to synctest The immediate benefit is that the time required for running the package's unit test goes down from ~10 seconds (because of required real-world delays) to ~0.5 seconds (depending on the CPU performance of the host). It can also make writing tests easier because after a `Wait` there is no need for locking before accessing internal state (all background goroutines are known to be blocked waiting for the main goroutine). What somewhat ruins the perfect determinism is the polling for informer cache syncs: that can take an unknown number of loop iterations. Probably could be fixed by making the waiting block on channels (requires work in client-go). The only change required in the implementation is avoiding the sleep when deleting a pod failed for the last time in the loop (a useful, albeit minor improvement by itself): the test proceeds after having blocked that last Delete call, in which case synctest expects the background goroutine to exit without delay.	2025-10-30 17:29:58 +01:00
Patrick Ohly	6f51446802	DRA device taints: fix toleration of NoExecute As usual, consumers of an allocated claim react to the information stored in the status. In this case, the scheduler did not copy the tolerations into the status and as a result a pod with a toleration for NoExecute got scheduled and then immediately evicted. Some additional logging gets added to make the handling easier to track in the eviction controller. Example YAMLs allow reproducing the use case manually.	2025-10-08 13:13:47 +02:00
Sunyanan Choochotkaew	7f052afaef	KEP 5075: implement scheduler Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>	2025-07-30 09:52:49 +09:00
Patrick Ohly	5c4f81743c	DRA: use v1 API As before when adding v1beta2, DRA drivers built using the k8s.io/dynamic-resource-allocation helper packages remain compatible with all Kubernetes release >= 1.32. The helper code picks whatever API version is enabled from v1beta1/v1beta2/v1. However, the control plane now depends on v1, so a cluster configuration where only v1beta1 or v1beta2 are enabled without the v1 won't work.	2025-07-24 08:33:45 +02:00
Kubernetes Prow Robot	0617903e9d	Merge pull request #131344 from pohly/dra-taint-unit-test-flake-minimal DRA: work around fake.ClientSet informer deficiency in unit test	2025-07-03 02:51:25 -07:00
Davanum Srinivas	03afe6471b	Add a replacement for cmp.Diff using json+go-difflib Co-authored-by: Jordan Liggitt <jordan@liggitt.net> Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2025-06-16 17:10:42 -04:00
Patrick Ohly	ff108e72a5	DRA device taints: fix rare unit test flake TestCancelEviction flaked with a 0,01% rate because assumed that an event had already been created once the pod was updated, but that was only true under some timing conditions.	2025-04-17 17:16:23 +02:00
Patrick Ohly	ff2e6dddc8	DRA device taints: work around fake.ClientSet informer race fake.Clientset suffers from a race condition related to informers: it does not implement resource version support in its Watch implementation and instead assumes that watches are set up before further changes are made. If a test waits for caches to be synced and then immediately adds an object, that new object will never be seen by event handlers if the race goes wrong and the Watch call hadn't completed yet (can be triggered by adding a sleep before `b53b9fb557/staging/src/k8s.io/client-go/tools/cache/reflector.go (L431)`). To work around this, we count all watches and only proceed when all of them are in place. This replaces the normal watch reactor (`b53b9fb557/staging/src/k8s.io/client-go/kubernetes/fake/clientset_generated.go (L161-L173)`).	2025-04-17 10:57:27 +02:00
Patrick Ohly	638abf0339	DRA device taints: more logging in test	2025-04-17 10:55:13 +02:00
Patrick Ohly	40f2085d68	DRA device taint: clean up test initialization The creation of the shared informer factory and starting it can be done all in the same function, which makes it a bit more obvious what happens in which order and avoids some code duplication.	2025-04-17 10:55:13 +02:00
Patrick Ohly	56adcd06f3	DRA device eviction: fix eviction triggered by pod scheduling Normally the scheduler shouldn't schedule when there is a taint, but perhaps it didn't know yet. The TestEviction/update test covered this, but only failed under the right timing conditions. The new event handler test case covers it reliably.	2025-03-20 19:49:54 +01:00
Patrick Ohly	5856d3ee6f	DRA taint eviction: fix waiting in unit test Events get recorded in the apiserver asynchronously, so even if the test knows that the event has been evicted because the pod is deleted, it still has to also check for the event to be recorded. This caused a flake in the "Consistently" check of events.	2025-03-20 17:59:48 +01:00
Patrick Ohly	ac6e47cb14	DRA taint eviction: improve error handling There was one error path that led to a "controller has shut down" log message. Other errors caused different log entries or are so unlikely (event handler registration failure!) that they weren't checked at all. It's clearer to let Run return an error in all cases and then log the "controller has shut down" error at the call site. This also enables tests to mark themselves as failed, should that ever happen.	2025-03-20 17:59:06 +01:00
Patrick Ohly	9f161590be	metrics testing: add type aliases to avoid direct prometheus imports In tests it is sometimes unavoidable to use the Prometheus types directly, for example when writing a custom gatherer which needs to normalize data before testing it. device_taint_eviction_test.go does this to strip out unpredictable data in a histogram. With type aliases in a package that is explicitly meant for tests we can avoid adding exceptions for such tests to the global exception list.	2025-03-19 09:18:38 +01:00
Patrick Ohly	a027b439e5	DRA: add device taint eviction controller The controller is derived from the node taint eviction controller. In contrast to that controller it tracks the UID of pods to prevent deleting the wrong pod when it got replaced.	2025-03-19 09:18:38 +01:00
Patrick Ohly	13d04d4a92	DRA device taints: copy taintseviction controller This is a verbatim copy of the current pkg/controller/taintseviction code, revision `fc268ecd09` (v1.33.0 plus one commit), minus the TimedWorker helper. The intent is to modify the code such that it enforces eviction of pods which use tainted devices.	2025-03-18 20:52:54 +01:00

40 commits