kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-05-22 18:08:54 -04:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	874a7b40b0	Merge pull request #138617 from esotsal/kubeletHealthCheckRefactor Move kubeletHealthCheck from e2enode to node as HealthCheck	2026-05-12 02:26:10 +05:30
Kubernetes Prow Robot	5cf56a97d5	Merge pull request #138851 from saschagrunert/fix/container-metrics-flake Fix ContainerMetrics cadvisor test flake for block I/O metrics	2026-05-10 18:37:47 +05:30
Sotiris Salloumis	20c57876a4	Increase bound CPU limit to 2e+10 to fix admission API flaky test. After the command was replaced to increase UsageNanoCores (to fix a previous flaky test), UsageNanoCores exceeds the 2e+09 limit in some test environments. This commit attempts to fix this by increasing the UsageNanoCores limit to 2e+10.	2026-05-09 09:46:23 +02:00
Kubernetes Prow Robot	4818833ecc	Merge pull request #138820 from esotsal/fix-sriov-cpumanager Fix podresources flaky test: wait for Pod Resources V1 serving in flaky test	2026-05-08 00:05:18 +05:30
Sascha Grunert	ee9f8c6bde	Fix ContainerMetrics cadvisor test flakes Replace the small echo write with a dd that uses conv=fsync to force data through the block layer. Without fsync, the 11-byte echo writes stay in page cache and never reach the block device within the 60-second test window. This leaves the cgroup io.stat empty, so cadvisor does not emit container_blkio_device_usage_total, container_fs_reads_bytes_total, or container_fs_writes_bytes_total for the container. The conv=fsync call guarantees block device I/O on every loop iteration. Once io.stat has an entry for a device, all fields (rbytes, wbytes, rios, wios) are present, even if zero, so all cadvisor metrics pass their boundedSample(0, ...) checks. Also increase the UsageCoreNanoSeconds upper bound from 1e11 to 1e12 for the container and pod-level CPU checks. The cumulative CPU time can exceed 100s on slower architectures like ppc64le where the dd CPU burner loop accumulates faster than expected. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2026-05-07 15:01:02 +02:00
Kubernetes Prow Robot	d92b8fe8f2	Merge pull request #138739 from zxqlxy/device-plugin-slow-register Add e2e test for device plugin slow register	2026-05-07 11:42:31 +05:30
Sotiris Salloumis	acabaa7d50	Fix podresources flaky test: wait for Pod Resources V1 serving in flaky test One podresources test was not waiting for Pod Resources V1 to be serving, which can cause a flaky failure in a later step. This change attempts to fix the flaky test by adding waitForPodResourcesV1Serving(ctx), as done in the remaining tests. In addition, ExpectNoError was added to all connection-close calls to improve troubleshooting.	2026-05-07 05:35:17 +02:00
Xinyun Liu	62e23b9857	Add E2E test for multiple device plugins where the second one struggles to register	2026-05-06 23:48:32 +00:00
Paco Xu	11d08fcb7f	Revert "remove flaky label in SRIOV related tests"	2026-05-06 17:11:33 +08:00
Sotiris Salloumis	5486715fbf	Move kubeletHealthCheck from e2enode to node To reduce code duplication and to avoid the "import cycle not allowed" compile-time error when it is used in non-e2e_node packages.	2026-05-05 20:39:07 +02:00
Kubernetes Prow Robot	d2b48c52df	Merge pull request #138716 from lukaszwojciechowski/fix-sriov-teardown fix: SRIOV resources cleanup in runTMScopeResourceAlignmentTestSuite	2026-05-05 14:16:21 +05:30
Kubernetes Prow Robot	43f4e90bee	Merge pull request #138755 from saschagrunert/fix-crio-conformance-container-metrics Replace openssl speed CPU burner in summary_test.go	2026-05-05 13:02:29 +05:30
Kubernetes Prow Robot	f8535a28a6	Merge pull request #138462 from shachartal/fix/sidecar-ephemeral-storage-eviction kubelet: enforce ephemeral-storage limits on restartable init containers	2026-05-04 23:44:23 +05:30
Sascha Grunert	e7bc0479c0	Replace openssl speed CPU burner in summary_test.go The openssl speed command added in #138423 causes the ContainerMetrics cadvisor test to fail on CRI-O conformance (ci-node-crio-conformance) by exceeding upper bounds for container_fs_writes_total, container_blkio_device_usage_total and container_memory_failures_total. Replace with a lightweight dd-based CPU burner that generates CPU load via syscall overhead without filesystem I/O or memory side effects. Revert the bound changes to pre-#138423 values. The underlying UsageNanoCores issue is better addressed at the kubelet level by #138687. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2026-05-04 15:31:18 +02:00
Sotiris Salloumis	c306df361c	Fix Summary API resource usage test Make the process in the container more CPU-intensive so that CPU usage above one nanocore is observed within the test window, working around a known limitation in older containerd versions. Increase the UsageNanoCores and UsageCoreNanoSeconds bounds to account for the additional CPU load.	2026-05-03 18:31:43 +02:00
Kubernetes Prow Robot	10104afcde	Merge pull request #138289 from esotsal/fix-device-plugin-test-flaky Fix pull-kubernetes-node-kubelet-serial-containerd flaky tests	2026-05-03 05:21:23 +05:30
Kubernetes Prow Robot	bb0bcb8a85	Merge pull request #135437 from zxqlxy/device-plugin-fix DRA-like fix for device-plugin race condition problem	2026-05-02 10:01:24 +05:30
Xinyun Liu	ce680fea20	Add E2E test for multiple device-plugins scenarios	2026-05-01 21:35:15 +00:00
Kubernetes Prow Robot	487cdf46c8	Merge pull request #138642 from stlaz/ensure-node-e2e-investigate e2e node conformance: fix the EnsureCredentialPulledImages test flakes	2026-05-01 03:03:32 +05:30
Sotiris Salloumis	04aa178a5a	Fix pull-kubernetes-node-kubelet-serial-containerd flaky tests Improve testdeviceplugin to health-check the kubelet and fail early if the kubelet is not healthy. Check the sampledeviceplugin pod logs, and perform manual registration only after the container has entered the registration loop. Print the sampledeviceplugin pod after each device-plugin-test test, for troubleshooting. Fix the flaky failed-admission test in device_plugin_test by ensuring containers are stopped, and then by first checking that the number of device plugins is one before checking the containers matching devices. Fix the Resources API SRIOV flaky test by cleaning up pods in BeforeEach. Clean up pod-stress and memory-qos test pods in AfterEach.	2026-04-30 20:22:26 +02:00
Lukasz Wojciechowski	100c351b4b	fix: use DeferCleanup for SRIOV resources in runTMScopeResourceAlignmentTestSuite The test was calling teardownSRIOVConfigOrFail at the end of the function, which meant resources would not be cleaned up if any test failed midway. Using ginkgo.DeferCleanup ensures proper cleanup even on test failure.	2026-04-30 16:59:55 +02:00
Stanislav Láznička	a8ab1bc19c	e2e node conformance: restart the kubelet after removing the image_manager dir to recover it Signed-off-by: Stanislav Láznička <slznika@microsoft.com>	2026-04-29 10:43:13 +02:00
Stanislav Láznička	cdfc943823	e2e node conformance: print error and directory listing on image record stat failure Signed-off-by: Stanislav Láznička <slznika@microsoft.com>	2026-04-29 10:24:50 +02:00
zak905	04286814e7	clean up: remove loop variable capture	2026-04-28 23:53:27 +02:00
Shachar Tal	db2380e4c1	Apply suggestions from code review Co-authored-by: Bing Hongtao <695097494plus@gmail.com>	2026-04-28 10:57:53 +03:00
Kubernetes Prow Robot	75d51c4407	Merge pull request #138258 from pohly/ktesting-cgo ktesting dependencies	2026-04-27 06:48:46 +05:30
Kubernetes Prow Robot	e8cb34c6d8	Merge pull request #138322 from willie-yao/pid-flake Fix flaky hostPID security context test by retrying nginx PID file read	2026-04-25 07:00:59 +05:30
Kubernetes Prow Robot	03153864cf	Merge pull request #137930 from rata/userns-idsPerPod-test-fixes tests: Wait for pod to be removed on kubelet restarts with userns.idsPerPod	2026-04-25 07:00:52 +05:30
Kubernetes Prow Robot	9234064eda	Merge pull request #137627 from hoteye/pr-nodelease-graceful-shutdown-test test/e2e_node: cover node lease renewal during graceful shutdown	2026-04-25 07:00:45 +05:30
Kubernetes Prow Robot	fea119171f	Merge pull request #138242 from pacoxu/sriov-test remove flaky label in SRIOV related tests	2026-04-25 04:45:09 +05:30
Kubernetes Prow Robot	9af59744b6	Merge pull request #138200 from rpb-ant/rpb/podresources-skip-leak e2e_node: podresources: skip cpuAlloc check in BeforeEach, not JustBeforeEach	2026-04-25 04:45:01 +05:30
Patrick Ohly	84190acdaa	ktesting: move format package The format package is used by ktesting both to reconfigure Gomega and to format errors. Therefore it has to be moved to staging together with ktesting, if or when we get to that, because those are desirable features. Because format only has the YAML package as an additional dependency, and that should be okay for all other repos (except for the YAML package itself, of course), we can publish the format package as a sub-package of such a future ktesting module. Avoiding the dependency on apimachinery to detect unstructured.Unstructured is a bit tricky, but doable by relaxing what we check for. The test/utils/format package is kept to test ktesting/format with the actual packages that it cannot depend on (apimachinery, api).	2026-04-24 21:54:19 +02:00
Kubernetes Prow Robot	c2b57ba319	Merge pull request #138135 from HirazawaUi/add-more-e2e-tests-for-kep-4781 Add e2e test to ensure that the NotReady pod status does not change after kubelet restart	2026-04-23 20:00:45 +05:30
Kubernetes Prow Robot	ce14ead9b2	Merge pull request #138253 from HirazawaUi/remove-duplicate-kubelet-health-checks E2E_Node: Remove duplicate kubelet health checks	2026-04-23 09:36:51 +05:30
Kubernetes Prow Robot	679a271800	Merge pull request #138143 from dims/fix-cri-proxy-event-stream test/e2e_node: fix CRI proxy event forwarding	2026-04-23 05:12:12 +05:30
William Yao	0068e4149c	Fix flaky hostPID security context test by retrying nginx PID file read Signed-off-by: William Yao <william2000yao@gmail.com>	2026-04-22 10:30:56 -07:00
Qi Wang	2aaa5b654b	skip MemoryQoS rollback test until implementation is resolved Skip the MemoryQoS rollback test until we figure out the mechanism to roll back. Signed-off-by: Qi Wang <qiwan@redhat.com>	2026-04-20 12:41:45 -04:00
Shachar Tal	d7f380f7e3	kubelet: enforce ephemeral-storage limits on restartable init containers containerEphemeralStorageLimitEviction() only iterated pod.Spec.Containers when building the per-container ephemeral-storage threshold map. Restartable init containers (sidecars) were never checked against their declared limit, allowing them to exceed it indefinitely without triggering eviction. Include restartable init containers in the threshold map so the existing per-container comparison covers them.	2026-04-19 15:48:51 +03:00
Dylan liu	796856658c	test/e2e_node: cover node lease renewal during graceful shutdown Add a dedicated graceful shutdown e2e_node case to verify that the node lease continues to renew while shutdown is active. The test uses an extended shutdown window, configures the kubelet lease cadence explicitly, waits for the node to report Ready=False with reason KubeletNotReady, and then checks that the lease renewTime advances multiple times before shutdown completes.	2026-04-14 14:50:19 +08:00
Rodrigo Campos	a138a4825e	tests: Wait for pod to be removed on kubelet restart with idsPerPod The test starts the kubelet with a non-default setting for idsPerPod, runs a pod, deletes it, and then restarts the kubelet. The issue is that the kubelet guarantees that no two pods' userns mappings overlap (for security reasons). However, we were not waiting for the pod to be removed: the deleteSync() call only waits for the API server to remove the pod. So the pod is still on disk (and maybe even running!) when we restart the kubelet. Because the previous configuration is incompatible with the new one after the restart if pods are running, the kubelet failing is the right thing. We should just wait for the pod to be deleted from the kubelet too, before restarting it with an incompatible configuration. So this commit changes the pod deletion (previously done in e2eoutput.TestContainerOutput(), which only waited for the API server) to wait for the kubelet to delete the pod. Signed-off-by: Rodrigo Campos <rodrigo@amutable.com>	2026-04-09 11:45:22 +02:00
Ryan Brewster	67ffec43e4	e2e_node: podresources: skip cpuAlloc check in BeforeEach, not JustBeforeEach When the cpuAlloc check at podresources_test.go:1358 fires, it Skip()s from a JustBeforeEach at the outer When() level. By that point the inner When()'s tempSetCurrentKubeletConfig BeforeEach has already rewritten the kubelet config (including, for the "restricted list output disabled" block, setting KubeletPodResourcesListUseActivePods=false). Ginkgo only runs AfterEach hooks at-or-shallower than the node where Skip() fired (internal/group.go:252), so the inner AfterEach that would restore the kubelet config is never invoked. The leaked feature gate then propagates to every subsequent serial test, which breaks device_plugin_test.go's "Does not keep device plugin assignments across node reboots if fails admission" on e2-standard-2 nodes. Moving the check to BeforeEach makes it fire before the inner BeforeEach runs, so the config is never written. This matches the identical check at podresources_test.go:1120. Signed-off-by: Ryan Brewster <rpb@anthropic.com>	2026-04-08 17:19:15 +00:00
HirazawaUi	9f19fc42b5	Remove duplicate kubelet health checks	2026-04-07 22:58:37 +08:00
HirazawaUi	e59a6e7726	Add e2e test to ensure that the NotReady pod status does not change after kubelet restart	2026-04-07 22:13:08 +08:00
Paco Xu	287dbcf12a	remove flaky label in SRIOV related tests	2026-04-07 09:30:26 +08:00
yashsingh74	afdb5e5d1f	Update CNI plugins to v1.9.1 Signed-off-by: yashsingh74 <yashsingh1774@gmail.com>	2026-04-01 14:06:34 +05:30
Davanum Srinivas	c2f0180463	test/e2e_node: fix CRI proxy event forwarding The CRI proxy called GetContainerEvents synchronously, which blocked in the upstream receive loop and prevented kubelet from receiving container lifecycle events. With AllAlpha enabled, that breaks the EventedPLEG path and leaves the restart and image-pull retry tests dependent on delayed fallback relists. Run the upstream event stream in a goroutine, tie it to the downstream stream context, and propagate non-cancellation errors after forwarding completes. Also restore the image-volume test to look for the kubelet log message emitted when Image.Image is empty.	2026-03-31 18:44:22 -04:00
Davanum Srinivas	10efa46fbb	e2e_node: wait for pod drain before asserting zero pods in Memory Manager Metrics The Memory Manager Metrics BeforeEach asserts that zero pods are running on the node after a kubelet config update. This hard assertion flakes when a preceding serial test's namespace deletion hasn't completed yet — framework namespace cleanup is async and the kubelet restart in updateKubeletConfig can delay in-flight pod termination. CI logs show leftover pods from MemoryQoS tests (memqos-burstable, memqos-no-limit, etc.), Probe Stress tests (50-container pods), and Summary API PSI tests (memory-pressure-pod), all still Running when the assertion fires 4-7ms after the previous test finishes. Replace the immediate Expect(count).To(BeZero()) with an Eventually poll (2 minute timeout, 5 second interval) that gives pods time to drain after the kubelet restart. The existing printAllPodsOnNode diagnostic output is preserved inside the poll for debugging. Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2026-03-28 15:27:25 -04:00
Kubernetes Prow Robot	c6a95ffd4c	Merge pull request #137996 from pacoxu/inplace-disable set InPlacePodLevelResourcesVerticalScaling to false if needed	2026-03-28 08:42:11 +05:30
Kubernetes Prow Robot	473b7635de	Merge pull request #138006 from tallclair/push-kooxxktxovkr Flaky test fix for 'should restart failing container when pod restartPolicy is Always'	2026-03-25 02:18:16 +05:30
Xinyun Liu	990b72c522	Address comments and add more e2e tests	2026-03-24 17:52:45 +00:00
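
Commit d7f380f7e3 describes its eviction fix in terms of a per-container threshold map. The Go sketch that follows illustrates that idea under stated assumptions: `Container`, `PodSpec`, `ephemeralStorageThresholds`, and `containersOverLimit` are hypothetical, simplified stand-ins, not the real `k8s.io/api` types or the kubelet's `containerEphemeralStorageLimitEviction()` code. It only shows restartable init containers (sidecars, marked here by a `RestartPolicy` of `"Always"`) joining regular containers in one map, so a single comparison loop covers both.

```go
package main

import "sort"

// Container is a hypothetical, simplified stand-in for the Pod API container type.
type Container struct {
	Name string
	// RestartPolicy is "Always" for a restartable init container (sidecar).
	RestartPolicy string
	// EphemeralStorageLimit is the declared limit in bytes; 0 means no limit.
	EphemeralStorageLimit int64
}

// PodSpec is a hypothetical, simplified stand-in for the Pod spec.
type PodSpec struct {
	InitContainers []Container
	Containers     []Container
}

// ephemeralStorageThresholds maps container name to its declared
// ephemeral-storage limit. Regular containers are always considered; init
// containers are considered only when they are restartable (sidecars).
func ephemeralStorageThresholds(spec PodSpec) map[string]int64 {
	thresholds := make(map[string]int64)
	add := func(c Container) {
		if c.EphemeralStorageLimit > 0 {
			thresholds[c.Name] = c.EphemeralStorageLimit
		}
	}
	for _, c := range spec.InitContainers {
		if c.RestartPolicy == "Always" {
			add(c)
		}
	}
	for _, c := range spec.Containers {
		add(c)
	}
	return thresholds
}

// containersOverLimit returns, sorted by name, every container whose observed
// usage exceeds its threshold. The same loop covers sidecars and regular containers.
func containersOverLimit(thresholds, usage map[string]int64) []string {
	var over []string
	for name, limit := range thresholds {
		if usage[name] > limit {
			over = append(over, name)
		}
	}
	sort.Strings(over)
	return over
}
```

With this map, a sidecar that declares a 100-byte limit and uses 150 bytes is reported, while a non-restartable init container is left out of the map entirely in this sketch.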


3529 commits