kubernetes

mirror of https://github.com/kubernetes/kubernetes.git synced 2026-05-28 04:04:39 -04:00

Author	SHA1	Message	Date
Jefftree	7fe9bbb5c5	e2e: skip HostCleanup test when worker has no NodeExternalIP	2026-05-15 15:45:52 -04:00
Davanum Srinivas	1f59ea104a	Remove [Flaky] for green tests Drops f.WithFlaky() from two test blocks where the tag has become stale: - [sig-node] kubelet host cleanup with volume mounts [HostCleanup] (covers both NFS sub-tests: active and sleeping client pods) - [sig-storage] PersistentVolumes-local "should set different fsGroup for second pod if first pod is deleted" (covers all 8 volume-type variants from the parameterized parent) Testgrid evidence -- both dashboards show consistent passes across all 30 recent runs: https://testgrid.k8s.io/google-gce#gci-gce-flaky&include-filter-by-regex=Flaky https://testgrid.k8s.io/sig-testing-misc#gce-cos-master-flaky-repro&include-filter-by-regex=Flaky History: - HostCleanup was tagged [Flaky] in PR 41659 (merged 2017-04-13) as a quick workaround for parallel-execution interference with disruptive tests; the follow-up "remove [Flaky]" PR mentioned in that body never landed. Root-cause issue 31272 ("Hung volumes can wedge the kubelet") remains open. - fsGroup test was tagged [Flaky] in PR 75015 (merged 2019-03-06) to skip a race in DesiredStateOfWorld re-adding terminating-pod volumes. Root-cause issue 73168 ("Do not remount volume again after it is detached") remains open. The obsolete TODO comment referencing that issue is also removed. If either test regresses, the safe rollback is to restore f.WithFlaky() and reopen the conversation on issue 31272 / 73168.	2026-05-11 08:26:29 -04:00
zak905	04286814e7	clean up: remove loop variable capture	2026-04-28 23:53:27 +02:00
Kubernetes Prow Robot	ff06de939d	Merge pull request #134950 from Karthik-K-N/fix-inplace-flake [Flaking test] [InPlacePodVerticalScaling] Fix Pod Resize deferred tests	2026-04-25 11:12:46 +05:30
Davanum Srinivas	0934916b90	test/e2e/node: explain v12.5 pin for cuda-samples on arm64 Document why cuda-samples is pinned to v12.5 rather than the latest tag: it has to match the CUDA 12.5 toolkit in the base image and the cuda-demo-suite-12-5 apt package used on x86_64. v13+ cuda-samples also requires CUDA Toolkit 13.x and switched from make to CMake, so bumping is a coordinated change across base image, apt package, git tag, and build commands. Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2026-04-20 07:07:50 -04:00
Davanum Srinivas	6db917c42e	Update test/e2e/node/gpu.go Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>	2026-04-20 07:00:54 -04:00
Davanum Srinivas	ad41961d32	test/e2e/node: make GPU sanity test work on arm64 (sbsa) The [Feature:GPUDevicePlugin] Sanity test embeds `apt-get install -y cuda-demo-suite-12-5` under `set -e`. NVIDIA's CUDA apt repo publishes cuda-demo-suite-* for x86_64 but NOT for sbsa (confirmed against the public Packages index on developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/{sbsa,x86_64}/). On arm64 the install fails, the container exits 1, pod.Status.Phase becomes Failed, and the subsequent `gomega.Expect(... .Equal(Succeeded))` assertion trips. Split the demo phase on architecture. On x86_64 keep the existing apt path unchanged. On anything else, build deviceQuery / vectorAdd / bandwidthTest from the public NVIDIA/cuda-samples repo instead. busGrind is exclusive to cuda-demo-suite (no source equivalent in cuda-samples) and is skipped on non-x86_64. The pattern is the one already in production use by sigs.k8s.io/dra-driver-nvidia-gpu in tests/bats/specs/gpu-cuda-demo-suite.yaml, which has been green on Lambda gpu_1x_gh200. Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2026-04-19 18:55:08 -04:00
Kubernetes Prow Robot	0e71d2d28f	Merge pull request #137749 from dims/dsrinivas/issue-135713-pod-status-exit-2 test/e2e/node: tolerate exit code 2 in pod status flake	2026-03-21 23:28:24 +05:30
Davanum Srinivas	d4181e8c20	test/e2e/node: tolerate exit code 2 in pod status flake The fast-delete pod status tests currently require the intentionally failing "fail" container to report exit code 1. In CI, some runtimes occasionally report exit code 2 with reason=Error even though the tested invariant still holds: the container failed and the blocked workload container never started. The latest dims/test-k8s failure on master showed exactly that state: the pod remained Failed, Initialized=False, the blocked container reported started=false, and only the failing init container drifted from exit 1 to exit 2. This matches kubernetes/kubernetes issue 135713 and the related pending-container history in PR 131605. Accept exit code 2 in this verifier so the test continues to assert the behavior it is meant to cover instead of a lower-layer exit-code detail. Fixes issue 135713 Tested: - hack/verify-gofmt.sh - hack/verify-test-code.sh - hack/verify-typecheck.sh ./test/e2e/node/... - go test ./test/e2e/node -run TestNonExistent -count=1 Co-authored-by: Jordan Liggitt <jordan@liggitt.net>	2026-03-21 15:30:46 +01:00
Kubernetes Prow Robot	7a3a6cf4be	Merge pull request #136725 from pravk03/native-dra-2 Introduce support of DRA for Native Resources	2026-03-19 03:36:38 +05:30
Praveen Krishna	cdcfc4eeb3	Add integration tests.	2026-03-18 19:20:10 +00:00
Kubernetes Prow Robot	27b42dd16d	Merge pull request #137453 from rawsocket/master kubelet: add terminated_containers_total metric	2026-03-18 23:20:49 +05:30
Adel Abouchaev	1a49c37b77	kubelet: add terminated_containers_total metric Add a new ALPHA stability metric terminated_containers_total to track container terminations (both successful and failed). This metric provides aggregate visibility into container exit patterns across the node, supporting detection of abnormal exits (e.g., SIGSEGV, OOMKilled) and enabling error-rate calculations. To ensure node stability and comply with Kubernetes instrumentation standards, the metric uses the following low-cardinality labels: - container_type (container, init_container, or ephemeral_container) - exit_code (the literal exit status) - reason (the termination reason from the runtime) High-cardinality labels (container_name, namespace_name) are deliberately omitted to prevent metric cardinality explosion. Problematic containers can be identified via standard troubleshooting workflows using Kubernetes Events or API status. Included: - Metric definition and registration in metrics.go. - Status manager implementation to record transitions exactly once. - Unit tests in status_manager_test.go verifying success/failure logic. - Node e2e test to verify correct metrics exposure.	2026-03-18 02:22:29 +00:00
Kubernetes Prow Robot	e1be691e7f	Merge pull request #136043 from natasha41575/os_feasibility [InPlacePodVerticalScaling] create an admission plugin to perform the OS and node capacity checks	2026-03-18 03:23:39 +05:30
Natasha Sarkar	fd8c6d3e2e	add pod resize feasibility check admission plugin	2026-03-17 17:12:31 +00:00
Kubernetes Prow Robot	9c7e57bb7c	Merge pull request #137330 from tico88612/cleanup/test-node-pod-dep-prometheus Remove dep. Prometheus from test/e2e/node/pods.go	2026-03-16 20:43:49 +05:30
Sergey Kanzhelev	9aee7c917a	wait for container condition to be true before sending the pod update	2026-03-13 23:21:22 +00:00
ChengHao Yang	195b9f598d	Remove dep. Prometheus from test/e2e/node/pods.go Add the MetricFamilyToText in `component-base/metric/testutil` Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>	2026-03-11 19:14:35 +08:00
Yuan Wang	f33a2767aa	Refactor container restart policy tests to e2e/common/node - Added validation for lastTerminationStatus	2026-03-09 23:05:05 +00:00
Mads Jensen	1f2b70a043	Lint: Use modernize/rangeint in test/{e2e,e2e_node,images,soak}	2026-03-07 10:17:31 +01:00
Yuan Wang	906134cee9	Update pod after the container is removed Ensures the single-container pod can restart quickly	2026-03-05 23:21:33 +00:00
Kubernetes Prow Robot	dd0958fece	Merge pull request #136851 from jiefeng-xu/jiefeng/fix-gpu-flake-136378 test/e2e/node: reduce flakiness in GPU nvidia-smi test	2026-03-04 08:56:17 +05:30
Kevin Hannon	b26954bc0f	merging the pod rejection generation test into pod_admission.go and commenting out PodReadyToStartContainers. Conformance promotion will follow in a separate PR once this lands green, per review feedback.	2026-03-03 13:58:04 -05:00
Karthik Bhat	1f9a751ec1	Address review comments by using 2 pods instead of 3 pods and simplify the logic.	2026-03-03 11:46:54 +05:30
Jiefeng Xu	b738ae6d97	test/e2e/node: handle quick pod completion in GPU startup wait	2026-03-01 11:50:57 -08:00
Chandan Maurya	e54eef10d1	Use localhost image reference in PodObservedGenerationTracking test The test uses an invalid image to induce a pull error. The previous image name 'some-image-that-doesnt-exist' causes slow DNS/registry resolution on some environments (especially metal), leading to 30s timeouts. Using 'localhost/some-image-that-does-not-exist' makes the pull fail instantly since there is no registry on localhost, avoiding flaky timeouts.	2026-02-26 10:04:00 +05:30
Kubernetes Prow Robot	9dc55d7d9e	Merge pull request #135729 from yangjunmyfm192085/fixe2e2 test/e2e: e2e test cases `should support seccomp default, which is unconfined [LinuxOnly]`. Execution failed.	2026-02-11 09:26:08 +05:30
杨军10092085	d94808665c	e2e test cases should support seccomp default, which is unconfined [LinuxOnly]. Execution failed.	2026-02-11 08:17:31 +08:00
Jiefeng Xu	6e203664eb	test/e2e/node: reduce flakiness in GPU nvidia-smi test	2026-02-08 22:40:45 -08:00
Karthik Bhat	a3d241347c	Resize pod to request more CPU so it will remain in deferred state	2026-01-29 15:11:24 +05:30
Mads Jensen	757647786d	Remove redundant re-assignments in for-loops in test/{e2e,integration,utils} The modernize forvar rule was applied. There are more details in this blog post: https://go.dev/blog/loopvar-preview	2026-01-25 22:58:27 +01:00
Sotiris Salloumis	d9c3ec29ad	Move getNodeAllocatableAndAvailableValues to framework To allow use of this good method from future tests using e2enode test framework.	2026-01-21 19:41:08 +01:00
Patrick Ohly	47d02070ba	E2E: remove unnecessary trailing spaces in test names The spaces are unnecessary because Ginkgo adds spaces automatically. This was detected before only for tests using the wrapper functions, now it also gets detected for ginkgo methods.	2026-01-07 12:05:43 +01:00
ndixita	10b73f8ef9	Test fixes Signed-off-by: ndixita <ndixita@google.com>	2025-11-12 06:21:06 +00:00
ndixita	1733d8fc8c	e2e tests Signed-off-by: ndixita <ndixita@google.com>	2025-11-11 18:19:09 +00:00
ndixita	efc3126b76	Adding Resources and AllocatedResources fields to the list of expected fields in PodStatus in admission test	2025-11-11 18:15:20 +00:00
Yuan Wang	0b47a37861	Keep pod in running state and prune past container status from runtime	2025-11-11 06:37:49 +00:00
Yuan Wang	aac951d902	Add dependency for NodeDeclaredFeatures	2025-11-10 09:41:02 +00:00
Yuan Wang	2eb1eeeabf	add disruptive tests	2025-11-10 09:41:02 +00:00
Yuan Wang	83c5cd5526	Implement restartPod action	2025-11-10 09:41:02 +00:00
Lubomir I. Ivanov	396a7c1a12	test/e2e/node: add minimum kubelet version to some pod tests A couple of tests were recently promoted to conformance but they did not include a minimum kubelet version, which broke the kubeadm/kinder e2e jobs that skew the kubelet version against the apiserver version.	2025-11-05 12:06:47 +02:00
Natasha Sarkar	2a217a9bfd	promote pod generation tests to conformance	2025-10-29 20:57:59 +00:00
Natasha Sarkar	21c832b47d	promote pod generation to GA	2025-10-29 15:52:17 +00:00
Kubernetes Prow Robot	c7f910ed1f	Merge pull request #133762 from natasha41575/expandQuotaTests [InPlacePodVerticalScaling] Expand coverage for resourceQuota and limitRanger e2e tests	2025-10-02 00:10:56 -07:00
Michael Aspinwall	84f85712be	feat: Add matcher and conformance tests ensuring that RV is uint128	2025-10-01 00:01:50 +00:00
Michael Aspinwall	37fcfcd29e	feat: Add conformance tests for all resources for comparable resource version	2025-09-29 23:32:07 +00:00
Natasha Sarkar	89b75e998d	expand coverage for resource quota and limit ranger tests	2025-09-19 15:44:42 +00:00
Mauricio Poppe	55700685bd	Revert "Add retries to node's crictl test"	2025-09-08 20:35:31 -04:00
Sascha Grunert	c8f8f66e6d	Increase termination timeout for `evicted pods should be terminal` test This doubles the termination timeout for the eviction test from 5min to 10min. The reason is that the eviction manager relies on pod stats metrics, which may not be accessible for a period of time because the kubelet API is unreachable. This could be caused by hardware or network pressure when multiple tests run in parallel. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2025-09-03 08:58:46 +02:00
Natasha Sarkar	f1d980adf9	separate resource-quota and limit-ranger resize tests	2025-08-28 15:56:10 +00:00

505 commits