Commit graph

496 commits

Author SHA1 Message Date
Kubernetes Prow Robot
0e71d2d28f
Merge pull request #137749 from dims/dsrinivas/issue-135713-pod-status-exit-2
test/e2e/node: tolerate exit code 2 in pod status flake
2026-03-21 23:28:24 +05:30
Davanum Srinivas
d4181e8c20
test/e2e/node: tolerate exit code 2 in pod status flake
The fast-delete pod status tests currently require the intentionally failing
"fail" container to report exit code 1. In CI, some runtimes occasionally
report exit code 2 with reason=Error even though the tested invariant still
holds: the container failed and the blocked workload container never started.

The latest dims/test-k8s failure on master showed exactly that state: the pod
remained Failed, Initialized=False, the blocked container reported
started=false, and only the failing init container drifted from exit 1 to exit
2. This matches kubernetes/kubernetes issue 135713 and the related
pending-container history in PR 131605.

Accept exit code 2 in this verifier so the test continues to assert the
behavior it is meant to cover instead of a lower-layer exit-code detail.

Fixes issue 135713

Tested:
- hack/verify-gofmt.sh
- hack/verify-test-code.sh
- hack/verify-typecheck.sh ./test/e2e/node/...
- go test ./test/e2e/node -run TestNonExistent -count=1

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
2026-03-21 15:30:46 +01:00
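The relaxed check described in this commit can be sketched as follows. The struct below is a hypothetical, trimmed-down stand-in for the runtime-reported terminated state, not the real `v1.ContainerStateTerminated`. Requiring `reason=Error` alongside the exit code is this sketch's own reading of the message, not necessarily what the actual verifier does.

```go
package main

import "fmt"

// containerTermination is a hypothetical stand-in for the terminated state a
// runtime reports for the intentionally failing "fail" container.
type containerTermination struct {
	ExitCode int32
	Reason   string
}

// acceptableFailExitCodes: 1 is the expected code, 2 is what some runtimes
// occasionally report in CI for the same failure.
var acceptableFailExitCodes = map[int32]bool{1: true, 2: true}

// verifyFailedContainer asserts the invariant the test cares about (the
// container failed with reason=Error) instead of one exact exit code.
func verifyFailedContainer(t containerTermination) error {
	if t.Reason != "Error" {
		return fmt.Errorf("unexpected reason %q, want %q", t.Reason, "Error")
	}
	if !acceptableFailExitCodes[t.ExitCode] {
		return fmt.Errorf("unexpected exit code %d, want 1 or 2", t.ExitCode)
	}
	return nil
}

func main() {
	fmt.Println(verifyFailedContainer(containerTermination{ExitCode: 2, Reason: "Error"})) // <nil>
}
```

Keeping the tolerated codes in a set makes it a one-line change if another runtime-specific code ever needs to be accepted.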
Kubernetes Prow Robot
7a3a6cf4be
Merge pull request #136725 from pravk03/native-dra-2
Introduce support of DRA for Native Resources
2026-03-19 03:36:38 +05:30
Praveen Krishna
cdcfc4eeb3 Add integration tests. 2026-03-18 19:20:10 +00:00
Kubernetes Prow Robot
27b42dd16d
Merge pull request #137453 from rawsocket/master
kubelet: add terminated_containers_total metric
2026-03-18 23:20:49 +05:30
Adel Abouchaev
1a49c37b77 kubelet: add terminated_containers_total metric
Add a new ALPHA stability metric terminated_containers_total to track
container terminations (both successful and failed). This metric provides
aggregate visibility into container exit patterns across the node,
supporting detection of abnormal exits (e.g., SIGSEGV, OOMKilled) and
enabling error-rate calculations.

To ensure node stability and comply with Kubernetes instrumentation
standards, the metric uses the following low-cardinality labels:
 - container_type (container, init_container, or ephemeral_container)
 - exit_code (the literal exit status)
 - reason (the termination reason from the runtime)

High-cardinality labels (container_name, namespace_name) are deliberately
omitted to prevent metric cardinality explosion. Problematic containers can
be identified via standard troubleshooting workflows using Kubernetes
Events or API status.

Included:
 - Metric definition and registration in metrics.go.
 - Status manager implementation to record transitions exactly once.
 - Unit tests in status_manager_test.go verifying success/failure logic.
 - Node e2e test to verify correct metrics exposure.
2026-03-18 02:22:29 +00:00
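The labelling and "record exactly once" behavior described above can be illustrated with a standard-library-only sketch. The kubelet's real metric is built on the component-base metrics machinery. The `terminationCounter` type and the idea of deduplicating by a unique termination ID (for example a container ID) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// labels mirrors the three low-cardinality labels of terminated_containers_total.
type labels struct {
	ContainerType string // "container", "init_container" or "ephemeral_container"
	ExitCode      string // literal exit status, e.g. "0", "1", "137"
	Reason        string // termination reason from the runtime, e.g. "OOMKilled"
}

// terminationCounter is a hypothetical stand-in for a labelled counter vector.
type terminationCounter struct {
	counts map[labels]int
	seen   map[string]bool // termination IDs already counted
}

func newTerminationCounter() *terminationCounter {
	return &terminationCounter{counts: map[labels]int{}, seen: map[string]bool{}}
}

// recordTermination counts a termination once, even if the same status is
// observed again on a later sync.
func (c *terminationCounter) recordTermination(terminationID, containerType string, exitCode int, reason string) {
	if c.seen[terminationID] {
		return
	}
	c.seen[terminationID] = true
	c.counts[labels{containerType, strconv.Itoa(exitCode), reason}]++
}

// dump renders the series in a Prometheus-like text form, sorted for stable output.
func (c *terminationCounter) dump() []string {
	var lines []string
	for l, n := range c.counts {
		lines = append(lines, fmt.Sprintf("terminated_containers_total{container_type=%q,exit_code=%q,reason=%q} %d",
			l.ContainerType, l.ExitCode, l.Reason, n))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	c := newTerminationCounter()
	c.recordTermination("cid-1", "container", 137, "OOMKilled")
	c.recordTermination("cid-1", "container", 137, "OOMKilled") // repeated status update, ignored
	c.recordTermination("cid-2", "init_container", 0, "Completed")
	for _, line := range c.dump() {
		fmt.Println(line)
	}
}
```

Because no per-pod or per-container name is a label, the number of series is bounded by the combinations of container type, exit code and reason.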
Kubernetes Prow Robot
e1be691e7f
Merge pull request #136043 from natasha41575/os_feasibility
[InPlacePodVerticalScaling] create an admission plugin to perform the OS and node capacity checks
2026-03-18 03:23:39 +05:30
Natasha Sarkar
fd8c6d3e2e add pod resize feasibility check admission plugin 2026-03-17 17:12:31 +00:00
Kubernetes Prow Robot
9c7e57bb7c
Merge pull request #137330 from tico88612/cleanup/test-node-pod-dep-prometheus
Remove dep. Prometheus from test/e2e/node/pods.go
2026-03-16 20:43:49 +05:30
Sergey Kanzhelev
9aee7c917a wait for container condition to be true before sending the pod update 2026-03-13 23:21:22 +00:00
ChengHao Yang
195b9f598d
Remove dep. Prometheus from test/e2e/node/pods.go
Add MetricFamilyToText to `component-base/metrics/testutil`.

Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
2026-03-11 19:14:35 +08:00
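A helper like MetricFamilyToText lets a test compare metrics as text instead of importing Prometheus client types directly. The sketch below shows the general idea of rendering one metric family in the Prometheus text exposition format. The `metricFamily` and `sample` types are hypothetical simplifications, not the client_model protobuf types, and Go's `%q` quoting only approximates Prometheus label escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample and metricFamily are hypothetical, simplified metric types.
type sample struct {
	Labels map[string]string
	Value  float64
}

type metricFamily struct {
	Name, Help, Type string
	Samples          []sample
}

// metricFamilyToText emits HELP and TYPE comment lines followed by one
// `name{label="value",...} value` line per sample.
func metricFamilyToText(mf metricFamily) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", mf.Name, mf.Help)
	fmt.Fprintf(&b, "# TYPE %s %s\n", mf.Name, mf.Type)
	for _, s := range mf.Samples {
		keys := make([]string, 0, len(s.Labels))
		for k := range s.Labels {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic label order
		pairs := make([]string, len(keys))
		for i, k := range keys {
			pairs[i] = fmt.Sprintf("%s=%q", k, s.Labels[k])
		}
		b.WriteString(mf.Name)
		if len(pairs) > 0 {
			b.WriteString("{" + strings.Join(pairs, ",") + "}")
		}
		fmt.Fprintf(&b, " %g\n", s.Value)
	}
	return b.String()
}

func main() {
	fmt.Print(metricFamilyToText(metricFamily{
		Name: "example_requests_total", Help: "Example counter.", Type: "counter",
		Samples: []sample{{Labels: map[string]string{"code": "200"}, Value: 3}},
	}))
}
```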
Yuan Wang
f33a2767aa Refactor container restart policy tests to e2e/common/node
- Added validation for lastTerminationStatus
2026-03-09 23:05:05 +00:00
Mads Jensen
1f2b70a043 Lint: Use modernize/rangeint in test/{e2e,e2e_node,images,soak} 2026-03-07 10:17:31 +01:00
Yuan Wang
906134cee9 Update pod after the container is removed
Ensures the single-container pod can restart quickly
2026-03-05 23:21:33 +00:00
Kubernetes Prow Robot
dd0958fece
Merge pull request #136851 from jiefeng-xu/jiefeng/fix-gpu-flake-136378
test/e2e/node: reduce flakiness in GPU nvidia-smi test
2026-03-04 08:56:17 +05:30
Kevin Hannon
b26954bc0f merging the pod rejection
generation test into pod_admission.go and commenting out
PodReadyToStartContainers. Conformance promotion will follow in a
separate PR once this lands green, per review feedback.
2026-03-03 13:58:04 -05:00
Jiefeng Xu
b738ae6d97 test/e2e/node: handle quick pod completion in GPU startup wait 2026-03-01 11:50:57 -08:00
Chandan Maurya
e54eef10d1 Use localhost image reference in PodObservedGenerationTracking test
The test uses an invalid image to induce a pull error. The previous image
name 'some-image-that-doesnt-exist' causes slow DNS/registry resolution
on some environments (especially metal), leading to 30s timeouts.

Using 'localhost/some-image-that-does-not-exist' makes the pull fail
instantly since there is no registry on localhost, avoiding flaky
timeouts.
2026-02-26 10:04:00 +05:30
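The image-name change works because of how Docker-style image references are normalized. A name with no registry component is resolved against a default registry, which means a network lookup. A `localhost/` prefix names a local registry that does not exist, so the pull fails immediately. The sketch below captures only the "is the first component a registry?" rule, simplified from the reference grammar. Actual runtimes may instead consult configured mirrors or unqualified-search registries (for example CRI-O via containers-registries.conf).

```go
package main

import (
	"fmt"
	"strings"
)

// registryHost returns the registry an image reference would be pulled from
// under the Docker-style rule: the first path component is a registry only if
// it contains "." or ":" or is exactly "localhost"; otherwise docker.io.
func registryHost(image string) string {
	first, _, found := strings.Cut(image, "/")
	if !found {
		return "docker.io" // bare name such as "busybox"
	}
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return first
	}
	return "docker.io" // first component is a namespace, not a host
}

func main() {
	fmt.Println(registryHost("some-image-that-doesnt-exist"))             // docker.io
	fmt.Println(registryHost("localhost/some-image-that-does-not-exist")) // localhost
}
```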
Kubernetes Prow Robot
9dc55d7d9e
Merge pull request #135729 from yangjunmyfm192085/fixe2e2
test/e2e: fix failing e2e test `should support seccomp default, which is unconfined [LinuxOnly]`
2026-02-11 09:26:08 +05:30
杨军10092085
d94808665c fix failing e2e test should support seccomp default, which is unconfined [LinuxOnly] 2026-02-11 08:17:31 +08:00
Jiefeng Xu
6e203664eb test/e2e/node: reduce flakiness in GPU nvidia-smi test 2026-02-08 22:40:45 -08:00
Mads Jensen
757647786d Remove redundant re-assignments in for-loops in test/{e2e,integration,utils}
The modernize forvar rule was applied. There are more details in this blog
post: https://go.dev/blog/loopvar-preview
2026-01-25 22:58:27 +01:00
Sotiris Salloumis
d9c3ec29ad Move getNodeAllocatableAndAvailableValues to framework
This allows future tests that use the e2enode test framework
to reuse this helper.
2026-01-21 19:41:08 +01:00
Patrick Ohly
47d02070ba E2E: remove unnecessary trailing spaces in test names
The spaces are unnecessary because Ginkgo adds spaces automatically.

This was detected before only for tests using the wrapper functions,
now it also gets detected for ginkgo methods.
2026-01-07 12:05:43 +01:00
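Ginkgo builds a spec's full name by joining the container and leaf texts with spaces. A component that already ends in a space therefore produces a doubled space. The sketch below shows that join and a hypothetical check that flags trailing whitespace. The test-name strings are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// fullTestName joins container and leaf texts with single spaces (simplified
// model of how Ginkgo composes a spec's full text).
func fullTestName(parts ...string) string {
	return strings.Join(parts, " ")
}

// hasTrailingSpace reports whether any component ends in whitespace, which
// yields a doubled space once the parts are joined.
func hasTrailingSpace(parts ...string) bool {
	for _, p := range parts {
		if strings.TrimRight(p, " \t") != p {
			return true
		}
	}
	return false
}

func main() {
	parts := []string{"[sig-node] Pods ", "should be submitted and removed"}
	fmt.Printf("%q\n", fullTestName(parts...)) // "[sig-node] Pods  should be submitted and removed"
	fmt.Println(hasTrailingSpace(parts...))    // true
}
```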
ndixita
10b73f8ef9
Test fixes
Signed-off-by: ndixita <ndixita@google.com>
2025-11-12 06:21:06 +00:00
ndixita
1733d8fc8c
e2e tests
Signed-off-by: ndixita <ndixita@google.com>
2025-11-11 18:19:09 +00:00
ndixita
efc3126b76
Adding Resources and AllocatedResources fields to the list of expected fields in PodStatus in admission test 2025-11-11 18:15:20 +00:00
Yuan Wang
0b47a37861 Keep pod in running state and prune past container status from runtime 2025-11-11 06:37:49 +00:00
Yuan Wang
aac951d902 Add dependency for NodeDeclaredFeatures 2025-11-10 09:41:02 +00:00
Yuan Wang
2eb1eeeabf add disruptive tests 2025-11-10 09:41:02 +00:00
Yuan Wang
83c5cd5526 Implement restartPod action 2025-11-10 09:41:02 +00:00
Lubomir I. Ivanov
396a7c1a12 test/e2e/node: add minimum kubelet version to some pod tests
A couple of tests were recently promoted to conformance
but they did not include a minimum kubelet version,
which broke the kubeadm/kinder e2e jobs that skew the kubelet
version against the apiserver version.
2025-11-05 12:06:47 +02:00
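A minimum kubelet version guard amounts to comparing the node's kubelet version with the release that introduced the behavior. The test is skipped when the kubelet is older, as in version-skewed kubeadm/kinder jobs. The helper below is hypothetical, not the e2e framework's API. It compares only MAJOR.MINOR.PATCH and ignores pre-release ordering.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH"; any "-..." or "+..." suffix is dropped.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("malformed version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("malformed version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// kubeletAtLeast reports whether kubeletVersion >= minVersion.
func kubeletAtLeast(kubeletVersion, minVersion string) (bool, error) {
	a, err := parseVersion(kubeletVersion)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minVersion)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	ok, _ := kubeletAtLeast("v1.33.4", "v1.34.0")
	fmt.Println(ok) // false: an older, skewed kubelet, so the test would be skipped
}
```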
Natasha Sarkar
2a217a9bfd promote pod generation tests to conformance 2025-10-29 20:57:59 +00:00
Natasha Sarkar
21c832b47d promote pod generation to GA 2025-10-29 15:52:17 +00:00
Kubernetes Prow Robot
c7f910ed1f
Merge pull request #133762 from natasha41575/expandQuotaTests
[InPlacePodVerticalScaling] Expand coverage for resourceQuota and limitRanger e2e tests
2025-10-02 00:10:56 -07:00
Michael Aspinwall
84f85712be feat: Add matcher and conformance tests ensuring that RV is uint128 2025-10-01 00:01:50 +00:00
Michael Aspinwall
37fcfcd29e feat: Add conformance tests for all resources for comparable resource version 2025-09-29 23:32:07 +00:00
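The two commits above treat a resourceVersion as a comparable unsigned 128-bit integer. A matcher in that spirit can parse the decimal string with `math/big`, reject anything above 2^128 - 1, and compare numerically rather than lexically (lexically, "99" would sort after "100"). The function names are hypothetical, and the "digits only" rule is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math/big"
)

// maxUint128 is 2^128 - 1.
var maxUint128 = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 128), big.NewInt(1))

// parseRV accepts a non-empty string of decimal digits whose value fits in 128 bits.
func parseRV(rv string) (*big.Int, error) {
	if rv == "" {
		return nil, fmt.Errorf("empty resourceVersion")
	}
	for _, r := range rv {
		if r < '0' || r > '9' {
			return nil, fmt.Errorf("resourceVersion %q is not an unsigned decimal integer", rv)
		}
	}
	n, ok := new(big.Int).SetString(rv, 10)
	if !ok || n.Cmp(maxUint128) > 0 {
		return nil, fmt.Errorf("resourceVersion %q does not fit in 128 bits", rv)
	}
	return n, nil
}

// compareRV returns -1, 0 or +1 as a is older than, equal to, or newer than b.
func compareRV(a, b string) (int, error) {
	x, err := parseRV(a)
	if err != nil {
		return 0, err
	}
	y, err := parseRV(b)
	if err != nil {
		return 0, err
	}
	return x.Cmp(y), nil
}

func main() {
	c, err := compareRV("99", "100")
	fmt.Println(c, err) // -1 <nil>
}
```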
Natasha Sarkar
89b75e998d expand coverage for resource quota and limit ranger tests 2025-09-19 15:44:42 +00:00
Mauricio Poppe
55700685bd
Revert "Add retries to node's crictl test" 2025-09-08 20:35:31 -04:00
Sascha Grunert
c8f8f66e6d
Increase termination timeout for evicted pods should be terminal test
This doubles the termination timeout for the eviction test from 5min to
10min. Reason for that is that the eviction manager relies on pod stats
metrics, which may be inaccessible for a period of time because the
kubelet API is unreachable. This can be caused by hardware or network
pressure when multiple tests run in parallel.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2025-09-03 08:58:46 +02:00
Natasha Sarkar
f1d980adf9 separate resource-quota and limit-ranger resize tests 2025-08-28 15:56:10 +00:00
Mauricio Poppe
e1dd085ffe Add retries to node's crictl test.
In addition, simplify the test by only considering the output of
`crictl version` which already displays info about the container
runtime.
2025-08-12 16:31:39 +00:00
yliao
23d6f73e72 extended resource backed by DRA: test 2025-07-29 18:55:28 +00:00
Yuan Wang
4b479da4b5 Remove the feature from e2e test 2025-07-28 16:33:20 +00:00
Yuan Wang
b34f8782e2 Add e2e tests 2025-07-24 16:49:54 +00:00
Kubernetes Prow Robot
4676341457
Merge pull request #133065 from natasha41575/dedupe-resize-test
dedupe fetching allocatable and available resources in node test
2025-07-22 17:56:27 -07:00
ylink-lfs
fb4e252224 test: add batch pod deletion for kubelet e2e tests 2025-07-19 14:13:59 +08:00
Natasha Sarkar
13a6d2121c check a couple of extra failure scenarios 2025-07-18 23:30:54 +00:00
Natasha Sarkar
5d31866313 dedupe fetching allocatable and available resources in node test 2025-07-18 17:50:07 +00:00
Natasha Sarkar
f456a70bde use CreateBatch and MakeResizePatch 2025-07-17 19:19:05 +00:00