The test starts the kubelet with a non-default setting for idsPerPod,
runs a pod, deletes it, and then restarts the kubelet.
The issue is that the kubelet guarantees that no two pods' userns
mappings overlap (for security reasons). But we are not waiting for the
pod to be removed: the deleteSync() call only waits for the API server
to remove the pod.
So, the pod is on disk (and maybe even running!) when we restart the
kubelet. Because the new configuration is incompatible with the previous
one while pods are still running, it is correct for the kubelet to fail.
We should just wait for the pod to be deleted from the kubelet too,
before restarting it with an incompatible configuration.
So, this commit just changes the pod deletion (previously done in
e2eoutput.TestContainerOutput(), which only waited for the API server)
to wait for the kubelet to delete the pod.
Signed-off-by: Rodrigo Campos <rodrigo@amutable.com>
* Adds polling for HPA reconciliation_duration unit test
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
* using struct name
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
---------
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
As can be seen in build/pause/CHANGELOG.md, the PATCH
level version for pause was introduced due to requirements
from the pause image for Windows. This, however, invalidated
our build/dependencies.yaml validation, as it only accounted for
the MAJOR.MINOR version of pause (e.g. 3.10, not 3.10.1).
Enforce full SemVer validation for the pause image dependents.
The latest pause version is 3.10.2, but due to the introduction
of the PATCH level version to the pause image (previously it was
only MAJOR.MINOR), various files have remained on an older
version, either 3.10 or 3.10.1. Our validation with
build/dependencies.yaml and ./hack/verify-external-dependencies.sh
did not account for that.
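To illustrate the difference between a MAJOR.MINOR match and a full
SemVer match, here is a minimal Go sketch. It is not the check performed
by ./hack/verify-external-dependencies.sh; the isFullSemVer helper and
its regular expression are hypothetical and only show why 3.10 should be
rejected once a PATCH level exists.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullSemVer matches MAJOR.MINOR.PATCH with numeric components only.
// Pre-release and build metadata are deliberately left out of this sketch.
var fullSemVer = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// isFullSemVer (hypothetical helper) reports whether v carries an
// explicit PATCH component.
func isFullSemVer(v string) bool {
	return fullSemVer.MatchString(v)
}

func main() {
	// The versions mentioned in the commit message.
	for _, v := range []string{"3.10", "3.10.1", "3.10.2"} {
		fmt.Printf("%-7s full SemVer: %v\n", v, isFullSemVer(v))
	}
}
```

A validation that only compares MAJOR.MINOR would treat all three
strings above as the same version, which is how files could stay on
3.10 or 3.10.1 unnoticed.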
The Memory Manager Metrics BeforeEach asserts that zero pods are
running on the node after a kubelet config update. This hard assertion
flakes when a preceding serial test's namespace deletion hasn't
completed yet — framework namespace cleanup is async and the kubelet
restart in updateKubeletConfig can delay in-flight pod termination.
CI logs show leftover pods from MemoryQoS tests (memqos-burstable,
memqos-no-limit, etc.), Probe Stress tests (50-container pods), and
Summary API PSI tests (memory-pressure-pod), all still Running when
the assertion fires 4-7ms after the previous test finishes.
Replace the immediate Expect(count).To(BeZero()) with an Eventually
poll (2 minute timeout, 5 second interval) that gives pods time to
drain after the kubelet restart. The existing printAllPodsOnNode
diagnostic output is preserved inside the poll for debugging.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
This commit introduces the DRAResourceClaimGranularStatusAuthorization
feature gate (Beta in 1.36) to enforce fine-grained authorization checks
on ResourceClaim status updates.
Previously, 'update' permission on 'resourceclaims/status' allowed modifying
the entire status. To enforce the principle of least privilege for DRA
drivers and the scheduler, this change introduces synthetic subresources and
verb prefixes:
- 'resourceclaims/binding': Required to update 'status.allocation' and
'status.reservedFor'.
- 'resourceclaims/driver': Required to update 'status.devices'. Evaluated
on a per-driver basis using 'associated-node:<verb>' (for node-local
ServiceAccounts) or 'arbitrary-node:<verb>' (for cluster-wide controllers).
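As a rough illustration of the mapping described above, the following Go
sketch derives which checks a status update would need. It is not the
actual admission or authorization code: the statusChange and check
types, the requiredChecks function, and the use of the unprefixed verb
for 'resourceclaims/binding' are assumptions. Only the subresource names
and the two verb prefixes are taken from this message.

```go
package main

import "fmt"

// statusChange (hypothetical) records which parts of a ResourceClaim
// status an update touches.
type statusChange struct {
	Allocation  bool
	ReservedFor bool
	Drivers     []string // drivers whose status.devices entries changed
}

// check (hypothetical) is one authorization question to ask.
type check struct {
	Verb        string
	Subresource string
	Driver      string // empty when the check is not driver-specific
}

// requiredChecks sketches the field-to-subresource mapping. nodeLocal
// selects the prefix for node-local ServiceAccounts; otherwise the
// cluster-wide prefix is used.
func requiredChecks(verb string, c statusChange, nodeLocal bool) []check {
	var checks []check
	if c.Allocation || c.ReservedFor {
		// Assumption: the plain verb is used for the binding subresource.
		checks = append(checks, check{Verb: verb, Subresource: "resourceclaims/binding"})
	}
	prefix := "arbitrary-node:"
	if nodeLocal {
		prefix = "associated-node:"
	}
	// One check per driver, mirroring the per-driver evaluation.
	for _, d := range c.Drivers {
		checks = append(checks, check{Verb: prefix + verb, Subresource: "resourceclaims/driver", Driver: d})
	}
	return checks
}

func main() {
	change := statusChange{ReservedFor: true, Drivers: []string{"gpu.example.com"}}
	for _, c := range requiredChecks("update", change, true) {
		fmt.Printf("%+v\n", c)
	}
}
```

With this split, a DRA driver that only reports device status would not
need the permission that lets the scheduler change allocations.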
The kubelet status manager was not preserving the
pod.status.nodeAllocatableResourceClaimStatuses field set by the
scheduler during pod status merges. This caused the information to
be destroyed by the kubelet's next status sync, making the field
always appear empty.
Add the same preservation pattern already used for
ResourceClaimStatuses and ExtendedResourceClaimStatus to both
mergePodStatus() and isPodStatusByKubeletEqual().
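The preservation pattern can be sketched as follows. The podStatus type
is a hypothetical, heavily simplified stand-in for the real pod status
(the field's actual type is not shown here, so a string slice is
assumed), and mergeStatus and equalByKubelet only mimic the idea behind
mergePodStatus() and isPodStatusByKubeletEqual(), not their real
signatures or bodies.

```go
package main

import "fmt"

// podStatus is a hypothetical stand-in for the real pod status type.
type podStatus struct {
	Phase string // owned by the kubelet
	// Owned by the scheduler; the []string type is an assumption.
	NodeAllocatableResourceClaimStatuses []string
}

// mergeStatus starts from the kubelet's freshly computed status and
// carries over the scheduler-owned field from the status already stored
// in the API, so that a sync does not wipe it.
func mergeStatus(oldStatus, newStatus podStatus) podStatus {
	merged := newStatus
	merged.NodeAllocatableResourceClaimStatuses = oldStatus.NodeAllocatableResourceClaimStatuses
	return merged
}

// equalByKubelet compares only kubelet-owned fields, so a difference in
// the scheduler-owned field alone does not count as a kubelet change.
func equalByKubelet(a, b podStatus) bool {
	return a.Phase == b.Phase
}

func main() {
	stored := podStatus{Phase: "Pending", NodeAllocatableResourceClaimStatuses: []string{"claim-a"}}
	computed := podStatus{Phase: "Running"} // the kubelet does not know the field

	merged := mergeStatus(stored, computed)
	fmt.Println(merged.Phase, merged.NodeAllocatableResourceClaimStatuses)
	fmt.Println("equal by kubelet:", equalByKubelet(merged, computed))
}
```

Without the carry-over line in mergeStatus, the merged status would
take the empty field from the kubelet's computed status, which matches
the symptom of the field always appearing empty.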
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>