handleSchedulingFailure can refresh podInfo from the informer before AddUnschedulableIfNotPresent. A delete and recreate with the same name may change the Pod UID while inFlightPods still tracks the UID from Pop, so Done and queueing-hint lookups must use that in-flight UID.
Add an explicit in-flight UID parameter, thread it through queueing-hint lookups, cover the same-name recreation case with a regression test, and check the returned error in updated test call sites.
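
The UID mismatch can be illustrated with a minimal sketch. This is not the scheduler's code: the inFlight type, its methods, and the UID alias are hypothetical stand-ins, and only the idea of keying Done by the UID recorded at Pop comes from the description above.

```go
package main

import "fmt"

// UID is a stand-in for the Pod UID type in this sketch.
type UID string

// inFlight is a hypothetical, simplified stand-in for the queue's
// in-flight tracking. It only remembers which UIDs were popped.
type inFlight struct {
	pods map[UID]struct{}
}

func newInFlight() *inFlight {
	return &inFlight{pods: make(map[UID]struct{})}
}

// Pop records the UID the Pod had at the moment it was popped.
func (f *inFlight) Pop(uid UID) {
	f.pods[uid] = struct{}{}
}

// Done releases the entry for the given in-flight UID. The caller must
// pass the UID recorded at Pop, not the UID of a refreshed Pod object,
// because a same-name recreation yields a different UID.
func (f *inFlight) Done(inFlightUID UID) error {
	if _, ok := f.pods[inFlightUID]; !ok {
		return fmt.Errorf("pod with UID %q is not in flight", inFlightUID)
	}
	delete(f.pods, inFlightUID)
	return nil
}
```

In this sketch, calling Done with the UID of the recreated Pod returns an error and leaves the original entry in place, which is why the returned error is worth checking at call sites.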
On platforms with many OS-visible NUMA nodes that carry no devices
(e.g. NVIDIA GB200 with 36 NUMA nodes, only 1–2 hosting GPUs),
IterateBitMasks enumerates O(2^n) combinations and stalls the
kubelet for minutes.
Introduce deviceNUMANodes(), which collects the NUMA node IDs from
all registered devices for a resource regardless of allocation state.
generateDeviceTopologyHints() now iterates only over those nodes,
reducing n from 34 to 1–2 on affected hardware.
The fix uses allDevices, which ensures that minAffinitySize and the
Preferred flags are computed identically to before. Behavior is
preserved, making the change safe to backport.
deviceNUMANodes() has an explicit runtime subset guard, which
guarantees that it returns a subset of the cadvisor-reported NUMA
topology regardless of what device plugins report.
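
A minimal sketch of the collection step and the subset guard, under stated assumptions: the device type, the parameter names, and the sorted return value are hypothetical and are not taken from the kubelet source. Only the behavior (gather NUMA node IDs from all registered devices, then drop any ID the machine topology does not know about) follows the description above.

```go
package main

import "sort"

// device is a hypothetical stand-in for a registered device together
// with the NUMA node IDs its device plugin reported.
type device struct {
	ID        string
	NUMANodes []int
}

// deviceNUMANodes returns the distinct NUMA node IDs reported by all
// devices, regardless of allocation state. IDs that are not present in
// machineNodes are dropped, so the result is always a subset of the
// machine topology (the runtime subset guard).
func deviceNUMANodes(allDevices []device, machineNodes []int) []int {
	known := make(map[int]bool, len(machineNodes))
	for _, n := range machineNodes {
		known[n] = true
	}

	seen := make(map[int]bool)
	nodes := []int{}
	for _, d := range allDevices {
		for _, n := range d.NUMANodes {
			if known[n] && !seen[n] {
				seen[n] = true
				nodes = append(nodes, n)
			}
		}
	}
	sort.Ints(nodes)
	return nodes
}
```

With devices on only one or two nodes, the returned slice has one or two entries, so enumerating bitmasks over it visits at most a handful of combinations instead of O(2^n) over every OS-visible node.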
Kubernetes-bug: https://github.com/kubernetes/kubernetes/issues/135541
Signed-off-by: Fan Zhang <fanzhang@nvidia.com>
go run k8s.io/publishing-bot/cmd/update-rules@latest -branch=release-1.36 --rules=./staging/publishing/rules.yaml -o ./staging/publishing/rules.yaml
then manually re-add the comments (TODO: make this tool preserve comments)
* Adds polling for HPA reconciliation_duration unit test
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
* using struct name
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
---------
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
As can be seen in build/pause/CHANGELOG.md, the PATCH level
version for pause was introduced due to requirements from the
pause image for Windows. This, however, invalidated our
build/dependencies.yaml validation, as it only accounted for
the MAJOR.MINOR version of pause (e.g. 3.10, not 3.10.1).
Enforce full SemVer validation for the pause image dependents.
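
The difference between a MAJOR.MINOR check and a full SemVer check can be sketched as follows. This is an illustration only: isFullSemVer and its pattern are hypothetical and do not reproduce the match rules in build/dependencies.yaml or the verification script.

```go
package main

import "regexp"

// fullSemVer matches MAJOR.MINOR.PATCH with no leading zeros. It
// deliberately rejects a bare MAJOR.MINOR such as "3.10".
var fullSemVer = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// isFullSemVer reports whether v carries all three version components.
func isFullSemVer(v string) bool {
	return fullSemVer.MatchString(v)
}
```

Under this check, "3.10.1" and "3.10.2" pass while "3.10" fails, so a dependent left on the two-component form would be flagged.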
The latest pause version is 3.10.2, but because the PATCH level
was introduced to the pause image version (previously only
MAJOR.MINOR), various files have remained on an older version,
either 3.10 or 3.10.1. Our validation with
build/dependencies.yaml and ./hack/verify-external-dependencies.sh
did not account for that.