* wire now (time) to the availability checks in the StatefulSet controller
- this helps to make the controller reconciliation consistent
* schedule pod availability checks at the correct time in StatefulSets
* replace "k8s.io/klog/v2/ktesting" with "k8s.io/kubernetes/test/utils/ktesting"
for advanced features (e.g. Eventually)
* add StatefulSetAvailabilityCheck test
Previously, when a Pod residing in the 'unschedulablePods' queue was updated and subsequently rejected by PreEnqueue plugins (returning 'Wait'), the logic in 'moveToActiveQ' would return early because the Pod was already present in the queue.
This caused the 'scheduler_gated_pods_total' metric to fail to increment, leading to metric inconsistencies (and potentially negative values upon Pod deletion).
This change adds a check to detect the transition from Ungated to Gated. If detected, the Pod is removed and re-added to the queue to ensure metrics are correctly swapped (Unschedulable-- and Gated++).
Added regression test 'TestSchedulingQueueMetrics_UngatedToGated' to verify the fix.
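The metric swap can be illustrated with a toy queue. This is a sketch under stated assumptions, not the scheduler's implementation: the types toyQueue and queueMetrics and their methods are hypothetical stand-ins for the real queue and the real gauges.

```go
package main

// queueMetrics stands in for the scheduler's per-queue gauges.
type queueMetrics struct {
	unschedulable int
	gated         int
}

// toyQueue is a hypothetical, simplified queue that tracks whether each pod
// is currently counted as gated.
type toyQueue struct {
	gatedByPod map[string]bool
	metrics    queueMetrics
}

func newToyQueue() *toyQueue {
	return &toyQueue{gatedByPod: map[string]bool{}}
}

// add inserts a pod and increments the gauge matching its state.
func (q *toyQueue) add(pod string, gated bool) {
	q.gatedByPod[pod] = gated
	if gated {
		q.metrics.gated++
	} else {
		q.metrics.unschedulable++
	}
}

// remove deletes a pod and decrements the gauge it was counted under.
func (q *toyQueue) remove(pod string) {
	gated, ok := q.gatedByPod[pod]
	if !ok {
		return
	}
	delete(q.gatedByPod, pod)
	if gated {
		q.metrics.gated--
	} else {
		q.metrics.unschedulable--
	}
}

// update handles a pod update. If the pod is already queued and moves from
// ungated to gated, it is removed and re-added so the gauges are swapped;
// otherwise an already-queued pod is left alone.
func (q *toyQueue) update(pod string, nowGated bool) {
	wasGated, present := q.gatedByPod[pod]
	if !present {
		q.add(pod, nowGated)
		return
	}
	if !wasGated && nowGated {
		q.remove(pod)
		q.add(pod, true)
	}
}
```

Without the transition branch in update, a later remove would decrement a gauge that was never incremented for the gated state.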
Signed-off-by: Vlad Shkrabkov <vshkrabkov@google.com>
The ipallocator was blindly assuming that all errors are retryable,
which caused the allocator to try every remaining possibility when
allocating an IP address.
If the error is not retryable, the allocator generates as many API
calls as there are available IPs in the allocator, causing CPU
exhaustion since these requests come from inside the apiserver.
In addition to handling the error correctly, this patch also interprets
the error to return the right status code depending on the error type.
Co-authored-by: carlory <baofa.fan@daocloud.io>
Previously, we created a separate filter for each stale flow,
resulting in O(n^2) complexity when deleting flows because the
netlink library iterates over all filters for each flow.
This change introduces a new filter backed by a `sets.Set` for O(1) lookup per flow.
This reduces the overall complexity of cleaning up stale entries to O(n).
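The single set-backed filter can be sketched as follows. This is an illustration only: flowKey and staleFlowFilter are hypothetical types, and a plain Go map stands in for `sets.Set` so the example stays self-contained.

```go
package main

// flowKey is a hypothetical comparable identity for a conntrack flow.
type flowKey struct {
	proto   uint8
	srcIP   string
	dstIP   string
	srcPort uint16
	dstPort uint16
}

// staleFlowFilter holds every stale flow in one set, so matching a flow is a
// single hash lookup rather than a scan over one filter per stale flow.
type staleFlowFilter struct {
	stale map[flowKey]struct{}
}

func newStaleFlowFilter(keys ...flowKey) *staleFlowFilter {
	f := &staleFlowFilter{stale: make(map[flowKey]struct{}, len(keys))}
	for _, k := range keys {
		f.stale[k] = struct{}{}
	}
	return f
}

// Match reports whether the flow is stale and should be deleted.
func (f *staleFlowFilter) Match(k flowKey) bool {
	_, ok := f.stale[k]
	return ok
}

// countMatches walks the table once and counts the flows the filter selects:
// n flows times one O(1) lookup each.
func countMatches(table []flowKey, f *staleFlowFilter) int {
	n := 0
	for _, k := range table {
		if f.Match(k) {
			n++
		}
	}
	return n
}
```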
Two feature gate constants were missing the explicit `featuregate.Feature`
type annotation, making them inconsistent with the rest of the file:
- ChangeContainerStatusOnKubeletRestart
- StatefulSetSemanticRevisionComparison
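For illustration, here is what the missing annotation changes in Go. The local Feature type mirrors the shape of `featuregate.Feature` (a named string type) and is redefined here so the sketch is self-contained; the gate names TypedGate and UntypedGate are hypothetical:

```go
package main

// Feature is a local stand-in with the same shape as featuregate.Feature.
type Feature string

const (
	// TypedGate carries the explicit type annotation.
	TypedGate Feature = "TypedGate"

	// UntypedGate omits it, so it is an untyped string constant. It still
	// converts implicitly wherever a Feature is expected, but its default
	// type is plain string.
	UntypedGate = "UntypedGate"
)

// isFeature reports whether the dynamic type of v is Feature.
func isFeature(v any) bool {
	_, ok := v.(Feature)
	return ok
}

// gateEnabled shows that both constants are accepted as Feature map keys,
// which is why the missing annotation compiles without complaint.
var gateEnabled = map[Feature]bool{
	TypedGate:   true,
	UntypedGate: false,
}
```

Both forms compile in most call sites; the difference shows up only where the constant's default type matters, for example when it is passed as an interface value.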
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Currently, we set TLSConfig.Config.GetCertificate, but then also pass
certificate and key paths to http.Server.ListenAndServeTLS.
ListenAndServeTLS uses these paths to populate the TLS config Certificate
property. Then, when accepting connections, a non-nil Certificate is preferred
over GetCertificate if the ServerName is not set in ClientHelloInfo. Finally,
the Go TLS client doesn't set ServerName when connecting by IP. As a result,
when connecting to the kubelet by IP (e.g. to fetch pod logs), stale
certificates are served.
This patch passes empty certFile and keyFile arguments, to force the TLS
server to use the GetCertificate function.
This is done by clearing key/cert file config when setting GetCertificate as
suggested in PR review. This way, all downstream users of kubeDeps.TLSConfig
will do the right thing automatically.