Runtime config can be set via the kind config, which is simpler than setting
the apiserver parameter.
DynamicResourceAllocation is now enabled by default, but it must still be
set explicitly for the current n-3 skew testing, which picks 1.33 (1.37 is
still in alpha). The same applies to NodeLogQuery (GA in 1.36).
The kubeadm v1beta4 ClusterConfiguration changed ExtraArgs from
map[string]string to []Arg (list of {name, value} pairs). The
scheduler and controllerManager sections already used the new list
format, but apiServer.extraArgs still used the old map format, causing
the --runtime-config flag to be silently dropped when kind uses v1beta4.
Without runtime-config, resource.k8s.io/v1beta1, v1beta2 and v1alpha3
default to disabled, so the API server skips them with "has no
resources" and tests using those API versions get 404.
Previously, the webhook transport was switched from HTTP/2 to HTTP/1.1 to work around HTTP/2's single-connection multiplexing, which prevented concurrent requests from load-balancing across multiple backend pods. However, under HTTP/1.1, connections are kept alive and cached as idle in the transport's pool.
Go's http.Transport keys its connection cache by the request's URL host (in this case the service name), and DialContext was overridden to perform dynamic endpoint resolution. As a result, when a new request is sent and the pool holds an idle connection matching the service hostname, that connection is reused and the dialer, and with it the endpoint resolution, is skipped.
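
The reuse behaviour can be reproduced with the standard library alone. The
following is a minimal sketch, assuming a plain HTTP/1.1 httptest server in
place of the webhook backend; countDials is a hypothetical helper, and
DisableKeepAlives is used only to show the contrast, not because the change
described here necessarily takes that approach.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countDials sends n sequential GET requests through a transport with a
// custom DialContext and reports how many times that dialer was invoked.
func countDials(n int, disableKeepAlives bool) (int64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	var dials atomic.Int64
	dialer := &net.Dialer{}
	tr := &http.Transport{
		DisableKeepAlives: disableKeepAlives,
		// Any per-request endpoint selection placed here only runs when
		// the transport actually opens a new connection.
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			dials.Add(1)
			return dialer.DialContext(ctx, network, addr)
		},
	}
	defer tr.CloseIdleConnections()
	client := &http.Client{Transport: tr}

	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection can return to the
		// idle pool before the next request.
		_, _ = io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return dials.Load(), nil
}

func main() {
	pooled, err := countDials(3, false)
	if err != nil {
		panic(err)
	}
	fresh, err := countDials(3, true)
	if err != nil {
		panic(err)
	}
	fmt.Println("dials with keep-alive:", pooled)
	fmt.Println("dials without keep-alive:", fresh)
}
```

With keep-alive enabled, the later requests find an idle connection for the
same host and never reach the dialer, which is the situation described above.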
Introduce test/e2e_node_windows/ as a self-contained Windows node e2e
test suite, fully isolated from the Linux-focused test/e2e_node/ package.
All new files are gated with //go:build windows, and the tree includes
a scoped OWNERS file so it lands under an agreed governance model.
In hack/lib/golang.sh, skip building test/e2e_node/e2e_node.test when
KUBE_BUILD_PLATFORMS targets Windows. Windows has a separate e2e_node
test binary which does not currently need to be bundled in an archive.
Document the Windows feature label in test/e2e/feature/feature.go.
Added a test verifying that when both a device plugin and a DRA
driver advertise the same resource on one node, the device plugin
wins (filterExtendedResources takes the DRA path only when
allocatable == 0).
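
The precedence rule can be stated as a small decision function. This is a
hypothetical reduction for illustration only: choosePath, the path constants
and the draAdvertises parameter are assumptions and do not reflect the actual
signature or internals of filterExtendedResources.

```go
package main

import "fmt"

// path names which allocation path handles an extended resource request.
type path string

const (
	pathDevicePlugin path = "device-plugin"
	pathDRA          path = "dra"
	pathNone         path = "none" // assumed fallback, not from the source
)

// choosePath (hypothetical) encodes the rule: the device plugin wins
// whenever the node reports non-zero allocatable for the resource, and the
// DRA path is considered only when allocatable is zero.
func choosePath(allocatable int64, draAdvertises bool) path {
	if allocatable > 0 {
		return pathDevicePlugin
	}
	if draAdvertises {
		return pathDRA
	}
	return pathNone
}

func main() {
	// Both sources advertise the resource: the device plugin wins.
	fmt.Println(choosePath(2, true))
	// Device plugin removed (allocatable drops to 0): the DRA path is taken.
	fmt.Println(choosePath(0, true))
}
```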
Asserted ExtendedResourceClaimStatus in the existing "process
extended resources after device plugin uninstall" test to confirm
the DRA path is taken once the device plugin has been removed.
These tests have race conditions where they assume immediate state
visibility after a pod transitions to Running. The current code works
on fast runtimes but is fundamentally racy: kubelet log streaming,
log file flushing, and container status updates are eventually
consistent, not synchronous.
Switching to gomega.Eventually polling makes the tests deterministic.
The success path on fast runtimes is unchanged (polling succeeds on
first attempt), but the tests now correctly handle scenarios where
state takes a moment to propagate. This benefits any environment
where containers may take longer to start (VM-isolated runtimes such
as Kata, gVisor, and Windows Hyper-V; overloaded CI VMs; shared
multi-tenant clusters).
- ephemeral_containers.go (both 'should be added' and 'should update'
tests): the 'polo' log-content check is polled via gomega.Eventually
with f.Timeouts.PodStartShort. The container may report Running
before its first stdout has been flushed.
- lifecycle_hook.go ('ignore terminated container'): use
f.Timeouts.PodDelete instead of gracePeriod*time.Second for the
termination wait. The actual correctness check (container's intrinsic
StartedAt/FinishedAt < sleepSeconds) is unchanged and unaffected by
how long we waited.
- pods.go ('retrieving logs from the container over websockets'):
poll the websocket open and read via gomega.Eventually. The container
can be reported Running before its first stdout line has been flushed,
so opening the websocket immediately may return an empty or partial
buffer.
Drops f.WithFlaky() from two test blocks where the tag has become stale:
- [sig-node] kubelet host cleanup with volume mounts [HostCleanup]
(covers both NFS sub-tests: active and sleeping client pods)
- [sig-storage] PersistentVolumes-local "should set different fsGroup
for second pod if first pod is deleted" (covers all 8 volume-type
variants from the parameterized parent)
Testgrid evidence -- both dashboards show consistent passes across all
30 recent runs:
https://testgrid.k8s.io/google-gce#gci-gce-flaky&include-filter-by-regex=Flaky
https://testgrid.k8s.io/sig-testing-misc#gce-cos-master-flaky-repro&include-filter-by-regex=Flaky
History:
- HostCleanup was tagged [Flaky] in PR 41659 (merged 2017-04-13) as a
quick workaround for parallel-execution interference with disruptive
tests; the follow-up "remove [Flaky]" PR mentioned in that body never
landed. Root-cause issue 31272 ("Hung volumes can wedge the kubelet")
remains open.
- fsGroup test was tagged [Flaky] in PR 75015 (merged 2019-03-06) to
skip a race in DesiredStateOfWorld re-adding terminating-pod volumes.
Root-cause issue 73168 ("Do not remount volume again after it is
detached") remains open. The obsolete TODO comment referencing that
issue is also removed.
If either test regresses, the safe rollback is to restore f.WithFlaky()
and reopen the conversation on issue 31272 / 73168.