Drops f.WithFlaky() from two test blocks where the tag has become stale:
- [sig-node] kubelet host cleanup with volume mounts [HostCleanup]
(covers both NFS sub-tests: active and sleeping client pods)
- [sig-storage] PersistentVolumes-local "should set different fsGroup
for second pod if first pod is deleted" (covers all 8 volume-type
variants from the parameterized parent)
Testgrid evidence -- both dashboards show consistent passes across all
30 recent runs:
https://testgrid.k8s.io/google-gce#gci-gce-flaky&include-filter-by-regex=Flaky
https://testgrid.k8s.io/sig-testing-misc#gce-cos-master-flaky-repro&include-filter-by-regex=Flaky
History:
- HostCleanup was tagged [Flaky] in PR 41659 (merged 2017-04-13) as a
quick workaround for parallel-execution interference with disruptive
tests; the follow-up "remove [Flaky]" PR mentioned in that body never
landed. Root-cause issue 31272 ("Hung volumes can wedge the kubelet")
remains open.
- fsGroup test was tagged [Flaky] in PR 75015 (merged 2019-03-06) to
skip a race in DesiredStateOfWorld re-adding terminating-pod volumes.
Root-cause issue 73168 ("Do not remount volume again after it is
detached") remains open. The obsolete TODO comment referencing that
issue is also removed.
If either test regresses, the safe rollback is to restore f.WithFlaky()
and reopen the conversation on issue 31272 / 73168.
Make the process in the container more CPU-intensive so that the test
reliably observes CPU usage above a nanocore within the test window. This
works around a known limitation in older containerd versions.
Raise the UsageNanoCores and UsageCoreNanoSeconds bounds to accommodate
the additional CPU load.
The GCE node image family was updated to cos-125-lts but the
nvidia-driver-installer DaemonSet image was never bumped to match.
cos-gpu-installer:v2.5.7 is only suitable for COS M121; it crashes
(CrashLoopBackOff) on cos-125-19216-220-150 nodes, blocking GPU driver
installation and causing all GPU e2e tests to time out.
Bump to v2.5.8, the first release in the COS M125 release notes:
https://cloud.google.com/container-optimized-os/docs/release-notes/m125
75448c416b added feature gate dependencies at the end of a test
name. However, if those tags were already part of the previous text, either
because they were explicitly added in the current node or in some parent node,
then redundant tags were added.
Now this special case is detected and such redundant tags do not get added
again.
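The check described above can be sketched roughly as follows. This is a
minimal illustration, not the framework's actual code: the helper name
appendFeatureGateTags, the "[FeatureGate:<name>]" tag spelling, and the gate
names Foo and Bar are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// appendFeatureGateTags is a hypothetical helper. It appends one
// "[FeatureGate:<name>]" tag per gate to the end of the test name and
// skips any tag that the text already contains.
func appendFeatureGateTags(testName string, gates []string) string {
	for _, gate := range gates {
		tag := "[FeatureGate:" + gate + "]"
		if strings.Contains(testName, tag) {
			// Already present, e.g. added explicitly in this node
			// or inherited from a parent node: do not add it again.
			continue
		}
		testName += " " + tag
	}
	return testName
}

func main() {
	name := "[sig-node] [FeatureGate:Foo] pods should start"
	// Foo is already part of the name, so only Bar gets appended.
	fmt.Println(appendFeatureGateTags(name, []string{"Foo", "Bar"}))
}
```

A plain substring check is the simplest possible detection; the real
implementation may track tags in a structured way instead.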
This shouldn't substantially change which tests run in jobs (an on-by-default
beta feature can only depend on other on-by-default features, for example), but
it makes the FeatureGate list in the test name more complete.
The additional feature gate names are treated like additional metadata and get
added at the end of the full test name.
The format package is used by ktesting, both to reconfigure Gomega and to
format errors. Because those are desirable features, format has to move to
staging together with ktesting, if or when we get to that.
Because format only has the YAML package as an additional dependency and that
should be okay for all other repos (except for the YAML package itself, of
course), we can publish the format package as a sub-package of such a future
ktesting module.
Avoiding the dependency on apimachinery to detect unstructured.Unstructured is
a bit tricky, but doable by relaxing what we check for. The test/utils/format
package is kept to test ktesting/format with the actual packages that it cannot
depend on (apimachinery, api).
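One way to relax the check, shown here only as a sketch, is to match on a
locally declared interface instead of the concrete type. The assumption is
that any value exposing UnstructuredContent() (a method that
unstructured.Unstructured provides) counts as unstructured; the names
unstructuredLike, fakeUnstructured, and contentOf are hypothetical, and the
check actually used in ktesting/format may differ.

```go
package main

import "fmt"

// unstructuredLike is declared locally so that this package does not
// need to import apimachinery. Assumption: the method set is enough
// to identify unstructured objects.
type unstructuredLike interface {
	UnstructuredContent() map[string]interface{}
}

// fakeUnstructured is a stand-in type for this demo only.
type fakeUnstructured struct {
	Object map[string]interface{}
}

func (f *fakeUnstructured) UnstructuredContent() map[string]interface{} {
	return f.Object
}

// contentOf returns the raw content and true if obj looks like an
// unstructured object, otherwise nil and false.
func contentOf(obj interface{}) (map[string]interface{}, bool) {
	u, ok := obj.(unstructuredLike)
	if !ok {
		return nil, false
	}
	return u.UnstructuredContent(), true
}

func main() {
	obj := &fakeUnstructured{Object: map[string]interface{}{"kind": "Pod"}}
	content, ok := contentOf(obj)
	fmt.Println(ok, content["kind"])

	_, ok = contentOf("just a string")
	fmt.Println(ok)
}
```

The trade-off is that unrelated types with the same method would also match,
which is why testing against the real packages remains useful.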
The client-go variant of ktesting is a superset of the normal
ktesting, which makes it possible to get the full original
functionality simply by changing the import path.
This enables passing the client-go clients and helpers via TContext.WithValue.
The advantage of this approach is that the implementation is small. The
downside is that all call sites need to be updated and need two imports. It's
also not discoverable from the TContext type that it may provide clients.
Only a few callers get updated to demonstrate the usage.
Document why cuda-samples is pinned to v12.5 rather than the latest
tag: it has to match the CUDA 12.5 toolkit in the base image and the
cuda-demo-suite-12-5 apt package used on x86_64. v13+ cuda-samples
also requires CUDA Toolkit 13.x and switched from make to CMake, so
bumping is a coordinated change across base image, apt package, git
tag, and build commands.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>