The recent change to support importing ktesting into an E2E suite
without progress reporting was flawed:
- If a Go unit test had a deadline (the default when invoked
by `go test`!), the early return skipped initializing progress
reporting.
- When it didn't, for example when invoking a test binary directly
under stress, a test created goroutines which were kept running,
which broke leak checking in e.g. an integration tests TestMain.
The revised approach uses reference counting: as long as some unit test is
running, the progress reporting with the required goroutines are active.
When the last one ends, they get cleaned up, which keeps the goleak
checker happy.
The TestCause tests were already unreliable in the CI. The others failed under
stress.
As synctest we have to be more careful how to construct and clean up the parent
context for TestCause (must happen inside bubble), but once that's handled we
can reliably measure the (fake) time and compare exactly against expected
results.
Manually pairing Being with End is too error prone to be useful. It had the
advantage of keeping variables created between them visible to the following
code, but that doesn't justify using those calls.
By using a callback we can achieve a few things:
- Code using it automatically shadows the parent tCtx, thus enforcing
that within a code block the tCtx with step is used consistently.
- The code block is clearly delineated with curly braces.
- When the code block ends, the unmodified parent tCtx is automatically
in scope again.
Downsides:
- Extra boilerplate for the anonymous function.
Python's `with tCtx.Step(...) as tCtx: ` would be nicer.
As an approximation of that `for tCtx := range tCtx.Step(...)` was
tried with `Step` returning an iterator, but that wasn't very idiomatic.
- Variables created inside the code block are not visible outside of it.
(cherry picked from commit 047682908d)
Refactoring the DRA upgrade/downgrade testing such that it runs as Go test
depended on supporting ktesting in the E2E framework. That change worked during
presubmit testing, but broke some periodic jobs. Therefore the relevant commits
from https://github.com/kubernetes/kubernetes/pull/135664/commits get reverted:
c47ad64820 DRA e2e+integration: test ResourceSlice controller
047682908d ktesting: replace Begin/End with TContext.Step
de47714879 DRA upgrade/downgrade: rewrite as Go unit test
7c7b1e1018 DRA e2e: make driver deployment possible in Go unit tests
65ef31973c DRA upgrade/downgrade: split out individual test steps
47b613eded e2e framework: support creating TContext
The last one is what must have caused the problem, but the other commits depend
on it.
Bumping to 5 is useful in unit tests. Those tend to not produce less output and
ideally use per-test output, so we end up keeping only the output of failed
tests where increased verbosity also in CI runs is useful.
But ktesting now also gets imported into e2e test binaries through the
framework. There the increased verbosity is apparently causing OOM killing in
some jobs which previously worked fine.
Long term we need a better solution than simply disabling the verbosity
change. We could modify each unit test to call SetDefaultVerbosity, but that's
tedious. Perhaps an env variable? It cannot be a command line flag because not
all unit tests accept `-v`.
Gomega formats errors by first showing Error() (already has all information
after WithError) and then again by dumping the error struct, which is redundant
in this case. We can avoid the latter by providing a GomegaString
implementation which returns nothing.
Being able to call arbitrary functions is useful, even if it means giving up
some compile-time checking. Because we now use reflection,
Eventually/Consistently can be methods.
Manually pairing Being with End is too error prone to be useful. It had the
advantage of keeping variables created between them visible to the following
code, but that doesn't justify using those calls.
By using a callback we can achieve a few things:
- Code using it automatically shadows the parent tCtx, thus enforcing
that within a code block the tCtx with step is used consistently.
- The code block is clearly delineated with curly braces.
- When the code block ends, the unmodified parent tCtx is automatically
in scope again.
Downsides:
- Extra boilerplate for the anonymous function.
Python's `with tCtx.Step(...) as tCtx: ` would be nicer.
As an approximation of that `for tCtx := range tCtx.Step(...)` was
tried with `Step` returning an iterator, but that wasn't very idiomatic.
- Variables created inside the code block are not visible outside of it.
Many helper packages need to know the test namespace in addition to the client.
Supporting passing of that information through the TContext avoids adding
another parameter to such helper packages.
This mirrors similar functionality in framework.Framework which also provides
a namespace to the test and packages taking such a parameter.
Instead of granting callers access to the instance stored in the context,
let's return a copy. Otherwise a caller might accidentally modify what is
stored in the context when they forget to make a copy themselves before
modifying fields.
The original implementation was inspired by how context.Context is handled via
wrapping a parent context. That approach had several issues:
- It is useful to let users call methods (e.g. tCtx.ExpectNoError)
instead of ktesting functions with a tCtx parameters, but that only
worked if all implementations of the interface implemented that
set of methods. This made extending those methods cumbersome (see
the commit which added Require+Assert) and could potentially break
implementations of the interface elsewhere, defeating part of the
motivation for having the interface in the first place.
- It was hard to see how the different TContext wrappers cooperated
with each other.
- Layering injection of "ERROR" and "FATAL ERROR" on top of prefixing
with the klog header caused post-processing of a failed unit test to
remove that line because it looked like log output. Other log output
lines where kept because they were not indented.
- In Go <=1.25, the `go vet sprintf` check only works for functions and
methods if they get called directly and themselves directly pass their
parameters on to fmt.Sprint. The check does not work when calling
methods through an interface. Support for that is coming in Go 1.26,
but will depend on bumping the Go version also in go.mod and thus
may not be immediately possible in Kubernetes.
- Interface documentation in
https://pkg.go.dev/k8s.io/kubernetes@v1.34.2/test/utils/ktesting#TContext
is a monolithic text block. Documentation for methods is more readable and allows
referencing those methods with [] (e.g. [TC.Errorf] works, [TContext.Errorf]
didn't).
The revised implementation is a single struct with (almost) no exported
fields. The two exceptions (embedded context.Context and TB) are useful because
it avoids having to write wrappers for several functions resp. necessary
because Helper cannot be wrapped. Like a logr.LogSink, With* methods can make a
shallow copy and then change some fields in the cloned instance.
The former `ktesting.TContext` interface is now a type alias for
`*ktesting.TC`. This ensures that existing code using ktesting doesn't need to
be updated and because that code is a bit more compact (`tCtx
ktesting.TContext` instead of `tCtx *ktesting.TContext` when not using such an
alias). Hiding that it is a pointer might discourage accessing the exported
fields because it looks like an interface.
Output gets fixed and improved such that:
- "FATAL ERROR" and "ERROR" are at the start of the line, followed by the klog header.
- The failure message follows in the next line.
- Continuation lines are always indented.
The set of methods exposed via TB is now a bit more complete (Attr, Chdir).
All former stand-alone With* functions are now also available as methods and
should be used instead of the functions. Those will be removed.
Linting of log calls now works and found some issues.
Assert is useful because sometimes one wants to check several different things
in the same test, without stopping after the first failed assertion.
Require gets added for symmetry and to mirror testify's require and
assert. Require is identical to Expect (gomega naming).
We need to see the actual effect that ktesting will have when used as wrapper
around testing.T. Producing an error hid that adding the klog header makes some
output sub-optimal:
<klog header>: FATAL ERROR: ...
The problem with having the klog header at the beginning of the line is that
the Kubernetes `make test` post-processing treats this as log output instead
of the failure message of the test. Will be fixed separately.
This has been replaced by `//build:...` for a long time now.
Removal of the old build tag was automated with:
for i in $(git grep -l '^// +build' | grep -v -e '^vendor/'); do if ! grep -q '^// Code generated' "$i"; then sed -i -e '/^\/\/ +build/d' "$i"; fi; done
A "sync test" runs the test inside a testing/synctest bubble. Time moves
forward in a deterministic fashion when all goroutines are blocked
waiting for time to progress. This simplifies testing concurrent behavior.
ktesting enables writing such tests with a new SyncTest method: it can start a
new sub-test (similar to Run) or turn an existing test (typically a top-level
Test<something>) into a sync test when no new name is given.
TContext.IsSyncTest can be used to check the mode of the current test, which
may be useful in common helper code.
TContext.Wait directly maps to synctest.Wait.
This new functionality is limited to tests which use an underlying testing.T
instance.
This closes a gap compared to the context package. It's useful when combined
with Ginkgo to keep something running beyond the end of the Ginkgo BeforeEach
or It node.
That WithCancel added a deferred cleanup which cancels on test termination was
unexpected. This automatic cancellation makes sense only for the initial root
TContext.
This allows declaring a code region as one step without having to use
an anonymous callback function, which has the advantage that variables
set during the step are visible afterwards.
In Python, this would be done as
with ktesting.Step(tctx) as tcxt:
// some code code inside step
// code not in the same step
But Go has no such construct.
In contrast to WithStep, the start and end of the step are logged, including
timing information.
Hiding the error in WithError is the right choice for example
when it is used inside ktesting.Eventually. Most callers probably want to deal
with the unexpected error themselves. For those who don't, WithErrorLogging
continues to log it.
WithTB was originally defined as "uses the existing logger". But what we want
there and in the newer TContext.Run is the usual per-test logging, now for the
sub-test.
This is not necessarily a problem, some code might use a timeout and expect it
to trigger. Therefore this should only be an info message, not a
warning. Long-term it might be useful to have an API where the caller decides
whether this gets logged.
The caller should use short messages and leave it to the user of those to
provide more context (no pun intended...). When logging, "canceling context" is
that context.
Before:
scheduler_perf.go:1431: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
contexthelper.go:69:
WARNING: the operation ran for the configured 2s
After:
scheduler_perf.go:1431: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
contexthelper.go:69:
INFO: canceling context: the operation ran for the configured 2s
How exactly a test reacts when its context times out is unclear. In the case of
scheduler_perf, the apiserver started to shut down and the test failure then
was about not being able to reach the apiserver, which was a bit confusing.
To make it more obvious why the shutdown starts, a WARNING message gets added
to the test output by ktesting before cancellation and thus before any other
output related to that cancellation.
format.Object adds some white space in front of the value and a type identifier
in angle brackets. Both is distracting when printing simple values and can be
avoided by picking fmt.Sprintf for those types, plus trimming the result of
format.Object.
Before:
allocator.go:483: I0625 15:35:31.946980] Allocating one device currentClaim= <int>: 0 totalClaims= <int>: 1 currentRequest= <int>: 0 totalRequestsPerClaim= <int>: 1 currentDevice= <int>: 0 devicesPerRequest= <int>: 1 allDevices= <bool>: false adminAccess= <bool>: false
After:
allocator.go:483: I0625 15:35:04.371441] Allocating one device currentClaim=0 totalClaims=1 currentRequest=0 totalRequestsPerClaim=1 currentDevice=0 devicesPerRequest=1 allDevices=false adminAccess=false
The return type of ktesting.NewTestContext is now a TContext. Code
which combined it WithCancel often didn't compile anymore (cannot overwrite
ktesting.TContext with context.Context). This is a good thing because all of
that code can be simplified to let ktesting handle the cancelation.
Extending the duration and the allowed delta in f6682370b1 was still not enough
to make the unit test run reliably in pull-kubernetes-unit.
Now it uses the original, stricter timing again, but only when run locally. In
Prow (detected by checking the "CI" env variable), the duration check is
skipped.
The new TContext interface combines a normal context and the testing interface,
then adds some helper methods. The context gets canceled when the test is done,
but that can also be requested earlier via Cancel.
The intended usage is to pass a single `tCtx ktesting.TContext` parameter
around in all helper functions that get called by a unit or integration test.
Logging is also more useful: Log[f] and Fatal[f] output is prefixed with
"[FATAL] ERROR: " to make it stand out more from regular log output.
If this approach turns out to be useful, it could be extended further (for
example, with a per-test timeout) and might get moved to a staging repository
to enable usage of it in other staging repositories.
To allow other implementations besides testing.T and testing.B, a custom
ktesting.TB interface gets defined with the methods expected from the
actual implementation. One such implementation can be ginkgo.GinkgoT().
The upstream ktesting has to be very flexible to accommodate different ways of
using it. In Kubernetes, we can be opinionated and make certain choices, like
using klog flags, and only those.