Presumably
https://github.com/kubernetes/kubernetes/pull/127260/files#r2405215911
was meant to continue polling after a watch was closed by the apiserver.
This is something that can happen under load. However, returning the error
causes polling to stop. This shows up as test failures when testing with race
detection enabled:
persistent_volumes_test.go:1101: Failed to wait for all claims to be bound: watch closed
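The intended fix can be sketched as follows. This is a minimal, hypothetical stand-in for the test helper (the names `watchClaims` and `newWatch` are illustrative, not from the PR): instead of returning an error when the watch channel closes, it re-establishes the watch and keeps waiting.

```go
package main

import "fmt"

// watchClaims waits until `want` claims are bound. When the apiserver
// closes the watch under load, it restarts the watch and continues
// polling instead of reporting "watch closed" as a failure.
func watchClaims(newWatch func() <-chan string, want int) int {
	bound := 0
	ch := newWatch()
	for bound < want {
		ev, ok := <-ch
		if !ok {
			// Watch closed by the apiserver; re-establish it.
			ch = newWatch()
			continue
		}
		if ev == "Bound" {
			bound++
		}
	}
	return bound
}

func main() {
	calls := 0
	// Simulated watches: the first one delivers one event and then closes.
	newWatch := func() <-chan string {
		calls++
		ch := make(chan string, 2)
		if calls == 1 {
			ch <- "Bound"
			close(ch)
		} else {
			ch <- "Bound"
			ch <- "Bound"
		}
		return ch
	}
	fmt.Println(watchClaims(newWatch, 3)) // 3
}
```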
Adds a new integration test to verify that the API server's egress
to admission webhooks correctly respects the standard `HTTPS_PROXY`
and `NO_PROXY` environment variables.
It also adds a new test utility implementing a fake DNS server that allows
overriding DNS resolution in tests. This is especially useful for integration
tests whose servers can only bind to localhost, an address that is ignored
by certain functionality.
For some packages (in particular the core DRA code), all allocator implementations
can handle the testcases. Some other packages are for less stable features and
work with fewer implementations. Now all unit tests are run with all suitable
implementations, to increase code coverage.
Benchmarks are pinned to the most mature implementation because running them
with more than one would be costly. When promoting an allocator implementation we
can do before/after comparisons to detect potential performance regressions.
The downside of this approach is that we need to remember to extend the list
of supported implementations when promoting features; otherwise testing will miss
newly supported implementations.
Without this, the effect of the following feature gate config would be random:
featureGates:
AllBeta: false
SomeBetaFeature: true
That's random because Go randomizes the iteration order of maps, and
`AllBeta: false` would disable `SomeBetaFeature` if (and only if) it was applied last.
By sorting keys alphabetically, AllAlpha/AllBeta come first in practice. It's not a
complete solution: some future feature gate name might still sort before them.
The startup phase may have allocated memory that can be garbage-collected.
Forcing GC to run before measurements avoids noise if the garbage collection
kicks in during the measurement and potentially reduces the heap size reported
by metrics.
The exact effect has not been measured; it just seems useful.
When looking at a CPU profile, the cache mutation detection stood out. "make
test-integration" enables it by default. We try to benchmark "real" production
setups, so we have to disable it ourselves by setting it to false.
This change updates the NowFunc to be per KMS provider instead of global
to the API server. This allows integration tests that use distinct
provider names to run in parallel when simulating key expiry.
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Steady-state pod scheduling is less suitable for integration tests because
the duration is either short (making the test potentially flaky if nothing has been
scheduled yet within the time constraint) or long (making the test run too
long). It is more useful for benchmark testcases because of the bounded runtime.
Now a single workload definition can be used in both modes via a configuration
parameter, "steadyState".
Workload definitions get updated accordingly. While at it, their names get
simplified and some (in the case of the main DRA config) redundant testcases
get removed.
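A hypothetical sketch of such a combined workload definition (the field names, including `steadyState`, are illustrative and do not reflect the actual scheduler_perf schema):

```yaml
# Illustrative only: one workload usable both as an integration test
# (steadyState: false, wait for a fixed pod count) and as a benchmark
# (steadyState: true, run for a bounded duration).
workloads:
  - name: SchedulingWithResourceClaims
    params:
      steadyState: true
      duration: 10s
      initPods: 100
```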
Some of the DRA testcases schedule pods in a steady state for a certain
duration. They pass even if no pods got scheduled at all because, in contrast to
the non-steady-state variants, they don't wait for a fixed number of pods to be
scheduled. This made them unsuitable for integration testing because a real
problem was not flagged as a test failure. Now "zero pods scheduled" is detected
for them.
However, they are still not good integration tests (either they run quickly and
risk being flaky, or they run for a longer period and are slow). Revisiting
how they are used in configurations will be done separately.
The time required for pulling ResourceSlices into the scheduler is relevant in
two cases:
- The scheduler was (re)started and waits for informers to sync.
- A driver got deployed and needs to inform the scheduler about its devices.
The new workload measures the second scenario. It is indirectly relevant for
the first one because it allows drawing conclusions about the code that is also
involved there.
After creating ResourceSlices, the workload was allowed to proceed even while
the scheduler was still busy receiving those new ResourceSlices. This blurred
the line between "setup" and "measurement" phase of DRA workloads. It's not
immediately clear how much that affected results, but it is cleaner to block.
This is done by returning the scheduler instance to the main scheduler_perf
loop and then passing the SharedDRAManager into the driver setup operation. There
it can be used to poll until that manager has processed all ResourceSlices.
Before, metrics gathered by testing.B (runtime_seconds,
-benchmem's B/op and allocs/op) covered the entire test case, including
starting the apiserver and the initialization steps of a workload. Now those
metrics are also limited to the period where the workload is configured to
collect metrics.
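The standard library already supports this split; a minimal sketch (the setup and workload bodies are placeholders, not scheduler_perf code):

```go
package main

import (
	"fmt"
	"testing"
)

func BenchmarkWorkload(b *testing.B) {
	// Expensive setup, analogous to starting the apiserver and the
	// initialization steps of a workload.
	_ = make([]byte, 1<<20)

	// Exclude everything above from runtime_seconds, B/op, and
	// allocs/op; only the loop below is measured.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = i * i // the measured workload
	}
}

func main() {
	res := testing.Benchmark(BenchmarkWorkload)
	fmt.Println(res.N > 0) // true: the measured phase ran
}
```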