The local-up-cluster.sh script was used as a proxy for controlling the etcd
lifecycle because the script was using a helper function that didn't support
dry-run mode. That approach was flawed in several ways, causing etcd to be left
running:
- Cmd.Stop wasn't actually called for the script by Cluster.Stop.
- If it had been called, the additional output during shutdown wouldn't
have been processed, which might have caused the command to block instead
of terminating (untested).
- It's unclear whether the script would have handled the signal properly.
A cleaner approach is to also enable dry-run mode in etcd.sh and then let
Cluster manage etcd like any other long-running process. Then we can let
local-up-cluster.sh terminate when it's done with its work. Cluster.Start can
check its result immediately.
The format package is used by ktesting, both to reconfigure Gomega and to
format errors. Because those are desirable features, it has to be moved to
staging together with ktesting, if or when we get to that.
Because format only has the YAML package as additional dependency and that
should be okay for all other repos (except for the YAML package itself, of
course), we can publish the format package as a sub-package of such a future
ktesting module.
Avoiding the dependency on apimachinery to detect unstructured.Unstructured is
a bit tricky, but doable by relaxing what we check for. The test/utils/format
package is kept to test ktesting/format with the actual packages that it cannot
depend on (apimachinery, api).
Depending on component-base/logs/testinit was convenient and avoided any doubts
about the init order, but isn't acceptable long-term as an additional
dependency because component-base is too big. The same functionality (flag
registration) can also be implemented directly in ktesting. Because Go 1.21
clarified the order in which independent packages get initialized, we know for
sure that "our" code runs after testinit and can handle a potential conflict.
While at it, introduce a KTESTING_VERBOSITY env variable to enable increasing
the default verbosity in CI jobs which run a mixture of tests where some don't
use ktesting and thus don't accept a -v=<something> parameter.
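A minimal sketch of how such an environment override could work, assuming the variable holds a plain non-negative integer. The helper names parseVerbosity and defaultVerbosity and the fallback behavior are hypothetical, not the actual ktesting code:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseVerbosity is a hypothetical helper: it returns the verbosity encoded
// in value, or def if value is not a non-negative integer.
func parseVerbosity(value string, def int) int {
	v, err := strconv.Atoi(value)
	if err != nil || v < 0 {
		return def
	}
	return v
}

// defaultVerbosity lets KTESTING_VERBOSITY override the built-in default.
// An unset variable leaves the default unchanged.
func defaultVerbosity(def int) int {
	value, ok := os.LookupEnv("KTESTING_VERBOSITY")
	if !ok {
		return def
	}
	return parseVerbosity(value, def)
}

func main() {
	fmt.Println("default verbosity:", defaultVerbosity(2))
}
```

In this sketch an invalid value silently falls back to the default. Reporting it as an error instead would be an equally reasonable choice.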
These hints showed up for client-go/ktesting because there the code is new.
They also apply in exactly the same way to the old code, so both get updated.
The client-go variant of ktesting is a superset of the normal
ktesting, which makes it possible to get the full original
functionality simply by changing the import path.
This enables passing the client-go clients and helpers via TContext.WithValue.
The advantage of this approach is that the implementation is small. The
downside is that all call sites need to be updated and need two imports. It's
also not discoverable from the TContext type that it may provide clients.
Only a few callers get updated to demonstrate the usage.
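The pattern can be illustrated with the standard context package, which TContext.WithValue resembles. This is only a sketch: Client, fakeClient, WithClient and ClientFrom are hypothetical stand-ins, not the helpers added by this change:

```go
package main

import (
	"context"
	"fmt"
)

// Client stands in for a client-go interface (hypothetical).
type Client interface{ Name() string }

type fakeClient struct{}

func (fakeClient) Name() string { return "fake" }

// clientKey is an unexported key type, which avoids collisions with keys
// defined by other packages.
type clientKey struct{}

// WithClient stores the client in a derived context.
func WithClient(ctx context.Context, c Client) context.Context {
	return context.WithValue(ctx, clientKey{}, c)
}

// ClientFrom retrieves the client, reporting whether one was stored.
func ClientFrom(ctx context.Context) (Client, bool) {
	c, ok := ctx.Value(clientKey{}).(Client)
	return c, ok
}

// demo returns the client name found in a context that has a client and
// whether a lookup in an empty context succeeds.
func demo() (name string, foundInEmpty bool) {
	ctx := WithClient(context.Background(), fakeClient{})
	if c, ok := ClientFrom(ctx); ok {
		name = c.Name()
	}
	_, foundInEmpty = ClientFrom(context.Background())
	return name, foundInEmpty
}

func main() {
	name, foundInEmpty := demo()
	fmt.Println(name, foundInEmpty)
}
```

The sketch also shows the discoverability downside: nothing in the context type itself tells a caller that ClientFrom exists.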
The goal is to make ktesting available for unit testing in *all* Kubernetes
packages. To achieve that, it must not depend on packages which themselves
depend on other Kubernetes packages. client-go was the biggest of those
dependencies (but not the only one, see below), so it can't be part of the
TContext API.
How to bring back passing of those values via a TContext is to be
decided. Options are:
- via WithValue
- by wrapping TContext
k8s.io/component-base/logs is another problematic dependency that is going to
be harder to resolve. Others are just work (testify!).
To prevent regressing accidentally, import-boss is now used to check
dependencies.
As a special case, WithContext preserved the logger in the parent context. But
for the upcoming usage of WithValue to store a Kubernetes client it is
important to also preserve access to other values.
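One way to preserve values from both contexts, sketched with the standard context package. The mergedContext type and withContext function are assumptions for illustration and not the ktesting implementation:

```go
package main

import (
	"context"
	"fmt"
)

type keyA struct{}
type keyB struct{}

// mergedContext takes deadline and cancellation from the embedded context
// and falls back to parent for values the embedded context does not have.
type mergedContext struct {
	context.Context
	parent context.Context
}

// Value checks the embedded context first, then the parent.
func (m mergedContext) Value(key any) any {
	if v := m.Context.Value(key); v != nil {
		return v
	}
	return m.parent.Value(key)
}

// withContext is a hypothetical equivalent of WithContext.
func withContext(parent, ctx context.Context) context.Context {
	return mergedContext{Context: ctx, parent: parent}
}

// demo looks up one value from each source through the merged context.
func demo() (a, b any) {
	parent := context.WithValue(context.Background(), keyA{}, "from parent")
	other := context.WithValue(context.Background(), keyB{}, "from new context")
	merged := withContext(parent, other)
	return merged.Value(keyA{}), merged.Value(keyB{})
}

func main() {
	a, b := demo()
	fmt.Println(a, b)
}
```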
The latest pause version is 3.10.2, but since the introduction
of a PATCH level in the pause image version (previously it was
only MAJOR.MINOR), various files have remained on an older
version, either 3.10 or 3.10.1. Our validation with
build/dependencies.yaml and ./hack/verify-external-dependencies.sh
did not account for that.
Deadline is available inside a synctest bubble, but calling it panics. To
support constructing a TContext inside a bubble, we have to catch the panic
because there is no API to detect a bubble in advance. Detecting a panic is
then also used to set the result of TContext.IsSyncTest.
While at it, clean up the code a bit and add unit tests for the Deadline
behavior.
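The panic detection can be sketched as follows. safeDeadline is a hypothetical helper, and the panicking function is only a stand-in for a Deadline call that panics inside a bubble:

```go
package main

import (
	"fmt"
	"time"
)

// safeDeadline calls get and reports whether it panicked instead of
// letting the panic propagate to the caller.
func safeDeadline(get func() (time.Time, bool)) (deadline time.Time, ok bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			// The named results are what the caller sees after a panic.
			deadline, ok, panicked = time.Time{}, false, true
		}
	}()
	deadline, ok = get()
	return deadline, ok, false
}

// panicking stands in for a Deadline implementation that panics.
func panicking() (time.Time, bool) { panic("deadline not available") }

// fixed stands in for a Deadline implementation that works normally.
func fixed() (time.Time, bool) { return time.Unix(0, 0), true }

func main() {
	_, _, p := safeDeadline(panicking)
	fmt.Println("panicked:", p)
	_, ok, p := safeDeadline(fixed)
	fmt.Println("ok:", ok, "panicked:", p)
}
```

In this sketch the panicked result plays the role that the commit describes for TContext.IsSyncTest.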
Modify() was replacing components one at a time: stop X, start X, stop Y,
start Y, ... in version-skew order (apiserver last on downgrade). This
caused a crash during downgrade: KCM-1.35 started against the
still-running apiserver-1.36, passed its /healthz check, and then immediately
lost its connection when apiserver-1.36 was killed by localupcluster.
KCM-1.35 would then reconnect to the not-yet-ready apiserver-1.35, hit a
403 RBAC error during controller initialization, and exit, because that
initialization phase does not retry on RBAC errors.
Fix by splitting Modify() into two phases:
Phase 1 — stop all components to be replaced, in reverse startup order
(kube-proxy down to apiserver), so dependent components release their
connections before the apiserver is stopped.
Phase 2 — start all replacement components in standard startup order
(apiserver first), so each component connects to a fully-ready apiserver.
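The two phases can be sketched as a plan of actions. The component list and the plan function are illustrative assumptions: only "apiserver first" and "kube-proxy last" come from the description above, and the order of the components in between is a guess:

```go
package main

import "fmt"

// startupOrder is an assumed startup order, apiserver first and
// kube-proxy last.
var startupOrder = []string{
	"kube-apiserver",
	"kube-controller-manager",
	"kube-scheduler",
	"kubelet",
	"kube-proxy",
}

// plan returns the actions for replacing the given components: first all
// stops in reverse startup order, then all starts in startup order.
func plan(replace map[string]bool) []string {
	var actions []string
	// Phase 1: stop dependents before the apiserver.
	for i := len(startupOrder) - 1; i >= 0; i-- {
		if name := startupOrder[i]; replace[name] {
			actions = append(actions, "stop "+name)
		}
	}
	// Phase 2: start the apiserver before its dependents.
	for _, name := range startupOrder {
		if replace[name] {
			actions = append(actions, "start "+name)
		}
	}
	return actions
}

func main() {
	for _, action := range plan(map[string]bool{
		"kube-apiserver":          true,
		"kube-controller-manager": true,
	}) {
		fmt.Println(action)
	}
}
```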
Without an explicit interval, Gomega's default polling is very frequent,
generating a large volume of /readyz and /healthz requests in the component
logs. Set an explicit 1-second interval to reduce noise while still
detecting readiness promptly.
Despite being called checkReadiness, the function was only performing
a liveness check: /healthz was polled over HTTPS without verifying the
certificate or authenticating, and any HTTP response was accepted as a
signal that the component was up. The only exception was kubelet,
where a node readiness check was added on top.
Switch to /readyz for kube-apiserver and kube-scheduler,
keep /healthz for the rest, and require HTTP 200 in all cases.
This ensures that the kube-apiserver is fully initialized before
dependent components are started.
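A minimal sketch of a check that accepts only HTTP 200. isReady is a hypothetical helper, and the demo uses a plain-HTTP test server from net/http/httptest rather than the HTTPS endpoints of real components:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// isReady reports whether a GET on url returns HTTP 200. Any other status
// code and any transport error count as "not ready".
func isReady(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// demo starts a test server where only /readyz returns 200 and checks
// both /readyz and another path against it.
func demo() (readyz, other bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/readyz" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()
	return isReady(srv.Client(), srv.URL+"/readyz"), isReady(srv.Client(), srv.URL+"/healthz")
}

func main() {
	readyz, other := demo()
	fmt.Println("readyz:", readyz, "other:", other)
}
```

Under the previous behavior, the 503 response in this demo would also have counted as "up", because any HTTP response was accepted.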
Not canceling the parent context made sense, but the new context should
be cancelable like any other TContext. Found when passing tCtx.WithoutCancel()
to StartTestServer and the tear-down function got stuck because it couldn't
cancel the context.
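The intended behavior corresponds to this combination from the standard context package (Go 1.21 or later). The detach helper is a hypothetical name for illustration:

```go
package main

import (
	"context"
	"fmt"
)

// detach returns a context that ignores cancellation of parent but can
// still be canceled through the returned cancel function.
func detach(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithCancel(context.WithoutCancel(parent))
}

// demo returns the detached context's error after the parent was canceled
// and again after its own cancel function was called.
func demo() (afterParentCancel, afterOwnCancel error) {
	parent, cancelParent := context.WithCancel(context.Background())
	ctx, cancel := detach(parent)
	cancelParent()
	afterParentCancel = ctx.Err()
	cancel()
	afterOwnCancel = ctx.Err()
	return afterParentCancel, afterOwnCancel
}

func main() {
	a, b := demo()
	fmt.Println("after parent cancel:", a)
	fmt.Println("after own cancel:", b)
}
```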
add integration test
Use proper test header, change to etcdMain to recognize test flags
fix goroutine leak in integration test
redo integration test with kubeapiserverapptesting
fix comment capitalization, use existing client libraries
consolidate http connect handler logic from oidc and tls_server-name into helper
add expected SNI, remove unused test
move oidc helpers.go to right dir, remove copyright year
split helpers.go into descriptive file names
use atomic ptr for SNI, refactor generateTestCerts, remove errors from runTLSEgressProxy, explain jwksHandler in comment
use testify, add back context messages
Clean up tests
Signed-off-by: Monis Khan <mok@microsoft.com>
This change allows slow impersonation requests to be tracked via the
apiserver.latency.k8s.io/impersonation audit event annotation.
Updated tests to assert that the audit event log:
- Contains the new latency annotation
- Contains the impersonationConstraint field
- Makes failed impersonation attempts observable via the response status
Signed-off-by: Monis Khan <mok@microsoft.com>
In some (all?) CI jobs the initial kubelet instance keeps running, despite
command context cancellation. Not reproducible locally, so additional output
was necessary to track down the root cause in CI runs: signal propagation via
sudo didn't work for kube-proxy and kubelet, but only for those two and only in
the CI. The fix is to change the CI jobs so that they disable the usage of
sudo.
While at it, simplify by replacing atomic.Pointer with atomic.Bool.
The type alias made `go doc ./test/utils/ktesting.TContext` useless and was a
weird workaround for preserving the original interface type name. Passing a
TContext instance by value (almost) preserves the original API and is
acceptable because the struct is still small. The only consumers which need to
be updated are those which relied on passing nil as tCtx.
If we ever find that TContext is or becomes too large, then we can make it
a wrapper around some pointer.
I've not been able to trigger the flake, but it could happen:
- time.Sleep unblocks some background goroutines inside the synctest bubble.
- Those goroutines do not actually run yet.
- The main test checks for the result of those goroutines.
Adding a `synctest.Wait` ensures that all background processing is complete
because it waits for all goroutines to be durably blocked.
If the goroutine happens to log after the test has already terminated,
testing.T.Log panics. We must ensure that the goroutine has stopped before
allowing the test to terminate.
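One common way to guarantee this is a sync.WaitGroup, sketched here outside of a real test. runAndWait and its log parameter are hypothetical stand-ins for the test body and testing.T.Log:

```go
package main

import (
	"fmt"
	"sync"
)

// runAndWait starts a background goroutine that logs once and returns only
// after that goroutine has finished, so log is never called afterwards.
func runAndWait(log func(string)) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		log("background work done")
	}()
	// Block until the goroutine has stopped.
	wg.Wait()
}

func main() {
	runAndWait(func(msg string) { fmt.Println(msg) })
}
```

In a real test the wait would typically go into a deferred call or a cleanup function so that it also runs when the test fails early.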
Replace all imports of k8s.io/apimachinery/pkg/util/dump with
k8s.io/utils/dump across the repo. The apimachinery dump package
now contains deprecated wrapper functions that delegate to
k8s.io/utils/dump for backwards compatibility.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The tests validate the sidecar's functionality,
integration with the CSI driver and correctness of
metadata retrieval for snapshot backups.
This will help CSI vendors test their implementation
of the snapshot-metadata feature.
Issue: kubernetes-csi/external-snapshot-metadata#120
Signed-off-by: Praveen M <m.praveen@ibm.com>