Commit graph

31 commits

Author SHA1 Message Date
Amritansh Amritansh
64915c6459 Promote agnhost e2e test image to 2.64.0 2026-06-10 14:18:35 +05:30
Antonio Ojea
adbf3b5aa5
Add granular authorization for DRA ResourceClaim status updates
This commit introduces the DRAResourceClaimGranularStatusAuthorization
feature gate (Beta in 1.36) to enforce fine-grained authorization checks
on ResourceClaim status updates.

Previously, 'update' permission on 'resourceclaims/status' allowed modifying
the entire status. To enforce the principle of least privilege for DRA
drivers and the scheduler, this change introduces synthetic subresources and
verb prefixes:

- 'resourceclaims/binding': Required to update 'status.allocation' and
  'status.reservedFor'.
- 'resourceclaims/driver': Required to update 'status.devices'. Evaluated
  on a per-driver basis using 'associated-node:<verb>' (for node-local
  ServiceAccounts) or 'arbitrary-node:<verb>' (for cluster-wide controllers).
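
The mapping described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the API server's authorizer: the function name `required_checks`, the dict-shaped status, the `node_local` flag, and the use of a `driver`/`pool`/`device` key per device entry are all assumptions made for the example.

```python
from typing import Any

# Status fields guarded by the 'resourceclaims/binding' subresource (from the commit message).
BINDING_FIELDS = ("allocation", "reservedFor")


def _device_key(entry: dict[str, Any]) -> tuple[str, str, str]:
    # Hypothetical identity of one status.devices entry.
    return (entry.get("driver", ""), entry.get("pool", ""), entry.get("device", ""))


def required_checks(
    old: dict[str, Any],
    new: dict[str, Any],
    node_local: bool,
    verb: str = "update",
) -> set[tuple[str, str, str]]:
    """Return (subresource, verb, driver) triples that a status update would need."""
    checks: set[tuple[str, str, str]] = set()

    # Any change to allocation or reservedFor needs the binding subresource.
    if any(old.get(field) != new.get(field) for field in BINDING_FIELDS):
        checks.add(("resourceclaims/binding", verb, ""))

    # Changes to status.devices are checked once per affected driver.
    prefix = "associated-node" if node_local else "arbitrary-node"
    old_devices = {_device_key(d): d for d in old.get("devices") or []}
    new_devices = {_device_key(d): d for d in new.get("devices") or []}
    for key in old_devices.keys() | new_devices.keys():
        if old_devices.get(key) != new_devices.get(key):
            checks.add(("resourceclaims/driver", f"{prefix}:{verb}", key[0]))

    return checks
```

In this model, a node-local ServiceAccount that only touches its own driver's entries in `status.devices` needs nothing beyond `associated-node:update` on `resourceclaims/driver`.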
2026-03-26 13:22:09 +00:00
Nour
4dffbf5b2a
Add tests for ResourcePoolStatusRequest
Add unit tests for handwritten and declarative validation, controller
logic, metrics, table printer output, controller-manager registration,
etcd storage round-trip, and an integration test for the full RPSR
lifecycle. Also add an e2e test exercising the DRA test driver with
RPSR and the example manifest.
2026-03-19 16:50:03 +02:00
Patrick Ohly
566dc7f3f3 DRA device taints: graduate to beta
The fields become beta, enabled by default. DeviceTaintRule gets
added to the v1beta2 API, but support for it must remain off by default
because that API group is also off by default.

The v1beta1 API is left unchanged. No-one should be using it
anymore (deprecated in 1.33, could be removed now if it wasn't for
reading old objects and version emulation).

To achieve consistent validation, declarative validation must also be enabled
for v1alpha3 (was already enabled for other versions). Otherwise,
TestVersionedValidationByFuzzing fails:

    --- FAIL: TestVersionedValidationByFuzzing (0.09s)
        --- FAIL: TestVersionedValidationByFuzzing/resource.k8s.io/v1beta2,_Kind=DeviceTaintRule (0.00s)
            validation_test.go:109: different error count (0 vs. 1)
                resource.k8s.io/v1alpha3: <no errors>
                resource.k8s.io/v1beta2: "spec.taint.effect: Unsupported value: \"幤HxÒQP¹¬永唂ȳ垞ş]嘨鶊\": supported values: \"NoExecute\", \"NoSchedule\", \"None\""
            ...
2026-03-12 18:26:02 +01:00
Omer Aplatony
201fe11b03 Promote agnhost image to 2.63.0
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2026-02-05 17:21:34 +00:00
Davanum Srinivas
9dda58194a
Update agnhost to 2.61 and etcd to 3.6.7-0 in test manifests
Update outdated image versions across test manifests and add tracking
to build/dependencies.yaml for version drift detection via zeitgeist:

- agnhost: 2.32/2.53/2.54/2.57 → 2.61 (latest)
- etcd: 3.2.24 → 3.6.7-0
- kitten/nautilus BASEIMAGE: agnhost 2.57 → 2.61

Also add an etcd statefulset reference to the existing etcd entry.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-18 20:39:57 -05:00
Patrick Ohly
6f51446802 DRA device taints: fix toleration of NoExecute
As usual, consumers of an allocated claim react to the information stored in
the status. In this case, the scheduler did not copy the tolerations into the
status and as a result a pod with a toleration for NoExecute got scheduled and
then immediately evicted.

Some additional logging gets added to make the handling easier to track in the
eviction controller. Example YAMLs allow reproducing the use case manually.
2025-10-08 13:13:47 +02:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 are enabled without v1 won't work.
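
A rough Python model of that behaviour, for illustration only. The commit does not say in which order the helper code tries the versions, so the newest-first `PREFERENCE` tuple is an assumption, and both function names are hypothetical.

```python
from typing import Iterable, Optional

# Assumed preference order (newest first); the commit message does not specify it.
PREFERENCE = ("v1", "v1beta2", "v1beta1")


def pick_driver_api_version(enabled: Iterable[str]) -> Optional[str]:
    """Pick the resource.k8s.io version a driver helper would use, or None."""
    enabled_set = set(enabled)
    for version in PREFERENCE:
        if version in enabled_set:
            return version
    return None


def control_plane_supported(enabled: Iterable[str]) -> bool:
    """The control plane depends on v1, so v1 must be among the enabled versions."""
    return "v1" in set(enabled)
```

With only `v1beta1` and `v1beta2` enabled, this model still finds a version for the driver, but `control_plane_supported` returns `False`, which matches the unsupported configuration described above.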
2025-07-24 08:33:45 +02:00
Patrick Ohly
60e9316c0c DRA E2E: refactor helper code
The helper code is useful for a separate Ginkgo suite for upgrade/downgrade
testing. We don't want to import test/e2e/dra there because that would also
define additional tests.
2025-07-15 12:54:40 +02:00
Patrick Ohly
5af026120a test: bump agnhost image to 2.54
YAML files were patched with:
   sed -i -e 's;registry.k8s.io/e2e-test-images/agnhost:2...;registry.k8s.io/e2e-test-images/agnhost:2.54;' $(git grep -l agnhost:2 test/e2e/testing-manifests/ test/fixtures/)
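
For readers less familiar with sed, the substitution above can be mirrored in Python. This is a sketch for illustration only: `bump_agnhost` is a hypothetical helper, not part of the repository. As in the sed expression, each `.` matches any single character, and only the first match on each line is replaced because the sed command has no `g` flag.

```python
import re

# Same pattern as the sed expression: "agnhost:2" followed by any three characters.
PATTERN = re.compile(r"registry.k8s.io/e2e-test-images/agnhost:2...")
REPLACEMENT = "registry.k8s.io/e2e-test-images/agnhost:2.54"


def bump_agnhost(text: str) -> str:
    """Apply the substitution line by line, first match only, like sed without 'g'."""
    return "".join(
        PATTERN.sub(REPLACEMENT, line, count=1)
        for line in text.splitlines(keepends=True)
    )
```

Because the pattern consumes exactly three characters after `agnhost:2`, a tag such as `2.53` is rewritten to `2.54`, while a shorter tag at the end of a line is left unchanged.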

The test/images/kitten and test/images/nautilus base images are still on an
older agnhost because updating those is better left to the owners.
2025-05-05 08:25:20 +02:00
Patrick Ohly
dceae3b388 DRA e2e: avoid terminationGracePeriodSeconds
`terminationGracePeriodSeconds: 0` was a mistake: it bypasses the normal
pod shutdown in the kubelet.

The right way to shut down a pod quickly is to have it react to SIGTERM.
The busybox implementation of "sleep" doesn't. `agnhost pause` does,
so let's use that instead.

For E2E tests, the InfiniteSleepCommand was already changed about a year ago to
react to SIGTERM, so the `terminationGracePeriodSeconds: 1` workaround is no
longer needed.
2025-05-02 10:52:56 +02:00
Patrick Ohly
3cadb6ff80 DRA test: update examples
Fix some more outdated references to resource class. Keeping the pod running is
better for demonstrating the lifecycle of claims because it is actually
possible to see a claim in the allocated state.
2025-04-29 09:55:05 +02:00
Kubernetes Prow Robot
6b8e5a9457
Merge pull request #130931 from nojnhuh/dra-vap-e2e
Remove unused VAP for DRA admin access e2e test
2025-03-20 01:36:51 -07:00
Jon Huhn
7d74a504ca Remove unused VAP for DRA admin access e2e test 2025-03-19 11:02:56 -05:00
Patrick Ohly
797475e113 DRA: add device taints API
This adds the "DeviceTaint" top-level type to v1alpha3 and related fields to
ResourceSlice and ResourceClaim. It's complete enough to bring up an API server
and generate files.
2025-03-18 20:52:54 +01:00
Davanum Srinivas
1e64a89038
Reduce number of (versions of!) images we pull in our e2e tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-03-01 21:02:50 -05:00
Patrick Ohly
013f65b257 DRA: fix test-driver examples
They were still using an obsolete API version.
The driver must publish some devices for allocation
to succeed.
2025-01-24 18:30:36 +01:00
Jon Huhn
5b2c1dde79 Add namespace to DRA adminAccess ValidatingAdmissionPolicy message 2025-01-08 11:06:36 -06:00
Lionel Jouin
8be335a755 [KEP-4817] E2E: Update ResourceClaim.Status.Devices
Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>
2024-11-07 09:54:19 +01:00
Patrick Ohly
33ea278c51 DRA: use v1beta1 API
No code is left which depends on v1alpha3, except of course the code
implementing that version.
2024-11-06 13:03:19 +01:00
Patrick Ohly
357a2926a1 DRA e2e: update VAP for a kubelet plugin
This fixes the message (node name and "cluster-scoped" were switched) and
simplifies the VAP:
- a single matchCondition short circuits completely unless the request comes
  from a user we care about
- variables to extract the userNodeName and objectNodeName once
  (using optionals to gracefully turn missing claims and fields into empty strings)
- leaves very tiny concise validations

Co-authored-by: Jordan Liggitt <liggitt@google.com>
2024-07-22 18:09:34 +02:00
Patrick Ohly
c526d7796e DRA e2e: use VAP to control "admin access" permissions
The advantages of using a validation admission policy (VAP) are that no changes
are needed in Kubernetes and that admins have full flexibility if and how they
want to control which users are allowed to use "admin access" in their
requests.

The downside is that without admins taking actions, the feature is enabled
out-of-the-box in a cluster. Documentation for DRA will have to make it very
clear that something needs to be done in multi-tenant clusters.

The test/e2e/testing-manifests/dra/admin-access-policy.yaml shows how to do
this. The corresponding E2E test ensures that it actually works as intended.

For some reason, adding the namespace to the message expression leads to
type check errors, so it's currently commented out.
2024-07-22 18:09:34 +02:00
Patrick Ohly
0b62bfb690 DRA e2e: adapt to v1alpha3 API 2024-07-22 18:09:34 +02:00
Patrick Ohly
de5742ae83 DRA: remove immediate allocation
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io API completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Patrick Ohly
ee3205804b dra e2e: demonstrate how to use RBAC + VAP for a kubelet plugin
In reality, the kubelet plugin of a DRA driver is meant to be deployed as a
daemonset with a service account that limits its
permissions. https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#additional-metadata-in-pod-bound-tokens
ensures that the node name is bound to the pod, which then can be used
in a validating admission policy (VAP) to ensure that the operations are
limited to the node.

In E2E testing, we emulate that via impersonation. This ensures that the plugin
does not accidentally depend on additional permissions.
2024-07-18 23:30:09 +02:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
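
Existing manifests can be converted mechanically. The following Python sketch works on the `resourceClaims` list after it has been parsed into plain dicts; `flatten_resource_claims` is a hypothetical helper written for this example, not something the commit provides.

```python
from typing import Any


def flatten_resource_claims(claims: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Move the fields nested under 'source' up into each resourceClaims entry."""
    flattened = []
    for claim in claims:
        # Keep everything except the obsolete 'source' wrapper...
        entry = {key: value for key, value in claim.items() if key != "source"}
        # ...and hoist its contents to the top level of the entry.
        entry.update(claim.get("source") or {})
        flattened.append(entry)
    return flattened
```

Entries that already use the new layout pass through unchanged, so the helper can be applied to a mix of old and new manifests.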
2024-06-27 17:53:24 +02:00
Patrick Ohly
29941b8d3e api: resource.k8s.io v1alpha1 -> v1alpha2
For Kubernetes 1.27, we intend to make some breaking API changes:
- rename PodScheduling -> PodSchedulingHints (https://github.com/kubernetes/kubernetes/issues/114283)
- extend ResourceClaimStatus (https://github.com/kubernetes/enhancements/pull/3802)

We need to switch from v1alpha1 to v1alpha2 for that.
2023-03-14 07:52:03 +01:00
vaibhav2107
6ab8a8fbec Updated the change in registry 2023-02-09 09:37:44 +05:30
Swati Sehgal
4d15502e43 dra: test examples: ensure that the claim parameter name is consistent
In the Dynamic Resource allocation example specs, the claim
parameter name specified was inconsistent.

This commit fixes that with a better/more consistent name,
which is used to define the configmap and referenced in
the `ResourceClaimTemplate` spec.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-17 14:56:42 +00:00
Patrick Ohly
14db9d1f92 e2e dra: add test driver and tests for dynamic resource allocation
The driver can be used manually against a cluster started with
local-up-cluster.sh and is also used for E2E testing. Because the tests proxy
connections from the nodes into the e2e.test binary and create/delete files via
the equivalent of "kubectl exec dd/rm", they can be run against arbitrary
clusters. Each test gets its own driver instance and resource class, therefore
they can run in parallel.
2022-11-12 00:17:15 +01:00