Using the "normal" logic for a feature gated field simplifies the
implementation of the feature gate.
There is one (entirely theoretic!) problem with updating from 1.31: if a claim
was allocated in 1.31 with admin access, the status field was not set because
it didn't exist yet. If a driver now follows the current definition of "unset =
off", then it will not grant admin access even though it should. This is
theoretic because drivers are starting to support admin access with 1.32, so
there shouldn't be any claim where this problem could occur.
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
field gets cleared. Same in the status. In both cases the scenario
that it was already set and a claim or claim template get updated
is special: in those cases, the field is not cleared.
Also, allocating a claim with admin access is allowed regardless of the
feature gate and the field is not cleared. In practice, the scheduler
will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
with the field set gets rejected. This prevents running workloads
which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
allocated. The effect is the same.
The alternative would have been to ignore the fields in claim controller and
scheduler. This is bad because a monitoring workload then runs, blocking
resources that probably were meant for production workloads.
Drivers need to know that because admin access may also grant additional
permissions. The allocator needs to ignore such results when determining which
devices are considered as allocated.
In both cases it is conceptually cleaner to not rely on the content of the
ClaimSpec.
The main purpose is to protect against denial-of-service attacks. Scheduling
time depends a lot on unpredictable factors and expected scheduling time also
varies, so no attempt is made to limit the overall time spent on evaluating CEL
expressions per claim.
This removes the DRAControlPlaneController feature gate, the fields controlled
by it (claim.spec.controller, claim.status.deallocationRequested,
claim.status.allocation.controller, class.spec.suitableNodes), the
PodSchedulingContext type, and all code related to the feature.
The feature gets removed because there is no path towards beta and GA and DRA
with "structured parameters" should be able to replace it.
Fix typo and grammar in comments that get reflected through to the
generated documentation, regarding VolumeAttachments' use of
PersistentVolumes and PersistentVolumeClaims.
Signed-off-by: Eric Blake <eblake@redhat.com>
This is a complete revamp of the original API. Some of the key
differences:
- refocused on structured parameters and allocating devices
- support for constraints across devices
- support for allocating "all" or a fixed amount
of similar devices in a single request
- no class for ResourceClaims, instead individual
device requests are associated with a mandatory
DeviceClass
For the sake of simplicity, optional basic types (ints, strings) where the null
value is the default are represented as values in the API types. This makes Go
code simpler because it doesn't have to check for nil (consumers) and values
can be set directly (producers). The effect is that in protobuf, these fields
always get encoded because `opt` only has an effect for pointers.
The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new
"request" field. This is considered acceptable because the entire `claims`
field in the pod spec is still alpha.
The implementation is complete enough to bring up the apiserver.
Adapting other components follows.
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.
Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.
Only source code where the version really matters (like API registration)
retains the versioned import.
Adding the required Kubernetes API so that the kubelet can start using
it. This patch also adds the corresponding alpha feature gate as
outlined in KEP 4639.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This change make api-server not omity the field `item` of
ValidatingAdmissionPolicyList when ValidatingAdmissionPolicy is empty.
So kubetl will print ValidatingAdmissionPolicyList correctly when ValidatingAdmissionPolicy is empty.
Signed-off-by: xyz-li <hui0787411@163.com>
KEP-3619: don't capitalize comment in K8S API
KEP-3619: fix typos and grammatical ones in K8s API
KEP-3619: rephrase NodeFeatures, NodeHandlerFeatures in K8s API
Introduce Windows section for internal configuration of kube-proxy
adhering to the v1alpha2 version specifications as detailed in
https://kep.k8s.io/784. This also introduces WindowsRunAsService
to v1alpha1 configuration.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
The 'DefaultProcMount' string is the name of the variable, but the
actual value is 'Default'.
Signed-off-by: Arturo Borrero Gonzalez <arturo.bg@arturo.bg>
This makes the API nicer:
resourceClaims:
- name: with-template
resourceClaimTemplateName: test-inline-claim-template
- name: with-claim
resourceClaimName: test-shared-claim
Previously, this was:
resourceClaims:
- name: with-template
source:
resourceClaimTemplateName: test-inline-claim-template
- name: with-claim
source:
resourceClaimName: test-shared-claim
A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.
This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
Currently, the go doc and as a result the generated swagger docs for
node.status.capacity links to a documentation page for PV capacity.
Update it to link to our nodes about node capacity and alloctable
instead.
* Add `Linux{Sandbox,Container}SecurityContext.SupplementalGroupsPolicy` and `ContainerStatus.user` in cri-api
* Add `PodSecurityContext.SupplementalGroupsPolicy`, `ContainerStatus.User` and its featuregate
* Implement DropDisabledPodFields for PodSecurityContext.SupplementalGroupsPolicy and ContainerStatus.User fields
* Implement kubelet so to wire between SecurityContext.SupplementalGroupsPolicy/ContainerStatus.User and cri-api in kubelet
* Clarify `SupplementalGroupsPolicy` is an OS depdendent field.
* Make `ContainerStatus.User` is initially attached user identity to the first process in the ContainerStatus
It is because, the process identity can be dynamic if the initially attached identity
has enough privilege calling setuid/setgid/setgroups syscalls in Linux.
* Rewording suggestion applied
* Add TODO comment for updating SupplementalGroupsPolicy default value in v1.34
* Added validations for SupplementalGroupsPolicy and ContainerUser
* No need featuregate check in validation when adding new field with no default value
* fix typo: identitiy -> identity
The behavior when you specify no --nodeport-addresses value in a
dual-stack cluster is terrible and we can't fix it, for
backward-compatibility reasons. Actually, the behavior when you
specify no --nodeport-addresses value in a single-stack cluster isn't
exactly awesome either...
Allow specifying `--nodeport-addresses primary` to get the
previously-nftables-backend-specific behavior of listening on only the
node's primary IP or IPs.
The runtime classes are apiserver's concept, while the handlers are kubelet's concept.
For NodeStatus, it makes more sense to return the latter ones here.
This commit modifies the following files:
- pkg/apis/core/types.go
- staging/src/k8s.io/api/core/v1/types.go
- pkg/kubelet/nodestatus/setters.go
- pkg/kubelet/kubelet_node_status.go
- pkg/registry/core/node/strategy.go
- test/e2e_node/mount_rro_linux_test.go
Other changes were auto-generated by running `make update`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This commit modifies the following files:
- pkg/apis/core/types.go
- staging/src/k8s.io/api/core/v1/types.go
Other changes were auto-generated by running `make update`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.
The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
This adds support for semantic version comparison to the CEL support in the
"named resources" structured parameter model. For example, it can be used to
check that an instance supports a certain API level.
To minimize the risk, the new "semver" type is only defined in the CEL
environment for DRA expressions, not in the base library. See
https://github.com/kubernetes/kubernetes/pull/123664 for a PR which
adds it to the base library.
Validation of semver strings is done with the regular expression from
semver.org. The actual evaluation at runtime then uses semver/v4.
Like the current device plugin interface, a DRA driver using this model
announces a list of resource instances. In contrast to device plugins, this
list is made available to the scheduler together with attributes that can be
used to select suitable instances when they are not all alike.
Because this is the first structured parameter model, some checks that
previously were not possible, in particular "is one structured parameter field
set", now gets enabled. Adding another structured parameter model will be
similar.
The applyconfigs code generator assumes that all types in an API are defined in
a single package. If it wasn't for that, it would be possible to place the
"named resources" types in separate packages, which makes their names in the Go
code more natural and provides an indication of their stability level because
the package name could include a version.
NodeResourceSlice will be used by kubelet to publish resource information on
behalf of DRA drivers on the node. NodeName and DriverName in
NodeResourceSlice must be immutable. This simplifies tracking the different
objects because what they are for cannot change after creation.
The new field in ResourceClass tells scheduler and autoscaler that they are
expected to handle allocation.
ResourceClaimParameters and ResourceClassParameters are new types for telling
in-tree components how to handle claims.
* support for the managed-by label in Job
* Use managedBy field instead of managed-by label
* Additional review remarks
* Review remarks 2
* review remarks 3
* Skip cleanup of finalizers for job with custom managedBy
* Drop the performance optimization
* imrpove logs
When moving the reservation of a claim for a pod into the PreBind phase in a
future commit, multiple different update attempts will be executed
concurrently. We want an attempt to succeed if and only if adding the entry
passes validation. Without patch strategy and key, strategic-merge-patch
replaces the entire ReservedFor instead of adding new entries.
Server-side-apply cannot be used because each attempt may start with a stale
ResourceClaim (thus cannot send the entire ReservedFor) and SSA doesn't support
merging when using the same manager string. Using different managers (one for
each entry) would work, but sounds like a bad hack.
This assumes that any such field is atomic, except:
* OwnerReferences: because it has a `+patchStrategy=merge`, but it
probably needs a `+listMapKey=...` ?
* Finalizers: because it hs a `+patchStrategy=merge`, but is a
primitive type (string).
* []byte fields, which should not be failing this anyway (fixed
subsequently).
An alternative approach could be just to turn off the API warnings for
these fields, but it felt more correct to declare the semantics.
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via a ClusterCIDR objects, however, there
were reasonable doubts on the SIG about the feature and after
several months of dicussions we decided to not move forward
with the KEP intree, hence, we are going to remove the existing
code, that is still in alpha.
https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ
Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
The "set" list type was chosen because it seemed appropriate (no duplicates!)
but that made tracking of managed fields more expensive (each entry in the list
is tracked, not the entire field) and for no good reason (one client is
responsible for the entire list).
Therefore the type gets changed to "atomic". Server-side-apply has not been
used in the past and PodSchedulingContext objects are short-lived and still in
alpha, so the any potential compatibility issues should be minor.
The scheduling throughput in scheduler_perf increases:
name old SchedulingThroughput/Average new SchedulingThroughput/Average
PerfScheduling/SchedulingWithResourceClaimTemplate/2000pods_100nodes-36 18.8 ± 8% 24.0 ±37%
PerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes-36 13.7 ±81% 18.5 ±40%