4891 commits

Author SHA1 Message Date
Rob Scott
c77cbb7add
Updating EndpointSlice controller to wait for cache to be updated
This updates the EndpointSlice controller to make use of the
EndpointSlice tracker to identify when expected changes are not present
in the cache yet. If this is detected, the controller will wait to sync
until all expected updates have been received. This should help avoid
race conditions that would result in duplicate EndpointSlices or failed
attempts to update stale EndpointSlices. To simplify this logic, this
also moves the EndpointSlice tracker from relying on resource versions
to generations.
2021-03-11 10:01:03 -08:00
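The wait-for-cache idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the controller's code: the `endpointSlice` and `tracker` types and the `Expect` and `CacheIsStale` methods are hypothetical names, and the only thing modelled is the comparison of an expected generation against the generation seen in the cache.

```go
package main

import "fmt"

// endpointSlice is a stand-in for the real API object; only the fields
// needed for this sketch are modelled.
type endpointSlice struct {
	Name       string
	Generation int64
}

// tracker remembers the generation the controller last wrote for each slice.
// The type and its method names are hypothetical.
type tracker struct {
	expected map[string]int64
}

func newTracker() *tracker {
	return &tracker{expected: make(map[string]int64)}
}

// Expect records a write the controller has just made.
func (t *tracker) Expect(s endpointSlice) {
	t.expected[s.Name] = s.Generation
}

// CacheIsStale reports whether any tracked slice is missing from the cache
// or present with an older generation than the controller expects.
func (t *tracker) CacheIsStale(cached []endpointSlice) bool {
	seen := make(map[string]int64, len(cached))
	for _, s := range cached {
		seen[s.Name] = s.Generation
	}
	for name, want := range t.expected {
		if got, ok := seen[name]; !ok || got < want {
			return true
		}
	}
	return false
}

func main() {
	t := newTracker()
	t.Expect(endpointSlice{Name: "svc-abc", Generation: 3})

	behind := []endpointSlice{{Name: "svc-abc", Generation: 2}}
	caughtUp := []endpointSlice{{Name: "svc-abc", Generation: 3}}

	// A sync loop would skip (and retry later) while the cache is behind.
	fmt.Println(t.CacheIsStale(behind), t.CacheIsStale(caughtUp))
}
```

Generations are convenient here because they can be compared as ordered integers, whereas resource versions are meant to be treated as opaque strings.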
Kubernetes Prow Robot
88457be8c5 Merge pull request #96876 from howieyuen/no-execute-taint-missing
fix nodelifecyle controller not add NoExecute taint bug
2021-02-10 11:25:56 +08:00
Morten Torkildsen
ded92f8618 Fix nil pointer dereference in disruption controller 2021-02-04 22:50:08 +02:00
Vladimir Nachev
3911b8f24b
Fix build after cherry-picking 2021-01-30 00:35:23 +02:00
Jayasekhar Konduru
4cd106a920
Recover CSI volumes from dangling attachments
Change-Id: I72105d67d8a4069ab19bfa4638a7ac365cf4194c
2021-01-26 22:55:08 +02:00
Cheng Xing
f8ff0db868
IsVolumeAttachedToNode() renamed to GetAttachState(), and returns 3 states instead of combining "uncertain" and "detached" into "false" 2021-01-26 22:54:58 +02:00
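As a rough illustration of why three states carry more information than a boolean, the sketch below models a lookup that distinguishes "attached", "uncertain", and "detached". The constant names, the `actualState` type, and its `confirmed` map are assumptions made for this example; only the `GetAttachState` name comes from the commit message.

```go
package main

import "fmt"

// AttachState replaces a boolean so that callers can tell "uncertain"
// apart from "detached". The constant names are assumptions.
type AttachState int

const (
	AttachStateAttached AttachState = iota
	AttachStateUncertain
	AttachStateDetached
)

// key identifies one volume on one node.
type key struct {
	volume, node string
}

// actualState is a toy stand-in for the controller's view of the world.
// A present entry means an attach was reported; the value says whether it
// has been confirmed.
type actualState struct {
	confirmed map[key]bool
}

// GetAttachState returns one of three states instead of collapsing
// "uncertain" and "detached" into false.
func (s *actualState) GetAttachState(volume, node string) AttachState {
	ok, found := s.confirmed[key{volume, node}]
	switch {
	case !found:
		return AttachStateDetached
	case ok:
		return AttachStateAttached
	default:
		return AttachStateUncertain
	}
}

func main() {
	s := &actualState{confirmed: map[key]bool{
		{"vol-1", "node-a"}: true,
		{"vol-2", "node-a"}: false,
	}}
	fmt.Println(
		s.GetAttachState("vol-1", "node-a") == AttachStateAttached,
		s.GetAttachState("vol-2", "node-a") == AttachStateUncertain,
		s.GetAttachState("vol-3", "node-a") == AttachStateDetached,
	)
}
```

With a boolean, a caller could not tell whether a `false` meant "safe to attach elsewhere" or "an attach may still be in flight".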
Cheng Xing
f548351d77
Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states; fixes reconciler_test mock detach to account for multiple attaches on a node 2021-01-26 22:54:57 +02:00
Kubernetes Prow Robot
ded8a1e285
Merge pull request #96863 from tosi3k/automated-cherry-pick-of-#90218-#90242-upstream-release-1.18
Automated cherry pick of #90218: Lazy initialization of network urls for GCE provider #90242: Avoid unnecessary GCE API calls for IP-alias calls
2020-11-27 07:30:47 -08:00
Kubernetes Prow Robot
5b5fc6e914
Merge pull request #96135 from cofyc/release-1.18-fix95538
[release-1.18] volume binding: report UnschedulableAndUnresolvable status instead of an error when bound PVs not found
2020-11-27 04:58:47 -08:00
wojtekt
5b79dff344 Avoid unnecessary GCE API calls for IP-alias calls
This is to avoid unnecessary GCE API calls done by getInstanceByName
helper, which is iterating over all zones to find in which zone the
VM exists.
ProviderID already contains all the information - it's in the form:
gce://<VM URL> (VM URL contains project, zone, VM name).

ProviderID is propagated by Kubelet on node registration and in case
of bugs backfilled by node-controller.
2020-11-25 15:36:51 +01:00
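To make the ProviderID point concrete, here is a small sketch of reading the zone straight out of the ID instead of searching every zone. It assumes the part after `gce://` has exactly three segments, `<project>/<zone>/<name>`; the commit message only says the URL contains those three pieces, so the exact layout, the `instanceRef` type, and the `parseProviderID` helper are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceRef holds the three pieces the commit message says a ProviderID
// carries. The type is hypothetical.
type instanceRef struct {
	Project string
	Zone    string
	Name    string
}

// parseProviderID is a hypothetical helper. It assumes the layout
// gce://<project>/<zone>/<name> and rejects anything else.
func parseProviderID(id string) (instanceRef, error) {
	const prefix = "gce://"
	if !strings.HasPrefix(id, prefix) {
		return instanceRef{}, fmt.Errorf("provider ID %q does not start with %q", id, prefix)
	}
	parts := strings.Split(strings.TrimPrefix(id, prefix), "/")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return instanceRef{}, fmt.Errorf("provider ID %q is not <project>/<zone>/<name>", id)
	}
	return instanceRef{Project: parts[0], Zone: parts[1], Name: parts[2]}, nil
}

func main() {
	ref, err := parseProviderID("gce://my-project/us-central1-a/node-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With the zone known, a single per-zone lookup can replace a scan
	// over all zones.
	fmt.Println(ref.Project, ref.Zone, ref.Name)
}
```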
Kubernetes Prow Robot
1d5e9d5530
Merge pull request #95828 from arjunrn/automated-cherry-pick-of-#95647-upstream-release-1.18
Automated cherry pick of #95647: If we set SelectPolicy MinPolicySelect on scaleUp behavior or
2020-11-05 05:17:16 -08:00
Kubernetes Prow Robot
06f65422f5
Merge pull request #95650 from josephburnett/automated-cherry-pick-of-#95560-upstream-release-1.18
Automated cherry pick of #95560: Ignore deleted pods.
2020-11-05 05:16:54 -08:00
Yecheng Fu
9b45be1ca4 volume binding: report UnschedulableAndUnresolvable status instead of an error when bound PVs not found
This is a patch on release-1.18.
2020-11-04 09:40:53 +08:00
weiwei
383807d083 If we set SelectPolicy MinPolicySelect on scaleUp behavior or scaleDown behavior, Horizontal Pod Autoscaler doesn't automatically scale the number of pods correctly
Signed-off-by: weiwei <weiwei@tenxcloud.com>
2020-10-23 14:43:40 +02:00
ialidzhikov
c23350e181 Do not assume storageclass is still in-tree after csi migration
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2020-10-20 13:20:52 +03:00
Joseph Burnett
93f709028f Ignore deleted pods.
When a pod is deleted, it is given a deletion timestamp. However the
pod might still run for some time during graceful shutdown. During
this time it might still produce CPU utilization metrics and be in a
Running phase.

Currently the HPA replica calculator attempts to ignore deleted pods
by skipping over them. However by not adding them to the ignoredPods
set, their metrics are not removed from the average utilization
calculation. This allows pods in the process of shutting down to drag
down the recommended number of replicas by producing near 0%
utilization metrics.

In fact the ignoredPods set is a misnomer. Those pods are not fully
ignored. When the replica calculator recommends to scale up, 0%
utilization metrics are filled in for those pods to limit the scale
up. This prevents overscaling when pods take some time to start up. In
fact, there should be 4 sets considered (readyPods, unreadyPods,
missingPods, ignoredPods) not just 3.

This change renames ignoredPods as unreadyPods and leaves the scaleup
limiting semantics. Another set, the (actually) ignoredPods, is added, to
which deleted pods are added instead of being skipped during
grouping. Both ignoredPods and unreadyPods have their metrics removed
from consideration. But only unreadyPods have 0% utilization metrics
filled in upon scaleup.
2020-10-16 14:43:51 +02:00
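The four-set grouping described in the commit above can be sketched as below. This is a simplified model under stated assumptions, not the HPA replica calculator: the `pod` struct, `podSets`, `groupPods`, and `averageUtilization` are hypothetical, readiness is reduced to a boolean, and the 0% fill-in on scale-up is left out.

```go
package main

import "fmt"

// pod is a minimal stand-in; Deleting models "has a deletion timestamp".
type pod struct {
	Name     string
	Deleting bool
	Ready    bool
}

// podSets mirrors the four sets named in the commit message.
type podSets struct {
	ready, unready, missing, ignored map[string]bool
}

// groupPods places every pod in exactly one set. Deleted pods go to
// ignored instead of being skipped, so their metrics can be removed later.
func groupPods(pods []pod, metrics map[string]float64) podSets {
	sets := podSets{
		ready:   map[string]bool{},
		unready: map[string]bool{},
		missing: map[string]bool{},
		ignored: map[string]bool{},
	}
	for _, p := range pods {
		_, hasMetric := metrics[p.Name]
		switch {
		case p.Deleting:
			sets.ignored[p.Name] = true
		case !hasMetric:
			sets.missing[p.Name] = true
		case !p.Ready:
			sets.unready[p.Name] = true
		default:
			sets.ready[p.Name] = true
		}
	}
	return sets
}

// averageUtilization averages the metrics of pods not listed in any skip set.
func averageUtilization(metrics map[string]float64, skip ...map[string]bool) (float64, bool) {
	sum, n := 0.0, 0
	for name, v := range metrics {
		excluded := false
		for _, s := range skip {
			if s[name] {
				excluded = true
				break
			}
		}
		if excluded {
			continue
		}
		sum += v
		n++
	}
	if n == 0 {
		return 0, false
	}
	return sum / float64(n), true
}

func main() {
	pods := []pod{
		{Name: "web-1", Ready: true},
		{Name: "web-2", Ready: true, Deleting: true}, // shutting down, near 0% CPU
		{Name: "web-3"},                              // still starting
	}
	metrics := map[string]float64{"web-1": 80, "web-2": 2, "web-3": 0}

	sets := groupPods(pods, metrics)
	naive, _ := averageUtilization(metrics)
	filtered, _ := averageUtilization(metrics, sets.ignored, sets.unready)
	fmt.Println(naive, filtered)
}
```

In this toy input the terminating pod's near-zero reading pulls the naive average well below the 80% reported by the one ready pod, which is the downward drag the commit message describes.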
Mike Dame
975cac0726 Remove HeadlessService label in endpoints controller before comparing 2020-09-21 10:12:25 -04:00
Kubernetes Prow Robot
a171160054
Merge pull request #91987 from 249043822/automated-cherry-pick-of-#91008-upstream-release-1.18
Automated cherry pick of #91008: Do not swallow NotFound error for DeletePod in dsc.manage
2020-09-11 22:22:59 -07:00
PingWang
32734cd425 Cherry pick of #93908: Updating EndpointSlice controllers to return if error encountered
Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2020-09-05 09:40:20 +08:00
Kubernetes Prow Robot
713eff35bb
Merge pull request #94253 from ialidzhikov/automated-cherry-pick-of-#91311-upstream-release-1.18
Automated cherry pick of #91311: Ensuring EndpointSlice controller does not create
2020-09-03 19:55:41 -07:00
Kubernetes Prow Robot
5c9030d5f5
Merge pull request #94119 from robscott/automated-cherry-pick-of-#91399-upstream-release-1.18
Automated cherry pick of #91399: Fix Endpoint/EndpointSlice pod change detection
2020-09-03 19:02:07 -07:00
Kubernetes Prow Robot
10c682b84a
Merge pull request #93814 from liggitt/automated-cherry-pick-of-#93722-upstream-release-1.18
Automated cherry pick of #93722: Do not evict pods which tolerate all NoExecute taints
2020-09-03 19:01:57 -07:00
Kubernetes Prow Robot
e9ac3d7d25
Merge pull request #93798 from liggitt/automated-cherry-pick-of-#93790-upstream-release-1.18
Automated cherry pick of #93790: Fix namespace controller cleanup orphaning
2020-09-02 06:29:07 -07:00
Kubernetes Prow Robot
f0f4e5722d
Merge pull request #94116 from robscott/automated-cherry-pick-of-#94086-upstream-release-1.18
Automated cherry pick of #94086: Updating EndpointSlice controller to wait for all caches to
2020-09-02 05:35:30 -07:00
Rob Scott
4e86d46243 Ensuring EndpointSlice controller does not create EndpointSlices for Services that are being deleted.
This should ensure that the controller does not conflict with garbage collection.
2020-08-26 17:00:00 +03:00
Kubernetes Prow Robot
52a2c076d2
Merge pull request #93892 from ahg-g/ahg-volume2
"unbound immediate PersistentVolumeClaims" should be UnschedulableAnd…
2020-08-24 14:22:16 -07:00
Dan Winship
ebe195a52e
Improve EndpointController's handling of headless services under dual-stack
EndpointController was accidentally requiring all headless services to
be IPv4-only in clusters with IPv6DualStack enabled.

This still leaves "legacy" (ie, IPFamily-less) headless services as
always IPv4-only because the controller doesn't currently have easy
access to the information that would allow it to fix that.
(EndpointSliceController had the same problem already, and still
does.) This can be fixed, if needed, by manually setting IPFamily,
and the proposed API for 1.20 will handle this situation better.
2020-08-19 17:44:53 -07:00
Dan Winship
8cab056c81
Improve EndpointController dual-stack testing
Rewrite some of the test helpers to better support single-stack IPv4
vs single-stack IPv6 vs dual-stack IPv4 primary vs dual-stack IPv6
primary, and update TestPodToEndpointAddressForService to test some
more cases.
2020-08-19 17:44:52 -07:00
Dan Winship
86c98ae6b5
Fix Endpoint/EndpointSlice pod change detection
The endpoint controllers responded to Pod changes by trying to figure
out if the generated endpoint resource would change, rather than just
checking if the Pod had changed, but since the set of Pod fields that
need to be checked depend on the Service and Node as well, the code
ended up only checking for a subset of the changes it should have.

In particular, EndpointSliceController ended up only looking at IPv4
Pod IPs when processing Pod update events, so when a Pod went from
having no IP to having only an IPv6 IP, EndpointSliceController would
think it hadn't changed.
2020-08-19 17:44:52 -07:00
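The IPv6 case in the commit above can be reproduced with a toy comparison. The sketch below is only an illustration of the failure mode, assuming a hypothetical `pod` type with a list of IP strings; it does not reflect how the controllers actually compare Pods.

```go
package main

import (
	"fmt"
	"net"
)

// pod is a minimal stand-in that carries only its IP addresses.
type pod struct {
	IPs []string
}

// ipv4Only keeps only the addresses that parse as IPv4.
func ipv4Only(ips []string) []string {
	var out []string
	for _, s := range ips {
		if ip := net.ParseIP(s); ip != nil && ip.To4() != nil {
			out = append(out, s)
		}
	}
	return out
}

// equal reports whether two string slices have the same elements in order.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// changedIPv4Only models the narrow check: it looks at IPv4 addresses only.
func changedIPv4Only(oldPod, newPod pod) bool {
	return !equal(ipv4Only(oldPod.IPs), ipv4Only(newPod.IPs))
}

// changedAnyIP compares the full address list.
func changedAnyIP(oldPod, newPod pod) bool {
	return !equal(oldPod.IPs, newPod.IPs)
}

func main() {
	before := pod{}                         // no IP assigned yet
	after := pod{IPs: []string{"fd00::10"}} // IPv6-only address assigned

	// prints "false true": the narrow check misses the update.
	fmt.Println(changedIPv4Only(before, after), changedAnyIP(before, after))
}
```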
Rob Scott
c3ed3d81a4
Updating EndpointSlice controller to wait for all caches to be synced
Previously the EndpointSlice controller was not waiting for
EndpointSlices or Nodes to be synced.
2020-08-19 17:41:10 -07:00
Abdullah Gharaibeh
d9f532029e "unbound immediate PersistentVolumeClaims" should be UnschedulableAndUnresolvable error
This was fixed in 1.19 by refactoring that part into PreFilter in https://github.com/kubernetes/kubernetes/pull/91775
2020-08-11 13:06:57 -04:00
Jordan Liggitt
9140f1c67b Do not evict pods which tolerate all NoExecute taints 2020-08-08 10:27:57 -04:00
Jordan Liggitt
5c425d259a Fix namespace controller cleanup orphaning 2020-08-07 15:29:05 -04:00
Kubernetes Prow Robot
11a69afdaf
Merge pull request #92293 from tnozicka/fix-ds-recreate-1.18
[1.18] Fix DS expectations on recreate
2020-06-27 22:20:14 -07:00
Tomas Nozicka
c8a1771447 Fix DS expectations on recreate 2020-06-19 11:38:07 +02:00
KeZhang
0c5c60772e Do not swallow NotFound error for DeletePod in dsc.manage 2020-06-12 10:16:37 +08:00
Jayasekhar Konduru
47e9d077da CSI: Modify VolumeAttachment check to use Informer/Cache
Change-Id: Ie70c8b6657c67eefbf13042f36d56ca84a2e42bb
2020-06-11 00:23:15 +00:00
ialidzhikov
5ca8d60f31 Fix Node initialization for GCP cloud provider
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2020-04-15 21:51:09 +03:00
shibataka000
d2e35c7a6a Fix bug about unintentional scale out during updating deployment.
During rolling update with maxSurge=1 and maxUnavailable=0,
len(metrics) is greater than currentReplicas
and it may cause unintentional scale out.
2020-03-26 12:30:24 +01:00
Andrew Sy Kim
366dd4af44 EndpointSlice and Endpoints should treat terminating pods the same
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-03-11 13:03:18 -04:00
skilxn-go
6b8fc8dc5e Move TaintBasedEvictions feature gates to GA 2020-03-09 10:49:00 +08:00
Jordan Liggitt
d8abacba40 client-go: update expansions callers 2020-03-06 16:50:41 -05:00
Kubernetes Prow Robot
f52cbea102
Merge pull request #88910 from liggitt/metadata-context
Metadata client: plumb context
2020-03-06 13:18:04 -08:00
Kubernetes Prow Robot
ef672c1c2d
Merge pull request #88678 from verult/slow-rxm-attach
Parallelize attach operations across different nodes for volumes that allow multi-attach
2020-03-06 13:17:21 -08:00
Kubernetes Prow Robot
179fe40d06
Merge pull request #88599 from julianvmodesto/scale-ctx-opts
Add context and options to scale client
2020-03-06 13:17:08 -08:00
Jordan Liggitt
04a72d5ef9 client-go metadata: update callers 2020-03-06 11:07:54 -05:00
Christian Huffman
c6fd25d100 Updated CSIDriver references 2020-03-06 08:21:26 -05:00
Cheng Xing
ef3d66b98b Parallelize attach operations across different nodes for volumes that allow multi-attach 2020-03-05 22:22:05 -08:00
Mike Danese
76f8594378 more artisanal fixes
Most of these could have been refactored automatically but it would
have been uglier. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
2020-03-05 14:59:47 -08:00
Mike Danese
aaf855c1e6 deref all calls to metav1.NewDeleteOptions that are passed to clients.
This is gross but because NewDeleteOptions is used by various parts of
storage that still pass around pointers, the return type can't be
changed without significant refactoring within the apiserver. I think
this would be good to clean up, but I want to minimize apiserver side
changes as much as possible in the client signature refactor.
2020-03-05 14:59:46 -08:00