6128 commits

Author SHA1 Message Date
Daniel Henkel
0e0f1135f2
keep existing PDB conditions when updating status
When the disruption controller updates the PDB status, it removes all conditions from the new status object and then re-adds the sufficient pods condition. Unfortunately, this behavior removes conditions set by other controllers, leading to multiple consecutive updates.
Therefore, this commit ensures that conditions are preserved during updates.
2024-03-07 09:20:24 +01:00
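The behaviour described in this commit can be illustrated with a minimal sketch. The `Condition` type and `setCondition` helper below are simplified, hypothetical stand-ins (not the controller's actual code), and the condition names are illustrative. The point is that updating one condition in place leaves conditions set by other controllers untouched.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// setCondition replaces the condition with the same Type, or appends it if
// no such condition exists. Every other condition is preserved as-is.
func setCondition(conds []Condition, c Condition) []Condition {
	out := make([]Condition, 0, len(conds)+1)
	replaced := false
	for _, existing := range conds {
		if existing.Type == c.Type {
			out = append(out, c)
			replaced = true
			continue
		}
		out = append(out, existing)
	}
	if !replaced {
		out = append(out, c)
	}
	return out
}

func main() {
	current := []Condition{
		// Hypothetical condition owned by some other controller.
		{Type: "ExampleOtherController", Status: "True", Reason: "SetElsewhere"},
		{Type: "DisruptionAllowed", Status: "False", Reason: "InsufficientPods"},
	}
	updated := setCondition(current, Condition{
		Type: "DisruptionAllowed", Status: "True", Reason: "SufficientPods",
	})
	for _, c := range updated {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

Starting from an empty condition list instead of `current` would drop the first entry, which is the problem the commit message describes.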
carlory
9fdcdd7a7d If a pvc has an empty storageclass name, don't try to assign a default StorageClass to it. 2024-01-19 18:25:39 +08:00
Aleksandra Malinowska
f7e235c031 Make StatefulSet restart pods with phase Succeeded 2023-11-16 11:26:42 +01:00
Maciej Szulik
43289de650
Add more test cases ensuring nextScheduleTimeDuration is never < 0 2023-10-26 12:38:26 +02:00
Maciej Szulik
f9c859853d
Modify mostRecentScheduleTime to return more detailed information about missed schedules
Initially this method returned the number of missed schedules, but
that turned out to be unreliable for some complex schedules, for
example those that run only on weekdays. The second approach was to
return only a boolean indicating that too many schedules were missed.
It turns out that we need to return all three values (none missed,
few missed, and many missed) to let consumers know what to do,
without leaking the wrong number out of mostRecentScheduleTime.
2023-10-26 12:38:26 +02:00
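A rough sketch of the three-valued result described above. The type name, constant names, `classify` helper, and threshold parameter are assumptions made for illustration; the commit message only states that callers see one of "none missed", "few missed", or "many missed" rather than a raw count.

```go
package main

import "fmt"

// missedSchedules is a hypothetical three-valued result type.
type missedSchedules int

const (
	noneMissed missedSchedules = iota
	fewMissed
	manyMissed
)

func (m missedSchedules) String() string {
	switch m {
	case noneMissed:
		return "none missed"
	case fewMissed:
		return "few missed"
	default:
		return "many missed"
	}
}

// classify maps an internal (possibly imprecise) count to a coarse category,
// so the count itself never leaves the function. manyThreshold is an
// illustrative parameter, not a value taken from the commit.
func classify(internalCount, manyThreshold int64) missedSchedules {
	switch {
	case internalCount > manyThreshold:
		return manyMissed
	case internalCount > 0:
		return fewMissed
	default:
		return noneMissed
	}
}

func main() {
	for _, n := range []int64{0, 3, 500} {
		fmt.Printf("%d -> %s\n", n, classify(n, 100))
	}
}
```

Consumers can then branch on the category (for example, warn only on `manyMissed`) without depending on a number that may be wrong for irregular schedules.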
Maciej Szulik
b59bb790a4
Fix spelling 2023-10-26 12:38:25 +02:00
Kubernetes Prow Robot
da5caf82c1
Merge pull request #121396 from Nordix/automated-cherry-pick-of-#120731-upstream-release-1.27
Automated cherry pick of #120731: Fixing CurrentReplicas and CurrentRevision in
2023-10-23 15:12:26 +02:00
adil ghaffar
3bff908409
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate 2023-10-20 17:45:52 +03:00
Michal Wozniak
8ba9887456 Use Patch instead of SSA for Pod Disruption condition 2023-10-20 11:12:49 +02:00
Aleksandra Malinowska
e5e70a19c4 Fix concurrent write when filling PVC labels 2023-10-12 17:41:00 +02:00
Aleksandra Malinowska
53a6d4c6d4 Modify test PVC to detect concurrent map write bug 2023-10-12 17:41:00 +02:00
Kubernetes Prow Robot
3be02e15bb
Merge pull request #121080 from jsafrane/automated-cherry-pick-of-#120595-upstream-release-1.27
Automated cherry pick of #120595: Mark a volume as uncertain-attached after detach error
2023-10-12 12:25:25 +02:00
Jan Safranek
3c3956f81a Mark a volume as uncertain-attached after detach error
A volume that failed Detach() should not be marked as attached; the CSI
external-attacher is probably still trying to detach it.

Mark it uncertain instead and wait for Detach() to succeed.
2023-10-09 18:49:53 +02:00
Lukasz Stankiewicz
ca37df6c46 Add nil checks for hpa object target type values 2023-10-06 13:26:29 -07:00
Kubernetes Prow Robot
6fecd57fb9
Merge pull request #120810 from andrewsykim/automated-cherry-pick-of-#120649-origin-release-1.27
Automated cherry pick of #120649: cronjob controller: ensure already existing jobs are added to
2023-09-28 03:54:34 -07:00
Kubernetes Prow Robot
bdfb880a19
Merge pull request #120786 from mochizuki875/automated-cherry-pick-of-#119317-upstream-release-1.27
Automated cherry pick of #119317: change rolling update logic to exclude sunsetting nodes
2023-09-22 01:20:59 -07:00
Andrew Sy Kim
af5640af4e cronjob controller: ensure already existing jobs are added to Active list of cronjobs
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-09-21 14:58:31 +00:00
mochizuki875
602be90a0c change rolling update logic to exclude sunsetting nodes 2023-09-21 03:07:33 +00:00
Aldo Culquicondor
e5ea6cdfce Increase range of job_sync_duration_seconds
Change-Id: I7ed4b006faecf0a7e6e583c42b4d6bc4b786a164
2023-09-14 14:17:50 +02:00
Kubernetes Prow Robot
5a39f37d47
Merge pull request #120323 from Miciah/automated-cherry-pick-of-#118189-origin-release-1.27
Automated cherry pick of #118189: TopologyAwareHints: Take lock in HasPopulatedHints
2023-09-07 20:22:12 -07:00
Albert Sverdlov
c0d2ca7bb6
Automated cherry pick of #119776: Fix a job quota related deadlock (#120320)
* Fix a job quota related deadlock

In case ResourceQuota is used and sets a max # of jobs, a CronJob may get
trapped in a deadlock:
  1. Job quota for a namespace is reached.
  2. CronJob controller can't create a new job, because quota is
     reached.
  3. Cleanup of jobs owned by a cronjob doesn't happen, because the
     control loop iteration ends early when creating a job fails.

To fix this, we stop quitting a control loop iteration early when
cronjob reconciliation fails, and always let old jobs be cleaned up.

* Don't reorder imports

* Don't stop requeuing on reconciliation error

Previous code only logged the reconciliation error inside jm.sync() and
didn't return the reconciliation error to its invoker
processNextWorkItem().

Adding a copy-paste back to avoid this issue.

* Remove copy-pasted cleanupFinishedJobs()

Now we always call jm.cleanupFinishedJobs() first and then
jm.syncCronJob().

We also extract cronJobCopy and updateStatus outside jm.syncCronJob
function and pass pointers to them in both jm.syncCronJob and
jm.cleanupFinishedJobs to make delayed updates handling more explicit
and not dependent on the order in which cleanupFinishedJobs and
syncCronJob are invoked.

* Return updateStatus bool instead of changing the reference

* Explicitly ignore err in tests to fix linter

* Fix formatting with update-gofmt.sh
2023-09-03 23:53:49 -07:00
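The deadlock and its fix can be reproduced with a toy model. Everything below (the `controller` type, its fields, and both sync methods) is a hypothetical simplification: finished jobs count against a quota, and a history limit decides how many are kept. It is a sketch of the ordering issue only, not the CronJob controller's code.

```go
package main

import (
	"errors"
	"fmt"
)

// controller is a toy model: jobs holds finished jobs that still count
// against quota; historyLimit is how many finished jobs are retained.
type controller struct {
	quota        int
	historyLimit int
	jobs         []string
}

func (c *controller) cleanupFinishedJobs() {
	if len(c.jobs) > c.historyLimit {
		c.jobs = c.jobs[len(c.jobs)-c.historyLimit:]
	}
}

func (c *controller) createJob(name string) error {
	if len(c.jobs) >= c.quota {
		return errors.New("job quota exceeded")
	}
	c.jobs = append(c.jobs, name)
	return nil
}

// syncCleanupLast models the old ordering: a create error returns early,
// so cleanup never runs and the quota is never freed.
func (c *controller) syncCleanupLast(name string) error {
	if err := c.createJob(name); err != nil {
		return err
	}
	c.cleanupFinishedJobs()
	return nil
}

// syncCleanupFirst models the fixed ordering: cleanup always runs, and the
// create error is still returned so the caller can requeue.
func (c *controller) syncCleanupFirst(name string) error {
	c.cleanupFinishedJobs()
	return c.createJob(name)
}

func main() {
	stuck := &controller{quota: 3, historyLimit: 1, jobs: []string{"a", "b", "c"}}
	fmt.Println("cleanup last: ", stuck.syncCleanupLast("d"), stuck.jobs)

	fixed := &controller{quota: 3, historyLimit: 1, jobs: []string{"a", "b", "c"}}
	fmt.Println("cleanup first:", fixed.syncCleanupFirst("d"), fixed.jobs)
}
```

With cleanup last, every iteration fails the same way; with cleanup first, old jobs are removed before the create attempt, so the quota no longer blocks progress.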
Miciah Masters
fc18ffe58d TopologyAwareHints: Take lock in HasPopulatedHints
Prevent potential concurrent map access by taking a lock before reading the
topology cache's hintsPopulatedByService map.

* staging/src/k8s.io/endpointslice/topologycache/topologycache.go
(setHintsLocked, hasPopulatedHintsLocked): New helper functions.  These are
the same as the existing SetHints and HasPopulatedHints methods except that
these helpers assume that a lock is already held.
(SetHints): Use setHintsLocked.
(HasPopulatedHints): Take a lock and use hasPopulatedHintsLocked.
(AddHints): Take a lock and use setHintsLocked and hasPopulatedHintsLocked.
* staging/src/k8s.io/endpointslice/topologycache/topologycache_test.go
(TestTopologyCacheRace): Add a goroutine that calls HasPopulatedHints.
2023-08-31 16:38:51 -04:00
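The "locked helper" pattern this commit describes looks roughly like the sketch below. The method names follow the commit message, but the struct is a pared-down assumption (a plain `map[string]bool` keyed by service name) rather than the real topology cache.

```go
package main

import (
	"fmt"
	"sync"
)

// topologyCache is a simplified stand-in for the real cache type.
type topologyCache struct {
	lock                    sync.Mutex
	hintsPopulatedByService map[string]bool
}

// setHintsLocked assumes the caller already holds t.lock.
func (t *topologyCache) setHintsLocked(serviceKey string, populated bool) {
	t.hintsPopulatedByService[serviceKey] = populated
}

// hasPopulatedHintsLocked assumes the caller already holds t.lock.
func (t *topologyCache) hasPopulatedHintsLocked(serviceKey string) bool {
	return t.hintsPopulatedByService[serviceKey]
}

// SetHints takes the lock and delegates to the locked helper.
func (t *topologyCache) SetHints(serviceKey string, populated bool) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.setHintsLocked(serviceKey, populated)
}

// HasPopulatedHints takes the lock before reading the map, which avoids
// a concurrent map read and write.
func (t *topologyCache) HasPopulatedHints(serviceKey string) bool {
	t.lock.Lock()
	defer t.lock.Unlock()
	return t.hasPopulatedHintsLocked(serviceKey)
}

func main() {
	cache := &topologyCache{hintsPopulatedByService: map[string]bool{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("svc-%d", i)
			cache.SetHints(key, true)
			_ = cache.HasPopulatedHints(key)
		}(i)
	}
	wg.Wait()
	fmt.Println(cache.HasPopulatedHints("svc-0"))
}
```

Splitting each operation into a public method that locks and a `...Locked` helper that does not lets a method that needs both operations take the lock once and call the helpers, without attempting to re-acquire a non-reentrant mutex.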
Kubernetes Prow Robot
de56018f04
Merge pull request #117269 from tnqn/automated-cherry-pick-of-#117245-#117249-upstream-release-1.27
Automated cherry pick of #117245: Fix TopologyAwareHint not working when zone label is added
#117249: Fix a data race in TopologyCache
2023-08-04 13:26:31 -07:00
Michal Wozniak
ed0cdc9e0b Include ignored pods when computing backoff delay for Job pod failures
# Conflicts:
#	pkg/controller/job/job_controller.go
2023-07-21 09:31:49 +02:00
Michal Wozniak
ae24a5cf74 Remarks 2023-07-21 09:29:47 +02:00
Michal Wozniak
9e1050b4d9 Adjust the algorithm for computing the pod finish time
Change-Id: Ic282a57169cab8dc498574f08b081914218a1039
2023-07-20 16:29:26 +02:00
Kubernetes Prow Robot
5ee5d7346e
Merge pull request #119096 from aleksandra-malinowska/automated-cherry-pick-of-#117865-upstream-release-1.27
Automated cherry pick of #117865: Parallel StatefulSet pod create & delete
2023-07-12 16:31:33 -07:00
Aleksandra Malinowska
28c79be674 Add unit tests for parallel StatefulSet create & delete 2023-07-10 12:31:07 +02:00
Aleksandra Malinowska
66f980be12 Parallel StatefulSet pod create & delete 2023-07-10 12:31:07 +02:00
Aleksandra Malinowska
288504fbf8 Refactor StatefulSet controller update logic 2023-07-10 12:31:07 +02:00
Aldo Culquicondor
92a0f58e2b
Only declare job as finished after removing all finalizers
Change-Id: Id4b01b0e6fabe24134e57e687356e0fc613cead4
2023-07-07 14:31:02 -04:00
Aldo Culquicondor
c655001fa4
Automated cherry pick of #118716 upstream release 1.27 (#118911)
* Skip terminal Pods with a deletion timestamp from the Daemonset sync

Change-Id: I64a347a87c02ee2bd48be10e6fff380c8c81f742

* Review comments and fix integration test

Change-Id: I3eb5ec62bce8b4b150726a1e9b2b517c4e993713

* Include deleted terminal pods in history

Change-Id: I8b921157e6be1c809dd59f8035ec259ea4d96301

* Exclude terminal pods from Daemonset e2e tests

Change-Id: Ic29ca1739ebdc54822d1751fcd56a99c628021c4
2023-07-06 18:57:02 -07:00
Maciej Szulik
b383755e46 Hide numberOfMissedSchedules as an algorithm internal number 2023-07-06 10:21:55 -07:00
Maciej Szulik
26db84e04c Update schedule logic to properly calculate missed schedules
Before this change we assumed a constant time between schedule runs,
which is not true for cases like "30 6-16/4 * * 1-5".
The fix is to calculate the potential next run using the fixed schedule
as the baseline, then go back one schedule and allow the cron
library to calculate the correct time.

This approach saves us from iterating multiple times between the last
schedule time and now if the cronjob, for any reason, wasn't running for
a significant amount of time.
2023-07-06 10:21:43 -07:00
Paco Xu
9ef90afb4f verifyVolumeNoStatusUpdateNeeded may cause flake and so only keep the last ones 2023-04-18 11:30:37 +02:00
Paco Xu
b598ea5c39 deflake: Add retry with timeout to wait for final conditions 2023-04-18 11:30:37 +02:00
Quan Tian
6f8ce72c0c Fix a data race in TopologyCache
The member variable `cpuRatiosByZone` should be accessed with the lock
acquired as it could be updated by `SetNodes` concurrently.

Signed-off-by: Quan Tian <qtian@vmware.com>
Co-authored-by: Antonio Ojea <aojea@google.com>
2023-04-13 11:13:02 +08:00
Quan Tian
668778d1bd Fix TopologyAwareHint not working when zone label is added after Node creation
The topology.kubernetes.io/zone label may be added by the cloud provider
asynchronously after the Node is created. The previous code didn't
update the topology cache after receiving the Node update event, causing
TopologyAwareHint to not work until kube-controller-manager restarts or
other Node events trigger the update.

Signed-off-by: Quan Tian <qtian@vmware.com>
2023-04-13 11:13:00 +08:00
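One way to picture the missing check is a helper that compares the zone label before and after a Node update. The `zoneChanged` function is a hypothetical illustration operating on plain label maps; the actual controller may compare more fields than this when deciding whether to refresh the topology cache.

```go
package main

import "fmt"

const zoneLabel = "topology.kubernetes.io/zone"

// zoneChanged reports whether the zone label differs between the old and
// new label sets of a Node, including the case where it was just added.
func zoneChanged(oldLabels, newLabels map[string]string) bool {
	return oldLabels[zoneLabel] != newLabels[zoneLabel]
}

func main() {
	created := map[string]string{}                    // Node as first created
	labeled := map[string]string{zoneLabel: "zone-a"} // label added later

	fmt.Println(zoneChanged(created, labeled)) // update should refresh the cache
	fmt.Println(zoneChanged(labeled, labeled)) // no zone change
}
```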
Harshal Patil
1972dd1005 Do not log entire pod struct while attaching the volume
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2023-04-05 20:24:12 -04:00
Michal Wozniak
b5dd5f1f3a Investigate and fix the handling of Succeeded pods in DaemonSet 2023-04-04 19:21:15 +02:00
mantuliu
0567c93b2a Improve the performance of map usage
Signed-off-by: mantuliu <240951888@qq.com>
2023-03-21 20:37:53 +08:00
Sathyanarayanan Saravanamuthu
c84c8add70
Decouple batch/job back-off logic from workqueues (#114768)
* batch/job: decouple backoff from workqueue

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>

* Resolving review comments

* Resolving more review comments

* Resolving review comments

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>

* Computing finish time to now when FinishedAt is unix epoch

* Addressing review comments

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>

---------

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>
2023-03-16 10:15:21 -07:00
Kensei Nakada
543f15d10c HPA: expose the metrics "metric_computation_duration_seconds" and "metric_computation_total" from HPA controller 2023-03-14 22:47:24 +00:00
Kubernetes Prow Robot
27e23bad7d
Merge pull request #116529 from pohly/controllers-with-name
kube-controller-manager: convert to structured logging
2023-03-14 14:12:55 -07:00
Kubernetes Prow Robot
c0ef73222f
Merge pull request #116522 from robscott/topology-1-27-updates
Introducing Topology Mode Annotation, Deprecating Topology Hints Annotation
2023-03-14 14:12:48 -07:00
Ziqi Zhao
d1aa73312c
pkg/controller/util support contextual logging (#115049)
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-03-14 12:38:14 -07:00
Patrick Ohly
99151c39b7 kube-controller-manager: convert to structured logging
Most of the individual controllers were already converted earlier. Some log
calls were missed or added and then not updated during a rebase. Some of those
get updated here to fill those gaps.

Adding the name to the logger used by each controller is
consolidated in this commit. By using the name under which the
controller is registered we ensure that the names in the log
are consistent.
2023-03-14 19:16:32 +01:00
Kubernetes Prow Robot
6a111bebe2
Merge pull request #116377 from kinvolk/rata/userns
KEP-127: user namespace support for stateless pods
2023-03-14 10:40:43 -07:00
Kubernetes Prow Robot
49649c89ea
Merge pull request #113584 from yangjunmyfm192085/volume-contextual-logging
volume: use contextual logging
2023-03-14 10:40:16 -07:00
Kensei Nakada
b49b34c03a
HPA: expose the metrics "reconciliations_total" and "reconciliation_duration_seconds" from HPA controller (#116010) 2023-03-14 09:39:42 -07:00