kubernetes/pkg/controller
Nikhita Raghunath 4dd99967bd pkg/controller/job: re-honor exponential backoff
This commit makes the job controller re-honor exponential backoff for
failed pods. Before this commit, the controller created pods without any
backoff. This was a regression: the controller previously created pods
with an exponential backoff delay (10s, 20s, 40s, ...).

The issue occurs only when the JobTrackingWithFinalizers feature is
enabled (which is enabled by default right now). With this feature, we
get an extra pod update event when the finalizer of a failed pod is
removed.

Note that the pod failure detection and new pod creation happen in the
same reconcile loop so the 2nd pod is created immediately after the 1st
pod fails. The backoff is only applied on the 2nd pod failure, which means
that the 3rd pod is created 10s after the 2nd pod, the 4th pod is created
20s after the 3rd pod, and so on.

This commit fixes a few bugs:

1. Right now, each time `uncounted != nil` and the job does not see a
_new_ failure, `forget` is set to true and the job is removed from the
queue. As a result, this condition is also triggered each time the
finalizer for a failed pod is removed, and `NumRequeues` is reset, which
results in a backoff of 0s.

2. Updates `updatePod` to only apply backoff when we see a particular
pod failed for the first time. This is necessary to ensure that the
controller does not apply backoff when it sees a pod update event
for finalizer removal of a failed pod.

3. If the `JobsReadyPods` feature is enabled and the backoff is 0s, the
job is now enqueued after `podUpdateBatchPeriod`, instead of 0s. The
unit test for this check also had a few bugs:
    - `DefaultJobBackOff` was overwritten to 0 in certain unit tests,
    so the expected backoff was always computed from 0 and the tests
    effectively did not run any meaningful checks.
    - `JobsReadyPods` was not enabled for test cases that required the
    feature gate to be enabled.
    - The check for expected and actual backoff had incorrect
    calculations.
2023-01-12 20:52:53 +05:30
apis/config generated: Run hack/update-gofmt.sh 2021-08-24 15:47:49 -04:00
bootstrap Merge pull request #105510 from damemi/wire-contexts-bootstrap 2021-11-02 14:27:42 -07:00
certificates Generate and format files 2022-12-20 17:26:07 -05:00
clusterroleaggregation Wire contexts to RBAC controllers 2021-10-07 15:04:49 -04:00
cronjob Generate and format files 2022-12-20 17:26:07 -05:00
daemon Generate and format files 2022-12-20 17:26:07 -05:00
deployment Generate and format files 2022-12-20 17:26:07 -05:00
disruption Generate and format files 2022-12-20 17:26:07 -05:00
endpoint Merge pull request #108879 from robscott/automated-cherry-pick-of-#108078-upstream-release-1.23 2022-06-12 05:06:09 -07:00
endpointslice Generate and format files 2022-12-20 17:26:07 -05:00
endpointslicemirroring Fixing how EndpointSlice Mirroring handles Service selector transitions 2021-10-29 11:03:28 -07:00
garbagecollector ResettableRESTMapper to make it possible to reset wrapped mappers 2021-11-06 10:44:02 +11:00
history fix: 81134: fix unsafe json for ReleaseControllerRevision (#104049) 2021-11-05 06:33:52 -07:00
job pkg/controller/job: re-honor exponential backoff 2023-01-12 20:52:53 +05:30
namespace Generate and format files 2022-12-20 17:26:07 -05:00
nodeipam Generate and format files 2022-12-20 17:26:07 -05:00
nodelifecycle Generate and format files 2022-12-20 17:26:07 -05:00
podautoscaler Generate and format files 2022-12-20 17:26:07 -05:00
podgc Wire contexts to Core controllers 2021-11-01 10:29:00 -04:00
replicaset Wire contexts to Batch controllers (#105491) 2021-11-10 14:56:46 -08:00
replication Wire contexts to Batch controllers (#105491) 2021-11-10 14:56:46 -08:00
resourcequota Implement controller and kubelet changes for recovery from resize 2021-11-16 11:06:46 -05:00
serviceaccount Merge pull request #102945 from chenchun/fake 2021-11-02 07:14:58 -07:00
statefulset Merge pull request #112084 from gjkim42/automated-cherry-pick-of-#109694-upstream-release-1.23 2023-01-11 14:00:07 -08:00
storageversiongc Wire contexts to Core controllers 2021-11-01 10:29:00 -04:00
testutil NodeLifecycleController: Remove race condition 2022-10-25 14:09:02 +00:00
ttl Wire contexts to Core controllers 2021-11-01 10:29:00 -04:00
ttlafterfinished Wire contexts to Core controllers 2021-11-01 10:29:00 -04:00
util Generate and format files 2022-12-20 17:26:07 -05:00
volume Generate and format files 2022-12-20 17:26:07 -05:00
controller_ref_manager.go Generate and format files 2022-12-20 17:26:07 -05:00
controller_ref_manager_test.go Merge pull request #101250 from evertrain/master 2021-11-10 09:19:26 -08:00
controller_utils.go Generate and format files 2022-12-20 17:26:07 -05:00
controller_utils_test.go NodeLifecycleController: Remove race condition 2022-10-25 14:09:02 +00:00
doc.go
lookup_cache.go
OWNERS add myself as reviewer in pkg/controller/OWNERS 2021-01-15 17:21:35 -05:00