kubernetes/pkg
Kubernetes Submit Queue b0d024fee1 Merge pull request #45569 from vmware/fix_VolumesAreAttached
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)

Fixing VolumesAreAttached and DisksAreAttached functions in vSphere

**What this PR does / why we need it**:

In the vSphere HA, when node fail over happens, node VM momentarily goes in to “not connected” state. During this time, if kubernetes calls VolumesAreAttached function, we are returning incorrect map, with status for volume set to false - detached state.

Volumes attached to previous nodes, requires to be detached before they can attach to the new node. Kubernetes attempt to check volume attachment. When node VM is not accessible or for any reason we cannot determine disk is attached, we were returning a Map of volumepath and its attachment status set to false. This was misinterpreted as disks are already detached from the node and Kubernetes was marking volumes as detached after orphaned pod is cleaned up. This causes volumes to remain attached to previous node, and pod creation always remains in the “containercreating” state. Since both the node are powered on, volumes can not be attached to new node.

**Logs before fix**

```
{"log":"E0508 21:31:20.902501       1 vsphere.go:1053] disk uuid not found for [vsanDatastore] kubevols/kubernetes-dynamic-pvc-8b75170e-342d-11e7-bab5-0050568aeb0a.vmdk. err: No disk UUID fou
nd\n","stream":"stderr","time":"2017-05-08T21:31:20.902792337Z"}
{"log":"E0508 21:31:20.902552       1 vsphere.go:1041] Failed to check whether disk is attached. err: No disk UUID found\n","stream":"stderr","time":"2017-05-08T21:31:20.902842673Z"}
{"log":"I0508 21:31:20.902575       1 attacher.go:114] VolumesAreAttached: check volume \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-8b75170e-342d-11e7-bab5-0050568aeb0a.vmdk\" (specName
: \"pvc-8b75170e-342d-11e7-bab5-0050568aeb0a\") is no longer attached\n","stream":"stderr","time":"2017-05-08T21:31:20.902849717Z"}
{"log":"I0508 21:31:20.902596       1 operation_generator.go:166] VerifyVolumesAreAttached determined volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-8b7
5170e-342d-11e7-bab5-0050568aeb0a.vmdk\" (spec.Name: \"pvc-8b75170e-342d-11e7-bab5-0050568aeb0a\") is no longer attached to node \"node3\", therefore it was marked as detached.\n","stream":"s
tderr","time":"2017-05-08T21:31:20.902863097Z"}
```



In this change, we are making sure correct volume attachment map is returned, and in case of any error occurred while checking disk’s status, we return nil map.


**Logs after fix**
```
{"log":"E0509 20:25:37.982152       1 vsphere.go:1067] Failed to check whether disk is attached. err: No disk UUID found\n","stream":"stderr","time":"2017-05-09T20:25:37.982516134Z"}
{"log":"E0509 20:25:37.982190       1 attacher.go:104] Error checking if volumes ([[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1.vmdk [vsanDatastore] kubevols/kubernetes-dynamic-pvc-c268f141-34f2-11e7-9303-0050568a3ac1.vmdk [vsanDatastore] kubevols/kubernetes-dynamic-pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1.vmdk]) are attached to current node (\"node3\"). err=No disk UUID found\n","stream":"stderr","time":"2017-05-09T20:25:37.982521101Z"}
{"log":"E0509 20:25:37.982220       1 operation_generator.go:158] VolumesAreAttached failed for checking on node \"node3\" with: No disk UUID found\n","stream":"stderr","time":"2017-05-09T20:25:37.982526285Z"}
{"log":"I0509 20:25:39.157279       1 attacher.go:115] VolumesAreAttached: volume \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c268f141-34f2-11e7-9303-0050568a3ac1.vmdk\" (specName: \"pvc-c268f141-34f2-11e7-9303-0050568a3ac1\") is attached\n","stream":"stderr","time":"2017-05-09T20:25:39.157724393Z"}
{"log":"I0509 20:25:39.157329       1 attacher.go:115] VolumesAreAttached: volume \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1.vmdk\" (specName: \"pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1\") is attached\n","stream":"stderr","time":"2017-05-09T20:25:39.157787946Z"}
{"log":"I0509 20:25:39.157367       1 attacher.go:115] VolumesAreAttached: volume \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1.vmdk\" (specName: \"pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1\") is attached\n","stream":"stderr","time":"2017-05-09T20:25:39.157794586Z"}
```

```
{"log":"I0509 20:25:41.267425       1 reconciler.go:173] Started DetachVolume for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1.vmdk\" from node \"node3\"\n","stream":"stderr","time":"2017-05-09T20:25:41.267883567Z"}
{"log":"I0509 20:25:41.271836       1 operation_generator.go:694] Verified volume is safe to detach for volume \"pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1.vmdk\") on node \"node3\" \n","stream":"stderr","time":"2017-05-09T20:25:41.272703255Z"}
{"log":"I0509 20:25:47.928021       1 operation_generator.go:341] DetachVolume.Detach succeeded for volume \"pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c26fcae8-34f2-11e7-9303-0050568a3ac1.vmdk\") on node \"node3\" \n","stream":"stderr","time":"2017-05-09T20:25:47.928348553Z"}

{"log":"I0509 20:26:12.535962       1 operation_generator.go:694] Verified volume is safe to detach for volume \"pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1.vmdk\") on node \"node3\" \n","stream":"stderr","time":"2017-05-09T20:26:12.536055214Z"}
{"log":"I0509 20:26:14.188580       1 operation_generator.go:341] DetachVolume.Detach succeeded for volume \"pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c25d08d3-34f2-11e7-9303-0050568a3ac1.vmdk\") on node \"node3\" \n","stream":"stderr","time":"2017-05-09T20:26:14.188792677Z"}

{"log":"I0509 20:26:40.355656       1 reconciler.go:173] Started DetachVolume for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c268f141-34f2-11e7-9303-0050568a3ac1.vmdk\" from node \"node3\"\n","stream":"stderr","time":"2017-05-09T20:26:40.355922165Z"}
{"log":"I0509 20:26:40.357988       1 operation_generator.go:694] Verified volume is safe to detach for volume \"pvc-c268f141-34f2-11e7-9303-0050568a3ac1\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-c268f141-34f2-11e7-9303-0050568a3ac1.vmdk\") on node \"node3\" \n","stream":"stderr","time":"2017-05-09T20:26:40.358177953Z"}

```




**Which issue this PR fixes**
fixes #45464, https://github.com/vmware/kubernetes/issues/116

**Special notes for your reviewer**:
Verified this change on locally built hyperkube image - v1.7.0-alpha.3.147+3c0526cb64bdf5-dirty

**performed many fail over with large volumes (30GB) attached to the pod.**

$ kubectl describe pod
Name:		wordpress-mysql-2789807967-3xcvc
Node:		node3/172.1.87.0
Status:		Running

Powered Off node3's host. pod failed over to node2. Verified all 3 disks detached from node3 and attached to node2.

$ kubectl describe pod
Name:		wordpress-mysql-2789807967-qx0b0
Node:		node2/172.1.9.0
Status:		Running

Powered Off node2's host. pod failed over to node3. Verified all 3 disks detached from node2 and attached to node3.

$ kubectl describe pod
Name:		wordpress-mysql-2789807967-7849s
Node:		node3/172.1.87.0
Status:		Running

Powered Off node3's host. pod failed over to node1. Verified all 3 disks detached from node3 and attached to node1.

$ kubectl describe pod
Name:		wordpress-mysql-2789807967-26lp1
Node:		node1/172.1.98.0
Status:		Running

Powered off node1's host. pod failed over to node3. Verified all 3 disks detached from node1 and attached to node3.

$ kubectl describe pods
Name:		wordpress-mysql-2789807967-4pdtl
Node:		node3/172.1.87.0
Status:		Running


Powered off node3's host. pod failed over to node1. Verified all 3 disks detached from node3 and attached to node1.

$ kubectl describe pod
Name:		wordpress-mysql-2789807967-t375f
Node:		node1/172.1.98.0
Status:		Running

Powered off node1's host. pod failed over to node3. Verified all 3 disks detached from node1 and attached to node3.

$ kubectl describe pods
Name:		wordpress-mysql-2789807967-pn6ps
Node:		node3/172.1.87.0
Status:		Running

powered off node3's host. pod failed over to node1. Verified all 3 disks detached from node3 and attached to node1

$ kubectl describe pods
Name:		wordpress-mysql-2789807967-0wqc1
Node:		node1/172.1.98.0
Status:		Running

powered off node1's host. pod failed over to node3. Verified all 3 disks detached from node1 and attached to node3.

$ kubectl describe pods
Name:		wordpress-mysql-2789807967-821nc
Node:		node3/172.1.87.0
Status:		Running


**Release note**:

```release-note
NONE
```

CC:  @BaluDontu @abrarshivani @luomiao @tusharnt @pdhamdhere
2017-05-10 21:34:37 -07:00
..
api Merge pull request #39713 from k82cn/init_container_defaults 2017-05-06 23:03:48 -07:00
apimachinery/tests autogenerated 2017-04-14 10:40:57 -07:00
apis Remove the deprecated --enable-cri flag 2017-05-10 13:03:41 -07:00
auth autogenerated 2017-04-14 10:40:57 -07:00
bootstrap/api autogenerated 2017-04-14 10:40:57 -07:00
capabilities Fix comment for method SetForTests 2017-02-14 17:16:49 +08:00
client generated codes. 2017-05-10 01:50:38 +08:00
cloudprovider Merge pull request #45569 from vmware/fix_VolumesAreAttached 2017-05-10 21:34:37 -07:00
controller Merge pull request #45304 from deads2k/controller-03-ns-discovery 2017-05-09 12:04:41 -07:00
conversion Revert "Remove conversion package" 2017-01-22 15:41:06 -08:00
credentialprovider Merge pull request #45056 from ericchiang/update-oauth2 2017-05-03 19:34:14 -07:00
features autogenerated 2017-04-14 10:40:57 -07:00
fieldpath autogenerated 2017-04-14 10:40:57 -07:00
fields move pkg/fields to apimachinery 2017-01-19 09:50:16 -05:00
generated move metrics to staging 2017-05-01 16:43:50 -07:00
hyperkube Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
kubeapiserver Merge pull request #44196 from xiangpengzhao/cmd-cleanup 2017-04-28 21:28:09 -07:00
kubectl Merge pull request #44746 from xiangpengzhao/fix-podpreset 2017-05-09 21:16:17 -07:00
kubelet Merge pull request #45194 from yujuhong/rm-cri-flag 2017-05-10 20:46:24 -07:00
kubemark Update bazel BUID files 2017-05-05 11:48:08 -07:00
labels add back just enough empty packages to allow heapster cycles to succeed 2017-01-17 08:07:30 -05:00
master Merge pull request #44968 from MrHohn/kube-proxy-healthcheck 2017-05-08 14:54:38 -07:00
metrics autogenerated 2017-04-14 10:40:57 -07:00
printers Merge pull request #43067 from xilabao/dedup-in-printer 2017-05-10 19:08:59 -07:00
probe fix various bad tests 2017-04-25 11:23:33 -07:00
proxy Remove no-longer used code in proxy/config 2017-05-10 12:16:35 +02:00
quota autogenerated 2017-04-14 10:40:57 -07:00
registry Add sts alias for kubectl statefulset 2017-05-10 09:57:36 -04:00
routes Merge pull request #45490 from deads2k/owners-01-extensions 2017-05-10 12:51:51 -07:00
runtime add back just enough empty packages to allow heapster cycles to succeed 2017-01-17 08:07:30 -05:00
security Use dedicated Unix User and Group ID types 2017-05-05 14:07:38 +02:00
securitycontext Use dedicated Unix User and Group ID types 2017-05-05 14:07:38 +02:00
serviceaccount autogenerated 2017-04-14 10:40:57 -07:00
ssh autogenerated 2017-04-14 10:40:57 -07:00
types add back just enough empty packages to allow heapster cycles to succeed 2017-01-17 08:07:30 -05:00
util Remove leaked tmp file in unit tests 2017-05-08 18:07:02 +08:00
version autogenerated 2017-04-14 10:40:57 -07:00
volume fix implementation of VolumesAreAttached function 2017-05-10 10:16:13 -07:00
watch autogenerated 2017-04-14 10:40:57 -07:00
BUILD Regenerate everything 2017-03-02 08:56:26 +01:00
OWNERS Updated top level owners file to match new format 2017-01-19 11:29:16 -08:00