![Header image](https://assets.st-note.com/production/uploads/images/67284343/rectangle_large_type_2_a72b0e576d87f2cfee17a5ce251dfe5b.png?width=1200)
Why a Pod stays "Pending" and never starts (Kubernetes)
This is 鯨井貴博 (@opensourcetech), LinuC Evangelist.
Introduction
These are my notes on why a Pod in Kubernetes can stay "Pending" and never start.
The problem
Apply a manifest that contains a Deployment (and therefore Pods), as shown below.
kubeuser@kubemaster1:~$ kubectl apply -f nginx.yaml
deployment.apps/nginx created
kubeuser@kubemaster1:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-6c67f5ff6f-nrdvt 0/1 Pending 0 7s
nginx-6c67f5ff6f-t4sg8 0/1 Pending 0 7s
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-6c67f5ff6f-nrdvt 0/1 Pending 0 13s <none> <none> <none> <none>
nginx-6c67f5ff6f-t4sg8 0/1 Pending 0 13s <none> <none> <none> <none>
.
.
.
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-6c67f5ff6f-nrdvt 0/1 Pending 0 77s <none> <none> <none> <none>
nginx-6c67f5ff6f-t4sg8 0/1 Pending 0 77s <none> <none> <none> <none>
The STATUS stays "Pending", and the Pods show no sign of starting.
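When Pods sit in Pending, the scheduler normally records the reason in the Pod's events. As a minimal sketch (pending_pods is a hypothetical helper name of mine, not a kubectl feature), the following picks the Pending Pods out of the kubectl get pods table so they can be handed to kubectl describe pod:

```shell
# pending_pods: hypothetical helper, not part of kubectl.
# Reads `kubectl get pods` table output on stdin and prints the NAME of every
# row whose STATUS column is "Pending". Column positions are taken from the
# header row, so the plain and `-o wide` layouts both work.
pending_pods() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME")   name_col = i
        if ($i == "STATUS") status_col = i
      }
      next
    }
    status_col && $status_col == "Pending" { print $name_col }
  '
}

# Usage against a live cluster (not run here):
#   kubectl get pods | pending_pods
#   kubectl describe pod "$(kubectl get pods | pending_pods | head -n 1)"
```

The Events section at the end of the describe output is the place to look for scheduling messages.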
Why the Pods do not start
The cause lies with the Worker Nodes.
kubeuser@kubemaster1:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster1 Ready master 6d1h v1.18.0
kubemaster2 Ready master 6d1h v1.18.0
kubemaster3 Ready master 6d v1.18.0
kubeworker Ready <none> 6d1h v1.18.0
kubeworker2 Ready <none> 6d v1.18.0
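The STATUS is "Ready" and nothing looks wrong, but in fact kubelet is stopped on the Worker Nodes.
(The control plane only changes a node's status after it has missed kubelet heartbeats for a grace period, so a check made right after kubelet stops can still show "Ready".)
In a Kubernetes cluster, the API server on the Master Node and the kubelet on each Worker Node communicate with each other as shown below. When kubelet is stopped this communication breaks down, and Pods can no longer be placed (scheduled) on that node.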
![](https://assets.st-note.com/img/1638884424418-1EjYhJ8ZR0.png?width=1200)
https://kubernetes.io/ja/docs/concepts/overview/components/
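Because the node status can lag behind the actual state of kubelet, it helps to check it more than once. A minimal sketch, assuming the default kubectl get nodes table layout (not_ready_nodes is a hypothetical helper name), that lists every node whose STATUS is anything other than plain "Ready":

```shell
# not_ready_nodes: hypothetical helper, not part of kubectl.
# Reads `kubectl get nodes` table output on stdin and prints "NAME<TAB>STATUS"
# for every node whose STATUS is not exactly "Ready" (for example NotReady,
# or Ready,SchedulingDisabled on a cordoned node).
not_ready_nodes() {
  awk '
    NR == 1 { next }                       # skip the header row
    $2 != "Ready" { print $1 "\t" $2 }
  '
}

# Usage against a live cluster (not run here):
#   kubectl get nodes | not_ready_nodes
#   kubectl get nodes -w        # keep watching for status changes
#   kubectl get node kubeworker \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
```

The jsonpath form prints the raw value of the Ready condition (True, False or Unknown) for a single node.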
Worker Node 1:
kubeuser@kubeworker:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: inactive (dead) since Tue 2021-12-07 13:12:25 UTC; 3s ago
Docs: https://kubernetes.io/docs/home/
Process: 19532 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET>
Main PID: 19532 (code=exited, status=0/SUCCESS)
Dec 07 10:52:30 kubeworker kubelet[19532]: E1207 10:52:30.866416 19532 remote>
Dec 07 11:36:48 kubeworker kubelet[19532]: E1207 11:36:48.594091 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.696722 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.875900 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.876512 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.876845 19532 contro>
Dec 07 11:36:56 kubeworker kubelet[19532]: I1207 11:36:55.877071 19532 contro>
Dec 07 13:12:25 kubeworker systemd[1]: Stopping kubelet: The Kubernetes Node Ag>
Dec 07 13:12:25 kubeworker systemd[1]: kubelet.service: Succeeded.
Dec 07 13:12:25 kubeworker systemd[1]: Stopped kubelet: The Kubernetes Node Age>
Worker Node 2:
kubeuser@kubeworker2:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: inactive (dead) since Tue 2021-12-07 13:12:56 UTC; 1s ago
Docs: https://kubernetes.io/docs/home/
Process: 661 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_C>
Main PID: 661 (code=exited, status=0/SUCCESS)
Dec 07 13:03:15 kubeworker2 kubelet[3396]: 2021-12-07 13:03:15.293 [INFO][3410]>
Dec 07 13:03:15 kubeworker2 kubelet[3396]: time="2021-12-07T13:03:15Z" level=in>
Dec 07 13:03:15 kubeworker2 kubelet[3396]: 2021-12-07 13:03:15.318 [INFO][3396]>
Dec 07 13:03:38 kubeworker2 kubelet[661]: E1207 13:03:38.903002 661 control>
Dec 07 13:03:38 kubeworker2 kubelet[661]: E1207 13:03:38.952191 661 kubelet>
Dec 07 13:03:42 kubeworker2 kubelet[661]: E1207 13:03:42.313131 661 control>
Dec 07 13:09:25 kubeworker2 kubelet[661]: E1207 13:09:25.525803 661 kubelet>
Dec 07 13:12:56 kubeworker2 systemd[1]: Stopping kubelet: The Kubernetes Node A>
Dec 07 13:12:56 kubeworker2 systemd[1]: kubelet.service: Succeeded.
Dec 07 13:12:56 kubeworker2 systemd[1]: Stopped kubelet: The Kubernetes Node Ag>
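The log lines above end in ">" because they were cut off at the terminal width. As a rough sketch (unit_active_state is a hypothetical helper name, not part of systemd), the state can be pulled out of the status output on its own, and the full log lines read with journalctl:

```shell
# unit_active_state: hypothetical helper, not part of systemd.
# Reads `systemctl status <unit>` output on stdin and prints the word that
# follows "Active:", for example "active" or "inactive".
unit_active_state() {
  awk '$1 == "Active:" { print $2; exit }'
}

# Usage on a worker node (not run here):
#   sudo systemctl status kubelet --no-pager --full | unit_active_state
#   sudo journalctl -u kubelet --no-pager --since "30min ago"
```

For a quick yes/no answer, systemctl is-active kubelet prints the same state word directly.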
kubectl describe nodes also reports "Kubelet stopped posting node status." for the node, together with the node.kubernetes.io/unreachable taints.
kubeuser@kubemaster1:~$ kubectl describe nodes kubeworker2
Name: kubeworker2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=kubeworker2
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 192.168.1.254/24
projectcalico.org/IPv4IPIPTunnelAddr: 10.0.225.0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 01 Dec 2021 12:23:46 +0000
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: kubeworker2
AcquireTime: <unset>
RenewTime: Tue, 07 Dec 2021 13:43:03 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 07 Dec 2021 13:03:14 +0000 Tue, 07 Dec 2021 13:03:14 +0000 CalicoIsUp Calico is running on this node
MemoryPressure Unknown Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:47 +0000 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:47 +0000 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:47 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:47 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.1.254
Hostname: kubeworker2
Capacity:
cpu: 2
ephemeral-storage: 20511312Ki
hugepages-2Mi: 0
memory: 2035140Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 18903225108
hugepages-2Mi: 0
memory: 1932740Ki
pods: 110
System Info:
Machine ID: 7c474b3b662c452a98ea24d02d1871e9
System UUID: 7c474b3b-662c-452a-98ea-24d02d1871e9
Boot ID: d487a166-68fa-4da9-b89d-45cefb6bddc1
Kernel Version: 5.4.0-91-generic
OS Image: Ubuntu 20.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.7
Kubelet Version: v1.18.0
Kube-Proxy Version: v1.18.0
PodCIDR: 10.0.1.0/24
PodCIDRs: 10.0.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-lk26d 250m (12%) 0 (0%) 0 (0%) 0 (0%) 6d1h
kube-system kube-proxy-xv78r 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6d1h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 250m (12%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 44m kubelet, kubeworker2 Starting kubelet.
Warning ImageGCFailed 44m kubelet, kubeworker2 failed to get imageFs info: unable to find data in memory cache
Normal NodeAllocatableEnforced 44m kubelet, kubeworker2 Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 44m (x2 over 44m) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 44m (x2 over 44m) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 44m (x2 over 44m) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientPID
Warning Rebooted 44m kubelet, kubeworker2 Node kubeworker2 has been rebooted, boot id: d487a166-68fa-4da9-b89d-45cefb6bddc1
Normal NodeReady 44m kubelet, kubeworker2 Node kubeworker2 status is now: NodeReady
Normal Starting 43m kube-proxy, kubeworker2 Starting kube-proxy.
Normal Starting 115s kubelet, kubeworker2 Starting kubelet.
Normal NodeHasSufficientMemory 115s kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 115s kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 115s kubelet, kubeworker2 Updated Node Allocatable limit across pods
Normal NodeHasNoDiskPressure 99s (x2 over 115s) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasNoDiskPressure
For reference, here is the output when the node is healthy.
kubeuser@kubemaster1:~$ kubectl describe nodes kubeworker2
Name: kubeworker2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=kubeworker2
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 192.168.1.254/24
projectcalico.org/IPv4IPIPTunnelAddr: 10.0.225.0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 01 Dec 2021 12:23:46 +0000
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: kubeworker2
AcquireTime: <unset>
RenewTime: Tue, 07 Dec 2021 13:43:03 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 07 Dec 2021 13:03:14 +0000 Tue, 07 Dec 2021 13:03:14 +0000 CalicoIsUp Calico is running on this node
MemoryPressure False Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:04 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:04 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:04 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 07 Dec 2021 13:43:04 +0000 Tue, 07 Dec 2021 13:43:04 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.1.254
Hostname: kubeworker2
Capacity:
cpu: 2
ephemeral-storage: 20511312Ki
hugepages-2Mi: 0
memory: 2035140Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 18903225108
hugepages-2Mi: 0
memory: 1932740Ki
pods: 110
System Info:
Machine ID: 7c474b3b662c452a98ea24d02d1871e9
System UUID: 7c474b3b-662c-452a-98ea-24d02d1871e9
Boot ID: d487a166-68fa-4da9-b89d-45cefb6bddc1
Kernel Version: 5.4.0-91-generic
OS Image: Ubuntu 20.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.7
Kubelet Version: v1.18.0
Kube-Proxy Version: v1.18.0
PodCIDR: 10.0.1.0/24
PodCIDRs: 10.0.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-lk26d 250m (12%) 0 (0%) 0 (0%) 0 (0%) 6d1h
kube-system kube-proxy-xv78r 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6d1h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 250m (12%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 43m kubelet, kubeworker2 Starting kubelet.
Warning ImageGCFailed 43m kubelet, kubeworker2 failed to get imageFs info: unable to find data in memory cache
Normal NodeAllocatableEnforced 43m kubelet, kubeworker2 Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 42m (x2 over 43m) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 42m (x2 over 43m) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 42m (x2 over 43m) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientPID
Warning Rebooted 42m kubelet, kubeworker2 Node kubeworker2 has been rebooted, boot id: d487a166-68fa-4da9-b89d-45cefb6bddc1
Normal NodeReady 42m kubelet, kubeworker2 Node kubeworker2 status is now: NodeReady
Normal Starting 42m kube-proxy, kubeworker2 Starting kube-proxy.
Normal Starting 44s kubelet, kubeworker2 Starting kubelet.
Normal NodeHasSufficientMemory 44s kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 44s kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 44s kubelet, kubeworker2 Updated Node Allocatable limit across pods
Normal NodeHasNoDiskPressure 28s (x2 over 44s) kubelet, kubeworker2 Node kubeworker2 status is now: NodeHasNoDiskPressure
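Comparing the two outputs above, the differences that matter for scheduling are the Taints and the Conditions. A minimal sketch, assuming the usual indented layout of real kubectl describe node output (node_taints is a hypothetical helper name), that prints only the taints:

```shell
# node_taints: hypothetical helper, not part of kubectl.
# Reads `kubectl describe node <name>` output on stdin and prints one taint per
# line. The first taint sits on the "Taints:" line itself and any further
# taints follow on indented continuation lines.
node_taints() {
  awk '
    /^Taints:/ {
      in_taints = 1
      sub(/^Taints:[ \t]*/, "")
      print
      next
    }
    in_taints && /^[ \t]+[^ \t]/ {
      sub(/^[ \t]+/, "")
      print
      next
    }
    { in_taints = 0 }
  '
}

# Usage against a live cluster (not run here):
#   kubectl describe node kubeworker2 | node_taints
#   kubectl get node kubeworker2 -o jsonpath='{.spec.taints[*].key}'
```

On a healthy node this prints <none>. While kubelet is stopped it prints the two node.kubernetes.io/unreachable entries seen in the first output.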
Recovery
Start kubelet on the Worker Node.
kubeuser@kubeworker:~$ sudo systemctl start kubelet
kubeuser@kubeworker:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2021-12-07 13:14:32 UTC; 2s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 2055516 (kubelet)
Tasks: 1 (limit: 2278)
Memory: 3.8M
CGroup: /system.slice/kubelet.service
└─2055516 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/>
Dec 07 13:14:32 kubeworker systemd[1]: Started kubelet: The Kubernetes Node Age>
Now apply the Deployment again.
kubeuser@kubemaster1:~$ kubectl apply -f nginx.yaml
deployment.apps/nginx created
kubeuser@kubemaster1:~$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/nginx-6c67f5ff6f-cc6x8 0/1 Pending 0 2s
pod/nginx-6c67f5ff6f-rkwt2 0/1 Pending 0 3s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d1h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 0/2 2 0 9s
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-6c67f5ff6f 2 2 0 9s
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-6c67f5ff6f-cc6x8 1/1 Running 0 23s 10.0.42.15 kubeworker <none> <none>
nginx-6c67f5ff6f-rkwt2 1/1 Running 0 24s 10.0.42.14 kubeworker <none> <none>
This time, the Pods started successfully!
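Conclusion
I expect this to be caused by things like the following, so it may not happen very often:
・kubelet was not set to start automatically when the Worker Node was built (systemctl enable kubelet was never run)
・kubelet went down for some reason
Even so, knowing in advance that this can happen is likely to make a real difference when troubleshooting.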
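Both causes listed above can be checked with systemctl on the Worker Node. The following is only a sketch under my own assumptions: ensure_unit_running is a hypothetical helper name, and the SYSTEMCTL variable exists purely so that the command can be swapped out when trying the function without systemd.

```shell
# ensure_unit_running: hypothetical helper, not part of systemd.
# Starts UNIT if it is not active and enables it if it is not enabled, so that
# it also comes back after a reboot. Must be run as root on the node itself.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

ensure_unit_running() {
  unit="$1"
  if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
    "$SYSTEMCTL" start "$unit"
  fi
  if ! "$SYSTEMCTL" is-enabled --quiet "$unit"; then
    "$SYSTEMCTL" enable "$unit"
  fi
}

# Usage on a worker node, from a root shell (not run here):
#   ensure_unit_running kubelet
```

When done by hand, sudo systemctl enable --now kubelet performs the enable and the start in one step.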