Kubernetes in Action (18): Configuring Taints and Tolerations for Pods

1 Taints

1.1 Taint Overview

Affinity scheduling approaches the problem from the Pod's side: attributes are added to the Pod to steer it onto specific nodes. The problem can also be approached from the Node's side: by setting attributes on a Node, the Node itself decides whether Pods may be scheduled onto it. This is what taints do.

Once a taint is set on a Node, a repelling relationship exists between the Node and Pods: the Node can refuse to have Pods scheduled onto it, and can even evict Pods that are already running there.

A taint has the format key=value:effect, where key and value are the taint's label and effect describes what the taint does. Three effects are supported (a shell sketch follows the list):

  • PreferNoSchedule: Kubernetes tries to avoid scheduling Pods onto a Node with this taint, unless no other node is schedulable
  • NoSchedule: Kubernetes will not schedule new Pods onto a Node with this taint, but Pods already running on the Node are unaffected
  • NoExecute: Kubernetes will not schedule new Pods onto a Node with this taint, and also evicts Pods already running on the Node
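
A minimal shell sketch of the three effects, each shown independently (tier=backend is a hypothetical key/value pair):

# Soft constraint: avoid node1 while other nodes can take the Pod

$ kubectl taint nodes node1 tier=backend:PreferNoSchedule

# Hard constraint: schedule no new Pods onto node1; existing Pods keep running

$ kubectl taint nodes node1 tier=backend:NoSchedule

# Hard constraint plus eviction: running Pods without a matching toleration are evicted

$ kubectl taint nodes node1 tier=backend:NoExecute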

1.2 Taint Commands

# Set a taint

$ kubectl taint nodes node1 key=value:effect

# Remove the taint with the given key and effect

$ kubectl taint nodes node1 key:effect-

# Remove all taints with the given key

$ kubectl taint nodes node1 key-

1.3 Taint Example

1) Set a taint on node1 so that the scheduler avoids placing Pods there

[root@master resource_manage]# kubectl taint nodes node1 name=nginx:PreferNoSchedule
node/node1 tainted

2) Create an nginx Pod

[root@master resource_manage]# kubectl run nginx --image=nginx:1.17.1 --port=80
pod/nginx created

3) Check where the Pod was scheduled

[root@master resource_manage]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          7s    10.244.2.48   node2   <none>           <none>

The Pod went straight to node2 rather than node1. Since PreferNoSchedule is only a soft preference, though, if node2 were down and node1 were the only node left, the Pod would still be scheduled onto node1.
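
To inspect just a node's taints without the full describe output shown below, a jsonpath query is handy (a quick sketch):

$ kubectl get node node1 -o jsonpath='{.spec.taints}'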

1.4 Querying a Node's Taints

[root@master resource_manage]# kubectl describe node node1
Name:               node1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
                    nodeenv=test
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"ba:fe:1f:25:fe:26"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.16.41
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 14 Mar 2022 14:41:02 +0800
Taints:             name=nginx:PreferNoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:     <unset>
  RenewTime:       Sat, 26 Mar 2022 00:00:54 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 14 Mar 2022 14:43:39 +0800   Mon, 14 Mar 2022 14:43:39 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Fri, 25 Mar 2022 23:58:57 +0800   Mon, 14 Mar 2022 14:41:02 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 25 Mar 2022 23:58:57 +0800   Mon, 14 Mar 2022 14:41:02 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 25 Mar 2022 23:58:57 +0800   Mon, 14 Mar 2022 14:41:02 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 25 Mar 2022 23:58:57 +0800   Mon, 14 Mar 2022 14:43:42 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.16.41
  Hostname:    node1
Capacity:
  cpu:                8
  ephemeral-storage:  208357992Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32882960Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  192022725110
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32780560Ki
  pods:               110
System Info:
  Machine ID:                 f9c2b25f57184e06b8855490b4be6013
  System UUID:                d1042642-3933-564f-4f2d-279b5e96cead
  Boot ID:                    8517c1cc-8935-452e-9efb-a34f396b98a5
  Kernel Version:             5.4.179-200.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.9
  Kubelet Version:            v1.21.2
  Kube-Proxy Version:         v1.21.2
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (4 in total)
  Namespace                   Name                                         CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                         ------------  ----------  ---------------  -------------  ---
  kube-system                 kube-flannel-ds-gg4jq                        100m (1%)     100m (1%)   50Mi (0%)        50Mi (0%)      11d
  kube-system                 kube-proxy-tqzjl                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         11d
  kubernetes-dashboard        dashboard-metrics-scraper-c45b7869d-7ll25    0 (0%)        0 (0%)      0 (0%)           0 (0%)         11d
  kubernetes-dashboard        kubernetes-dashboard-79b5779bf4-t28b4        0 (0%)        0 (0%)      0 (0%)           0 (0%)         11d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (1%)  100m (1%)
  memory             50Mi (0%)  50Mi (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:              <none>

1.5 Removing a Taint

$ kubectl taint nodes node1 name:PreferNoSchedule-
node/node1 untainted
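
To confirm the removal, grep the describe output; the Taints field should now read <none>:

$ kubectl describe node node1 | grep Taints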

1.6 Why are Pods never scheduled onto the master node?

The following command shows that the master node carries a node-role.kubernetes.io/master:NoSchedule taint by default, which is why newly created Pods are never scheduled onto it.

[root@master resource_manage]# kubectl describe nodes master
Name:               master
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"02:f6:8e:03:60:51"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.16.40
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 14 Mar 2022 14:38:03 +0800
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  master
  AcquireTime:     <unset>
  RenewTime:       Sat, 26 Mar 2022 00:05:31 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 14 Mar 2022 14:42:58 +0800   Mon, 14 Mar 2022 14:42:58 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Sat, 26 Mar 2022 00:01:28 +0800   Mon, 14 Mar 2022 14:38:02 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 26 Mar 2022 00:01:28 +0800   Mon, 14 Mar 2022 14:38:02 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 26 Mar 2022 00:01:28 +0800   Mon, 14 Mar 2022 14:38:02 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 26 Mar 2022 00:01:28 +0800   Mon, 14 Mar 2022 14:43:03 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.16.40
  Hostname:    master
Capacity:
  cpu:                8
  ephemeral-storage:  208357992Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32882960Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  192022725110
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32780560Ki
  pods:               110
System Info:
  Machine ID:                 f9c2b25f57184e06b8855490b4be6013
  System UUID:                c5d32642-f84c-61ef-ac7f-d65ae6880a51
  Boot ID:                    9cbc9b25-2cf2-42d8-aa89-1fdab687c447
  Kernel Version:             5.4.179-200.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.9
  Kubelet Version:            v1.21.2
  Kube-Proxy Version:         v1.21.2
PodCIDR:                      10.244.0.0/24
PodCIDRs:                     10.244.0.0/24
Non-terminated Pods:          (6 in total)
  Namespace                   Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                              ------------  ----------  ---------------  -------------  ---
  kube-system                 etcd-master                       100m (1%)     0 (0%)      100Mi (0%)       0 (0%)         11d
  kube-system                 kube-apiserver-master             250m (3%)     0 (0%)      0 (0%)           0 (0%)         11d
  kube-system                 kube-controller-manager-master    200m (2%)     0 (0%)      0 (0%)           0 (0%)         11d
  kube-system                 kube-flannel-ds-n76xj             100m (1%)     100m (1%)   50Mi (0%)        50Mi (0%)      11d
  kube-system                 kube-proxy-h27ms                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         11d
  kube-system                 kube-scheduler-master             100m (1%)     0 (0%)      0 (0%)           0 (0%)         11d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                750m (9%)   100m (1%)
  memory             150Mi (0%)  50Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>
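
Conversely, if a Pod is meant to run on the master node, as the kube-system Pods in the output above do, it can carry a toleration for this taint. A minimal sketch (tolerations are covered in detail in section 2):

tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"     # match the taint regardless of its value
  effect: "NoSchedule"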

2 Tolerations

2.1 Toleration Overview

When a Node has been tainted but you still want certain Pods to be schedulable onto it, you need tolerations. A taint is a rejection; a toleration is permission to ignore that rejection: the Node repels Pods through its taints, and a Pod bypasses the rejection through its tolerations.

2.2 Tolerations in Practice

1) Set a NoSchedule taint on node1

For this demonstration, keep node1 as the only schedulable node: either shut the other nodes down or cordon them (see the sketch after the taint command).

[root@master resource_manage]# kubectl taint nodes node1 name=nginx:NoSchedule
node/node1 tainted
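
If shutting the other nodes down is inconvenient, cordoning them keeps them registered but unschedulable, which serves the same purpose here (a sketch, assuming node2 is the other worker):

# Mark node2 unschedulable so new Pods can only land on node1

$ kubectl cordon node2

# Undo it after the demo

$ kubectl uncordon node2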

2) Write a pod_toleration.yaml file containing a toleration

apiVersion: v1
kind: Namespace
metadata:
  name: dev

---

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  tolerations:            # tolerate the name=nginx:NoSchedule taint set on node1
  - key: "name"           # the taint key to match
    operator: "Equal"     # require an exact value match (Equal is the default)
    value: "nginx"        # the taint value to match
    effect: "NoSchedule"  # the taint effect to match

3) Create the resources

[root@master resource_manage]# kubectl apply -f pod_toleration.yaml
namespace/dev created
pod/nginx-pod created

4) Verify

Query the Pod with the following command; despite the NoSchedule taint, it is scheduled onto node1:

[root@master resource_manage]# kubectl get pod -n dev -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
nginx-pod   1/1     Running   0          13s   10.244.2.49   node1   <none>           <none>
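
As a counter-check, the same Pod without the tolerations block would be expected to stay Pending, since node1 rejects it and no other node is schedulable (a sketch; nginx-no-toleration is a hypothetical name):

$ kubectl run nginx-no-toleration --image=nginx:1.17.1 -n dev

$ kubectl get pod nginx-no-toleration -n dev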

2.3 Toleration Field Reference

The field documentation can be viewed with kubectl explain:

[root@master resource_manage]# kubectl explain pod.spec.tolerations
KIND:     Pod
VERSION:  v1

RESOURCE: tolerations <[]Object>

DESCRIPTION:
     If specified, the pod's tolerations.

     The pod this Toleration is attached to tolerates any taint that matches the
     triple <key,value,effect> using the matching operator <operator>.

FIELDS:
   effect       <string>
     Effect indicates the taint effect to match. Empty means match all taint
     effects. When specified, allowed values are NoSchedule, PreferNoSchedule
     and NoExecute.

   key  <string>
     Key is the taint key that the toleration applies to. Empty means match all
     taint keys. If the key is empty, operator must be Exists; this combination
     means to match all values and all keys.

   operator     <string>
     Operator represents a key's relationship to the value. Valid operators are
     Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for
     value, so that a pod can tolerate all taints of a particular category.

   tolerationSeconds    <integer>
     TolerationSeconds represents the period of time the toleration (which must
     be of effect NoExecute, otherwise this field is ignored) tolerates the
     taint. By default, it is not set, which means tolerate the taint forever
     (do not evict). Zero and negative values will be treated as 0 (evict
     immediately) by the system.

   value        <string>
     Value is the taint value the toleration matches to. If the operator is
     Exists, the value should be empty, otherwise just a regular string.
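
To illustrate the two fields that the earlier example did not use, a toleration with operator Exists and tolerationSeconds might look like this (a sketch, reusing the taint key "name" from section 2.2):

tolerations:
- key: "name"
  operator: "Exists"      # matches any value of the taint key "name"
  effect: "NoExecute"
  tolerationSeconds: 60   # the Pod is evicted 60 seconds after a matching NoExecute taint appears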