I deployed a k3s cluster on an openEuler + RISC-V development board, but the coredns pod fails to start, and pods cannot ping each other.
The k3s cluster has 2 nodes; on the master, the coredns pod's default status is as follows:
NAME                                      READY   STATUS             RESTARTS           AGE   IP           NODE                NOMINATED NODE   READINESS GATES
local-path-provisioner-6d44f4f9d7-z5b9c   0/1     CrashLoopBackOff   1089 (3m37s ago)   35d   10.42.0.25   openeuler-riscv64   <none>           <none>
metrics-server-7c55d89d5d-kpj5h           0/1     CrashLoopBackOff   1077 (2m50s ago)   35d   10.42.0.22   openeuler-riscv64   <none>           <none>
helm-install-traefik-crd-hhfn4            0/1     CrashLoopBackOff   820 (119s ago)     35d   10.42.0.21   openeuler-riscv64   <none>           <none>
helm-install-traefik-r6bm8                0/1     CrashLoopBackOff   819 (109s ago)     35d   10.42.0.24   openeuler-riscv64   <none>           <none>
coredns-97b598894-7l5ff                   0/1     CrashLoopBackOff   8 (54s ago)        17m   10.42.0.27   openeuler-riscv64   <none>           <none>
kubectl describe shows:
[root@openeuler-riscv64 ~]# kubectl get pods -n kube-system
NAME                                      READY   STATUS             RESTARTS          AGE
coredns-97b598894-tqr5v                   0/1     CrashLoopBackOff   9 (15h ago)       15h
metrics-server-7c55d89d5d-kpj5h           0/1     Running            1074 (119s ago)   35d
local-path-provisioner-6d44f4f9d7-z5b9c   0/1     CrashLoopBackOff   1086 (32s ago)    35d
helm-install-traefik-r6bm8                0/1     CrashLoopBackOff   815 (15s ago)     35d
helm-install-traefik-crd-hhfn4            0/1     CrashLoopBackOff   816 (15s ago)     35d

Name:                 coredns-97b598894-tqr5v
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 k3s-air1/172.20.10.3
Start Time:           Mon, 22 Jan 2024 17:09:59 +0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=97b598894
Annotations:          <none>
Status:               Terminating (lasts 2m4s)
Termination Grace Period:  30s
IP:                   10.42.1.26
IPs:
  IP:           10.42.1.26
Controlled By:  ReplicaSet/coredns-97b598894
Containers:
  coredns:
    Container ID:  docker://c5386186c0177f658a96df702607bca0f795185cc7438ae29b1065dca1051cbc
    Image:         carvicsforth/coredns:1.10.1
    Image ID:      docker-pullable://carvicsforth/coredns@sha256:6cd10cf78af68af9bfebc932c22724a64d4ce0e7ff94738aef6b92df7565f4b1
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 22 Jan 2024 17:32:03 +0800
      Finished:     Mon, 22 Jan 2024 17:32:08 +0800
    Ready:          False
    Restart Count:  9
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dbth2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
  DisruptionTarget  True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  custom-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
  kube-api-access-dbth2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15h                 default-scheduler  Successfully assigned kube-system/coredns-97b598894-tqr5v to k3s-air1
  Normal   Pulled     15h (x3 over 15h)   kubelet            Container image "carvicsforth/coredns:1.10.1" already present on machine
  Normal   Created    15h (x3 over 15h)   kubelet            Created container coredns
  Normal   Started    15h (x3 over 15h)   kubelet            Started container coredns
  Warning  Unhealthy  15h (x14 over 15h)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    15h (x95 over 15h)  kubelet            Back-off restarting failed container coredns in pod coredns-97b598894-tqr5v_kube-system(5f26f744-5697-47c4-a895-7a1dbff23b96)
kubectl logs shows:
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
Listen: listen tcp :53: bind: permission denied
Checking with ps -ef showed that the process was not running as root, so I edited coredns.yaml and added the following:
+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+            - matchExpressions:
+              - key: node-role.kubernetes.io/master
+                operator: Exists
       nodeSelector:
         kubernetes.io/os: linux
       ......
         securityContext:
+          runAsUser: 0
+          runAsGroup: 0
           allowPrivilegeEscalation: false
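As a side note, running the whole container as root is a bigger hammer than the error strictly needs: binding a port below 1024 only requires the NET_BIND_SERVICE capability. A narrower securityContext along these lines might also fix the bind error (a sketch, not tested on this board; whether k3s's bundled manifest preserves it across restarts is a separate question):

```yaml
securityContext:
  capabilities:
    add:
    - NET_BIND_SERVICE   # allows binding :53 without running as root
  allowPrivilegeEscalation: false
```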
After restarting the coredns pod, the status is as follows:
NAME                                      READY   STATUS             RESTARTS          AGE   IP           NODE                NOMINATED NODE   READINESS GATES
coredns-65dc9b694c-xx4pf                  0/1     Running            0                 63m   10.42.0.28   openeuler-riscv64   <none>           <none>
helm-install-traefik-crd-hhfn4            0/1     CrashLoopBackOff   829 (3m56s ago)   35d   10.42.0.21   openeuler-riscv64   <none>           <none>
helm-install-traefik-r6bm8                0/1     CrashLoopBackOff   828 (3m24s ago)   35d   10.42.0.24   openeuler-riscv64   <none>           <none>
local-path-provisioner-6d44f4f9d7-z5b9c   0/1     CrashLoopBackOff   1102 (3m ago)     35d   10.42.0.25   openeuler-riscv64   <none>           <none>
metrics-server-7c55d89d5d-kpj5h           0/1     CrashLoopBackOff   1090 (81s ago)    35d   10.42.0.22   openeuler-riscv64   <none>           <none>
The status is now Running, but READY is still 0/1. The events from kubectl describe are:
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  4m17s (x1850 over 64m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
The logs show:
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1972775025]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (23-Jan-2024 01:36:37.721) (total time: 30001ms):
Trace[1972775025]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout 30001ms (01:37:07.723)
Trace[1972775025]: [30.001897321s] [30.001897321s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
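The i/o timeout reaching 10.43.0.1:443 (the kubernetes service ClusterIP), combined with pods being unable to ping each other, suggests the problem is now in the CNI/overlay layer rather than in CoreDNS itself: the readiness probe returns 503 only because the kubernetes plugin can never sync with the API server. Some checks I would run on each node to narrow it down (assuming k3s's default flannel backend; interface and subnet names below are the k3s defaults, not something confirmed from my cluster):

```shell
# Does the flannel overlay interface exist and is it up?
ip -d link show flannel.1

# Are routes to the other node's pod subnet (10.42.x.0/24) going via flannel.1?
ip route | grep 10.42.

# Can the node itself reach the apiserver through the service IP?
# (an HTTP 401/403 response still proves connectivity)
curl -k https://10.43.0.1:443/version

# flannel's vxlan backend needs kernel vxlan support, which some
# riscv64 kernel builds may not include
lsmod | grep vxlan
```

If vxlan support is missing from the board's kernel, switching flannel to the host-gw backend (k3s server flag `--flannel-backend=host-gw`, which routes pod subnets directly when nodes share an L2 network) is one commonly suggested workaround.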
Pods still cannot ping each other. Has anyone run into a similar problem?