Google-Cloud-Platform
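GKE fails to schedule newly created GPU pods on newly added GPU nodes
When I add a new node pool with GPUs, Google Kubernetes Engine fails to schedule newly created pods that require GPUs onto the new nodes. Scheduling should be automatic, but apparently that is not the case for GPU resources: the new pods stay in the "Pending" state forever. How can I fix this?
Edit: here is the deployment YAML file. My goal is to avoid binding the deployment to a specific node: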
---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SldDeployment
metadata:
  labels:
    app: sld
  name: trs-sld
  namespace: trs
spec:
  annotations:
    project_name: Trs
    deployment_version: v1.0
    seldon.io/rest-connect-retries: '5'
    seldon.io/grpc-connect-retries: '5'
    seldon.io/istio-retries: '10'
    seldon.io/istio-retries-timeout: '12'
  name: trs
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: eu.gcr.io/trs-141513/trs-native:latest
          imagePullPolicy: Always
          name: classifier
          resources:
            limits:
              nvidia.com/gpu: 2
          volumeMounts:
          - mountPath: /etc/google_storage/creds
            name: service-account-creds
            readOnly: true
        volumes:
        - name: service-account-creds
          secret:
            secretName: service-account-creds
        terminationGracePeriodSeconds: 20
    graph:
      children: []
      name: classifier
      endpoint:
        type: REST
      type: MODEL
    name: model
    replicas: 1
    annotations:
      predictor_version: v1.0
---
It turns out that the GPU drivers need to be installed every time new nodes are added. For example, for Ubuntu nodes:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
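To confirm that the drivers took effect, it can help to check whether each node now advertises `nvidia.com/gpu` as an allocatable resource, since a pod with a GPU limit cannot be scheduled onto a node that does not. The following is only a minimal sketch, assuming `kubectl` is already configured for the cluster; the helper name `nodes_without_gpu` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: reads a "NAME GPU" table on stdin (header on the first
# row) and prints the names of nodes that advertise no allocatable GPU.
nodes_without_gpu() {
  awk 'NR > 1 && ($2 == "<none>" || $2 == "0") { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get nodes \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | nodes_without_gpu
```

If the new GPU nodes still appear in the output some minutes after applying the manifest, the driver installation has probably not finished or has failed on them.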