Pod Scheduling Readiness

    Pods were considered ready for scheduling once created. The Kubernetes scheduler does its due diligence to find nodes to place all pending Pods. However, in real-world cases, some Pods may stay in a “miss-essential-resources” state for a long period. These Pods churn the scheduler (and downstream integrators like Cluster Autoscaler) unnecessarily.

    By specifying/removing a Pod’s .spec.schedulingGates, you can control when a Pod is ready to be considered for scheduling.

    The schedulingGates field contains a list of strings, and each string literal is treated as a criterion that the Pod must satisfy before it is considered schedulable. This field can be initialized only when a Pod is created (either by the client, or mutated during admission). After creation, each schedulingGate can be removed in arbitrary order, but adding a new scheduling gate is disallowed.

    stateDiagram-v2
        s1: pod created
        s2: pod scheduling gated
        s3: pod scheduling ready
        s4: pod running
        if: empty scheduling gates?
        [*] --> s1
        s1 --> if
        s2 --> if: scheduling gate removed
        if --> s2: no
        if --> s3: yes
        s3 --> s4
        s4 --> [*]
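
    The manifest that creates test-pod is not reproduced in this section. As a minimal sketch, assuming the gate names foo and bar that appear in the schedulingGates output later in this section, a Pod created with two scheduling gates could look like this:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      # Each entry is one gate. The Pod is not considered for scheduling
      # until every gate has been removed. The names foo and bar are
      # illustrative and carry no built-in meaning.
      schedulingGates:
      - name: foo
      - name: bar
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.6
    ```

    Applying a manifest like this with kubectl apply -f should leave the Pod in the SchedulingGated state until both gates are removed.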


    After the Pod’s creation, you can check its state using:

    kubectl get pod test-pod

    The output reveals it’s in SchedulingGated state:

    NAME READY STATUS RESTARTS AGE

    You can also check its schedulingGates field by running:

    kubectl get pod test-pod -o jsonpath='{.spec.schedulingGates}'

    The output is:

    [{"name":"foo"},{"name":"bar"}]

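    To inform the scheduler that this Pod is ready for scheduling, you can remove its schedulingGates entirely by reapplying a modified manifest:

    pods/pod-without-scheduling-gates.yaml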

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.6

    You can check whether the schedulingGates field is cleared by running:

    kubectl get pod test-pod -o jsonpath='{.spec.schedulingGates}'

    The output is expected to be empty. You can then check the Pod’s latest status by running:

    kubectl get pod test-pod -o wide

    Given that test-pod doesn’t request any CPU/memory resources, its state is expected to transition from SchedulingGated to Running:

    NAME       READY   STATUS    RESTARTS   AGE   IP         NODE
    test-pod   1/1     Running   0          15s   10.0.0.4   node-2

    The metric scheduler_pending_pods has a queue label whose value "gated" distinguishes Pods that are explicitly marked as not ready for scheduling from Pods that have been tried for scheduling but found unschedulable. You can use scheduler_pending_pods{queue="gated"} to check the metric result.
