Configure Multiple Schedulers

    A detailed description of how to implement a scheduler is outside the scope of this document. Please refer to the kube-scheduler implementation in pkg/scheduler in the Kubernetes source directory for a canonical example.

    You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube, or you can use one of the Kubernetes playgrounds.

    To check the version, enter kubectl version.

    Package your scheduler binary into a container image. For the purposes of this example, you can use the default scheduler (kube-scheduler) as your second scheduler. Clone the Kubernetes source code from GitHub and build the source.
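
    For example, a typical clone-and-build (assuming a working Go build environment) looks like this:

        git clone https://github.com/kubernetes/kubernetes.git
        cd kubernetes
        make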

    Create a container image containing the kube-scheduler binary. Here is the Dockerfile to build the image:

        FROM busybox
        ADD ./_output/local/bin/linux/amd64/kube-scheduler /usr/local/bin/kube-scheduler

    Save the file as Dockerfile, build the image, and push it to a registry. This example pushes the image to Google Container Registry (GCR). For more details, please read the GCR documentation.

        docker build -t gcr.io/my-gcp-project/my-kube-scheduler:1.0 .
        gcloud docker -- push gcr.io/my-gcp-project/my-kube-scheduler:1.0

    Now that you have your scheduler in a container image, create a pod configuration for it and run it in your Kubernetes cluster. But instead of creating a pod directly in the cluster, you can use a Deployment for this example. A Deployment manages a ReplicaSet which in turn manages the pods, thereby making the scheduler resilient to failures. Here is the deployment config. Save it as my-scheduler.yaml:

    admin/sched/my-scheduler.yaml

        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: my-scheduler
          namespace: kube-system
        ---
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: my-scheduler-as-kube-scheduler
        subjects:
          - kind: ServiceAccount
            name: my-scheduler
            namespace: kube-system
        roleRef:
          kind: ClusterRole
          name: system:kube-scheduler
          apiGroup: rbac.authorization.k8s.io
        ---
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: my-scheduler-as-volume-scheduler
        subjects:
          - kind: ServiceAccount
            name: my-scheduler
            namespace: kube-system
        roleRef:
          kind: ClusterRole
          name: system:volume-scheduler
          apiGroup: rbac.authorization.k8s.io
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: my-scheduler-config
          namespace: kube-system
        data:
          my-scheduler-config.yaml: |
            apiVersion: kubescheduler.config.k8s.io/v1beta2
            kind: KubeSchedulerConfiguration
            profiles:
              - schedulerName: my-scheduler
            leaderElection:
              leaderElect: false
        ---
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          labels:
            component: scheduler
            tier: control-plane
          name: my-scheduler
          namespace: kube-system
        spec:
          selector:
            matchLabels:
              component: scheduler
              tier: control-plane
          replicas: 1
          template:
            metadata:
              labels:
                component: scheduler
                tier: control-plane
                version: second
            spec:
              serviceAccountName: my-scheduler
              containers:
                - command:
                    - /usr/local/bin/kube-scheduler
                    - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml
                  image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
                  livenessProbe:
                    httpGet:
                      path: /healthz
                      port: 10251
                    initialDelaySeconds: 15
                  name: kube-second-scheduler
                  readinessProbe:
                    httpGet:
                      path: /healthz
                      port: 10251
                  resources:
                    requests:
                      cpu: '0.1'
                  securityContext:
                    privileged: false
                  volumeMounts:
                    - name: config-volume
                      mountPath: /etc/kubernetes/my-scheduler
              hostNetwork: false
              hostPID: false
              volumes:
                - name: config-volume
                  configMap:
                    name: my-scheduler-config

    In the above manifest, you use a KubeSchedulerConfiguration to customize the behavior of your scheduler implementation. This configuration is passed to the kube-scheduler during initialization with the --config option. The my-scheduler-config ConfigMap stores the configuration file. The Pod of the my-scheduler Deployment mounts the my-scheduler-config ConfigMap as a volume.

    In the aforementioned Scheduler Configuration, your scheduler implementation is represented via a KubeSchedulerProfile.
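
    A single scheduler binary can also serve more than one profile. As a minimal, hypothetical sketch (my-other-scheduler is an invented name, not part of the manifest above), one binary could answer to two scheduler names:

        apiVersion: kubescheduler.config.k8s.io/v1beta2
        kind: KubeSchedulerConfiguration
        profiles:
          - schedulerName: my-scheduler
          - schedulerName: my-other-scheduler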

    Note: To determine if a scheduler is responsible for scheduling a specific Pod, the spec.schedulerName field in a PodTemplate or Pod manifest must match the schedulerName field of the KubeSchedulerProfile. All schedulers running in the cluster must have unique names.

    Also, note that you create a dedicated service account my-scheduler and bind the ClusterRole system:kube-scheduler to it so that it can acquire the same privileges as kube-scheduler.
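
    If you want to confirm that the binding took effect, one way (assuming your own credentials permit impersonation) is to impersonate the service account with kubectl auth can-i; for example:

        kubectl auth can-i create leases --as=system:serviceaccount:kube-system:my-scheduler -n kube-system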

    To run your scheduler in a Kubernetes cluster, create the deployment specified in the config above:

        kubectl create -f my-scheduler.yaml

    Verify that the scheduler pod is running:

        kubectl get pods --namespace=kube-system

        NAME                       READY   STATUS    RESTARTS   AGE
        ....
        my-scheduler-lnf4s-4744f   1/1     Running   0          2m
        ...

    You should see a “Running” my-scheduler pod, in addition to the default kube-scheduler pod in this list.

    To run multiple schedulers with leader election enabled, you must do the following:

    Update the following fields for the KubeSchedulerConfiguration in the my-scheduler-config ConfigMap in your YAML file (a sketch of the result follows the note below):

    • leaderElection.leaderElect to true
    • leaderElection.resourceNamespace to <lock-object-namespace>
    • leaderElection.resourceName to <lock-object-name>

    Note: The control plane creates the lock objects for you, but the namespace must already exist. You can use the kube-system namespace.
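
    As a minimal sketch, assuming you choose kube-system as the lock namespace and my-scheduler as the lock name (both are your choice), the configuration embedded in the ConfigMap would become:

        apiVersion: kubescheduler.config.k8s.io/v1beta2
        kind: KubeSchedulerConfiguration
        profiles:
          - schedulerName: my-scheduler
        leaderElection:
          leaderElect: true
          resourceNamespace: kube-system
          resourceName: my-scheduler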

    If RBAC is enabled on your cluster, you must update the system:kube-scheduler cluster role. Add your scheduler name to the resourceNames of the rule applied for endpoints and leases resources, as in the following example:

        kubectl edit clusterrole system:kube-scheduler


        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRole
        metadata:
          annotations:
            rbac.authorization.kubernetes.io/autoupdate: "true"
          labels:
            kubernetes.io/bootstrapping: rbac-defaults
          name: system:kube-scheduler
        rules:
          - apiGroups:
              - coordination.k8s.io
            resources:
              - leases
            verbs:
              - create
          - apiGroups:
              - coordination.k8s.io
            resourceNames:
              - kube-scheduler
              - my-scheduler
            resources:
              - leases
            verbs:
              - get
              - update
          - apiGroups:
              - ""
            resourceNames:
              - kube-scheduler
              - my-scheduler
            resources:
              - endpoints
            verbs:
              - delete
              - get
              - patch
              - update

    Now that your second scheduler is running, create some pods, and direct them to be scheduled by either the default scheduler or the one you deployed. In order to schedule a given pod using a specific scheduler, specify the name of the scheduler in that pod spec. Let’s look at three examples.

    • Pod spec without any scheduler name

      admin/sched/pod1.yaml

          apiVersion: v1
          kind: Pod
          metadata:
            name: no-annotation
            labels:
              name: multischeduler-example
          spec:
            containers:
              - name: pod-with-no-annotation-container
                image: k8s.gcr.io/pause:2.0

      When no scheduler name is supplied, the pod is automatically scheduled using the default-scheduler.
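
      Save this file as pod1.yaml and submit it to the Kubernetes cluster.

          kubectl create -f pod1.yaml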

    • Pod spec with default-scheduler

      admin/sched/pod2.yaml

          apiVersion: v1
          kind: Pod
          metadata:
            name: annotation-default-scheduler
            labels:
              name: multischeduler-example
          spec:
            schedulerName: default-scheduler
            containers:
              - name: pod-with-default-annotation-container
                image: k8s.gcr.io/pause:2.0

      A scheduler is specified by supplying the scheduler name as a value to spec.schedulerName. In this case, we supply the name of the default scheduler, which is default-scheduler.

      Save this file as pod2.yaml and submit it to the Kubernetes cluster.

          kubectl create -f pod2.yaml

    • Pod spec with my-scheduler

      admin/sched/pod3.yaml

          apiVersion: v1
          kind: Pod
          metadata:
            name: annotation-second-scheduler
            labels:
              name: multischeduler-example
          spec:
            schedulerName: my-scheduler
            containers:
              - name: pod-with-second-annotation-container
                image: k8s.gcr.io/pause:2.0

      In this case, we specify that this pod should be scheduled using the scheduler that we deployed, my-scheduler. Note that the value of spec.schedulerName should match the name supplied for the scheduler in the schedulerName field of the corresponding KubeSchedulerProfile.

      Save this file as pod3.yaml and submit it to the Kubernetes cluster.
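
          kubectl create -f pod3.yaml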

      Verify that all three pods are running.
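
          kubectl get pods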

      Verifying that the pods were scheduled using the desired schedulers

      In order to make it easier to work through these examples, we did not verify that the pods were actually scheduled using the desired schedulers. We can verify that by changing the order of pod and deployment config submissions above. If we submit all the pod configs to a Kubernetes cluster before submitting the scheduler deployment config, we see that the pod annotation-second-scheduler remains in “Pending” state forever while the other two pods get scheduled. Once we submit the scheduler deployment config and our new scheduler starts running, the pod gets scheduled as well.

      Alternatively, you can look at the “Scheduled” entries in the event logs to verify that the pods were scheduled by the desired schedulers.

          kubectl get events
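
      If the event list is long, you can narrow it down. As an example (assuming your cluster version supports filtering events by reason, which recent versions do), you can select only the scheduling events:

          kubectl get events --field-selector reason=Scheduled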

      You can also use a custom scheduler configuration or a custom container image for the cluster’s main scheduler by modifying its static pod manifest on the relevant control plane nodes.
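
      For example, on a control plane node set up with kubeadm, the scheduler's static pod manifest typically lives at /etc/kubernetes/manifests/kube-scheduler.yaml (this path is an assumption that depends on your installation); the kubelet applies changes to that file automatically:

          # Path assumed for kubeadm-based setups; adjust for your environment.
          sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml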