Poseidon-Firmament Scheduler

    This feature is currently in an alpha state, meaning:

    • The version names contain alpha (e.g. v1alpha1).
    • Might be buggy. Enabling the feature may expose bugs. Disabled by default.
    • Support for the feature may be dropped at any time without notice.
    • The API may change in incompatible ways in a later software release without notice.
    • Recommended for use only in short-lived testing clusters, due to increased risk of bugs and lack of long-term support.

    The Poseidon-Firmament scheduler is an alternate scheduler that can be deployed alongside the default Kubernetes scheduler.

    Poseidon is a service that acts as the integration glue between the Firmament scheduler and Kubernetes. Poseidon-Firmament augments the current Kubernetes scheduling capabilities by adding flow-network-graph-based scheduling alongside the default Kubernetes scheduler. The Firmament scheduler models workloads and clusters as flow networks and runs min-cost flow optimizations over these networks to make scheduling decisions.

    Firmament models the scheduling problem as a constraint-based optimization over a flow network graph. This is achieved by reducing scheduling to a min-cost max-flow optimization problem. The Poseidon-Firmament scheduler dynamically refines the workload placements.
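
    The reduction to min-cost flow can be illustrated with a toy example. The sketch below is not Firmament's implementation: the graph layout (source → pod → node → sink), the slot capacities, the costs, and every name in it are assumptions chosen for illustration, and the solver is a plain successive-shortest-paths routine.

```python
from collections import deque

class MinCostFlow:
    """Toy successive-shortest-paths solver (illustrative, not Firmament's)."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # per-vertex lists of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index, its reverse the next odd index,
        # so `e ^ 1` always yields the partner edge.
        self.graph[u].append(len(self.to))
        self.to.append(v); self.cap.append(cap); self.cost.append(cost)
        self.graph[v].append(len(self.to))
        self.to.append(u); self.cap.append(0); self.cost.append(-cost)

    def solve(self, s, t):
        flow = total_cost = 0
        while True:
            # Queue-based Bellman-Ford over edges with residual capacity.
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev_edge[t] == -1:  # sink unreachable: flow is maximal
                return flow, total_cost
            # Find the bottleneck along the path, then push flow along it.
            push, v = float("inf"), t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total_cost += push * dist[t]

def schedule(pods, node_slots, cost):
    """Place pods on nodes at minimum total cost (hypothetical cost model)."""
    nodes = list(node_slots)
    s, t = 0, 1
    pod_id = {p: 2 + i for i, p in enumerate(pods)}
    node_id = {n: 2 + len(pods) + i for i, n in enumerate(nodes)}
    g = MinCostFlow(2 + len(pods) + len(nodes))
    placement_edges = {}
    for p in pods:
        g.add_edge(s, pod_id[p], 1, 0)              # each pod is one unit of flow
        for n in nodes:
            placement_edges[(p, n)] = len(g.to)     # index of the forward edge
            g.add_edge(pod_id[p], node_id[n], 1, cost[(p, n)])
    for n in nodes:
        g.add_edge(node_id[n], t, node_slots[n], 0)  # capacity = free slots
    _, total = g.solve(s, t)
    # A saturated pod -> node edge means the pod was placed on that node.
    placement = {p: n for (p, n), e in placement_edges.items() if g.cap[e] == 0}
    return placement, total

# Hypothetical cluster: "big" has two free slots, "small" has one.
costs = {("p1", "big"): 1, ("p1", "small"): 5,
         ("p2", "big"): 2, ("p2", "small"): 3,
         ("p3", "big"): 1, ("p3", "small"): 6}
placement, total = schedule(["p1", "p2", "p3"], {"big": 2, "small": 1}, costs)
print(placement, total)
```

    In this toy setup each pod is one unit of flow, so a single optimization run places the whole batch at once rather than one pod at a time.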

    The Poseidon-Firmament scheduler runs alongside the default Kubernetes scheduler as an alternate scheduler. You can run multiple, different schedulers simultaneously.

    • Workloads (Pods) are bulk scheduled to enable scheduling at massive scale.
      The Poseidon-Firmament scheduler outperforms the Kubernetes default scheduler by a wide margin in throughput for scenarios where compute resource requirements are somewhat uniform across your workload (Deployments, ReplicaSets, Jobs).
    • The Poseidon-Firmament scheduler’s end-to-end throughput and bind time improve as the number of nodes in a cluster increases. As you scale out, the Poseidon-Firmament scheduler is able to amortize more and more work across workloads.
    • Scheduling in Poseidon-Firmament is dynamic; it keeps cluster resources in a globally optimal state during every scheduling run.

    How the Poseidon-Firmament scheduler works

    Kubernetes supports using multiple schedulers. You can specify that a particular Pod is scheduled by a custom scheduler (“poseidon” in this case) by setting the schedulerName field in the PodSpec when you create the Pod. The default scheduler then ignores that Pod and leaves it to the Poseidon-Firmament scheduler to place the Pod on a suitable node.

    For example:
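
    A minimal sketch of such a Pod, written here as a Python dictionary and serialized to JSON, which kubectl accepts in place of YAML. The Pod name, container name, and image are hypothetical; only the schedulerName value “poseidon” comes from the text above.

```python
import json

# Hypothetical Pod: the metadata and container values are placeholders.
# Only schedulerName = "poseidon" is taken from the surrounding text.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example-pod"},
    "spec": {
        # Hands this Pod to the Poseidon-Firmament scheduler instead of
        # the default Kubernetes scheduler.
        "schedulerName": "poseidon",
        "containers": [{"name": "app", "image": "nginx"}],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

    Saving the output to a file such as pod.json and running kubectl apply -f pod.json would submit the Pod. A Pod that omits schedulerName is handled by the default scheduler.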

    Batch scheduling

    As mentioned earlier, the Poseidon-Firmament scheduler enables very high scheduling throughput at scale because it schedules Pods in bulk, in contrast to the Kubernetes pod-at-a-time approach. In our extensive tests, we have observed substantial throughput benefits as long as resource requirements (CPU/memory) for incoming Pods are uniform across jobs (ReplicaSets/Deployments/Jobs), mainly due to efficient amortization of work across jobs.

    Although the Poseidon-Firmament scheduler is capable of scheduling various types of workloads, such as service and batch workloads, the following are a few use cases where it excels:

    1. “Big Data/AI” jobs consisting of a large number of tasks, where the throughput benefits are greatest.
    2. Service or batch jobs where workload resource requirements are uniform across jobs (ReplicaSets/Deployments/Jobs).

    Poseidon-Firmament is designed to work with Kubernetes release 1.6 and all subsequent releases.

    Feature comparison

    Installation

    The Poseidon-Firmament installation guide explains how to deploy Poseidon-Firmament to your cluster.

    Note: Refer to the Poseidon-Firmament benchmark results for a detailed throughput performance comparison between the Poseidon-Firmament scheduler and the Kubernetes default scheduler.

    Pod-by-pod schedulers, such as the Kubernetes default scheduler, process Pods in small batches (typically one at a time). These schedulers have the following crucial drawbacks:

    1. The scheduler commits to a pod placement early and restricts the choices for other pods that wait to be placed.
    2. There are limited opportunities for amortizing work across pods because they are considered for placement individually.

    The Poseidon-Firmament scheduler addresses these downsides of pod-by-pod schedulers by batching, or bulk scheduling. Processing several pods in a batch allows the scheduler to consider their placement jointly, and thus to find the best trade-off for the whole batch instead of for a single pod. At the same time, it amortizes work across pods, resulting in much higher throughput.
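
    The first drawback, early commitment, can be seen in a small sketch. Everything here is an assumption made for illustration: the pod and node names, the costs, the one-pod-per-node limit, and the brute-force search that stands in for a real batch optimizer.

```python
from itertools import permutations

# Hypothetical placement costs, COST[pod][node]; lower is better.
COST = {
    "pod-a": {"node-1": 1, "node-2": 2},
    "pod-b": {"node-1": 1, "node-2": 9},
}
NODES = ["node-1", "node-2"]  # each node is assumed to hold exactly one pod

def pod_by_pod(cost, nodes):
    """Place pods one at a time, committing each to its cheapest free node."""
    free, placement = list(nodes), {}
    for pod, options in cost.items():
        node = min(free, key=lambda n: options[n])
        placement[pod] = node
        free.remove(node)  # the early commitment narrows later choices
    return placement, sum(cost[p][n] for p, n in placement.items())

def batch(cost, nodes):
    """Try every assignment for the whole batch and keep the cheapest one."""
    pods = list(cost)
    best = min(
        permutations(nodes, len(pods)),
        key=lambda assign: sum(cost[p][n] for p, n in zip(pods, assign)),
    )
    placement = dict(zip(pods, best))
    return placement, sum(cost[p][n] for p, n in placement.items())

greedy_placement, greedy_total = pod_by_pod(COST, NODES)
batch_placement, batch_total = batch(COST, NODES)
print("pod-by-pod:", greedy_placement, greedy_total)
print("batch:     ", batch_placement, batch_total)
```

    With these made-up costs, the pod-by-pod pass gives pod-a its cheapest node first and forces pod-b onto an expensive one, while the batch pass accepts a slightly worse node for pod-a to get a lower total.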

    What’s next

    • See Poseidon-Firmament on GitHub for more information.
    • See the design document for Poseidon.
    • Read the academic paper on the Firmament scheduling design.
    • If you’d like to contribute to Poseidon-Firmament, refer to the developer setup instructions.
