The labels are hierarchical, for example, zone > rack > host. You can declare the hierarchy in the PD configuration file or with pd-ctl:

  • PD configuration file:

  • pd-ctl:

      The number of machines must be no less than the configured number of replicas (max-replicas).
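As an illustration of the configuration-file option, the hierarchy could be declared roughly as follows. This is a sketch: the max-replicas value of 3 and the zone, rack, and host label keys are assumptions chosen to match the example topology used in this section.

```toml
[replication]
# Number of replicas kept for each Region (assumed value).
max-replicas = 3
# Label keys from the outermost layer to the innermost one.
# They must match the label keys that the TiKV instances report.
location-labels = ["zone", "rack", "host"]
```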

    Assume that the topology has three layers: zone > rack > host. You can set a label for each layer through a command-line parameter or the configuration file; TiKV then reports its labels to PD:

    • TiKV command line parameter:

    • TiKV configuration file:

      [server]
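A fuller version of the TiKV configuration might look like the sketch below. The label values z1, r1, and h1 are placeholders, and the [server.labels] table form is an assumption: the exact syntax can differ between TiKV versions, so check the configuration template shipped with your release.

```toml
[server]

[server.labels]
# One key per layer of the topology; the values say where this instance runs.
zone = "z1"
rack = "r1"
host = "h1"
```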

    PD makes optimal scheduling decisions based on this topology information. You only need to decide which topology achieves the effect you want.

    Assume that you have 4 data zones, each zone has 2 racks, and each rack has 2 hosts. You can start 2 TiKV instances on each host as follows:

    Start TiKV:
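For the example topology, the start commands could be generated with a small script like the one below. It is a sketch that only prints the commands: the names z1..z4, r1..r2, and h1..h2 are illustrative, and the flags for the listen address, PD endpoints, and data directory that a real instance needs are left out.

```shell
#!/bin/sh
# Print the start command for every TiKV instance in the example topology:
# 4 zones x 2 racks x 2 hosts x 2 instances = 32 commands.
gen_tikv_cmds() {
  for zone in z1 z2 z3 z4; do
    for rack in r1 r2; do
      for host in h1 h2; do
        for instance in 1 2; do
          # Both instances on a host report the same labels.
          echo "tikv-server --labels zone=${zone},rack=${rack},host=${host}"
        done
      done
    done
  done
}

gen_tikv_cmds
```

Because the labels are hierarchical, host names only need to be unique within a rack, and rack names only within a zone.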

    Configure PD:

    $ pd-ctl
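The same step can be sketched in non-interactive form. The script below assumes that pd-ctl is on the PATH and that PD listens at http://127.0.0.1:2379; both are assumptions, as are the PD_ADDR and DRY_RUN variable names. By default it only prints the command; set DRY_RUN=0 to send it to PD.

```shell
#!/bin/sh
# Assumed PD client URL; replace it with your own PD address.
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"
# Label keys, outermost layer first; they must match the keys TiKV reports.
LOCATION_LABELS="zone,rack,host"

set_location_labels() {
  # With DRY_RUN=1 (the default in this sketch) the command is only printed.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "pd-ctl -u ${PD_ADDR} config set location-labels ${LOCATION_LABELS}"
  else
    pd-ctl -u "${PD_ADDR}" config set location-labels "${LOCATION_LABELS}"
  fi
}

set_location_labels
```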

    Now, PD schedules replicas of the same Region to different data zones.

    • Even if one data zone goes down, the TiKV cluster is still highly available.
    • If the data zone cannot recover within a period of time, PD removes the replica from this data zone.