The labels are hierarchical, for example, zone > rack > host. You can declare the hierarchy in the PD configuration file or with pd-ctl:
PD configuration file:
pd-ctl:
The number of machines must be no less than the max-replicas value configured in PD.
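The two ways of declaring the hierarchy can be sketched as follows. This is a minimal sketch rather than a definitive setup: the value `max-replicas = 3`, the PD address `http://127.0.0.1:2379`, and the helper names `pd_replication_config` and `pd_ctl_command` are assumptions made for illustration, and the `pd-ctl` command is only printed, not executed.

```shell
#!/bin/sh
# Option 1: the [replication] section of the PD configuration file.
# max-replicas = 3 is an assumed value; use your own replica count.
pd_replication_config() {
  cat <<'EOF'
[replication]
max-replicas = 3
location-labels = ["zone", "rack", "host"]
EOF
}

# Option 2: the equivalent pd-ctl command for a running cluster.
# The default PD address is an assumption; the command is printed, not run.
pd_ctl_command() {
  echo "pd-ctl -u ${1:-http://127.0.0.1:2379} config set location-labels zone,rack,host"
}

pd_replication_config
pd_ctl_command
```

To use the first option, redirect the output of `pd_replication_config` into the configuration file that PD is started with.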
Assume that the topology has three layers: zone > rack > host. You can set a label for each layer by command line parameter or configuration file, then TiKV reports its labels to PD:
TiKV command line parameter:
TiKV configuration file:
[server]
labels = "zone=<zone>,rack=<rack>,host=<host>"
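As a concrete illustration, the sketch below builds the label string for a single TiKV instance and prints it in both forms. The values `z1`, `r1`, `h1` and the helper `tikv_labels` are hypothetical. Writing `labels` as a string under `[server]` is also an assumption; some TiKV versions may expect a different syntax, so check the configuration template for your version.

```shell
#!/bin/sh
# Build the comma-separated label string for one TiKV instance.
# Arguments: zone, rack, host (the example values are hypothetical).
tikv_labels() {
  printf 'zone=%s,rack=%s,host=%s\n' "$1" "$2" "$3"
}

labels=$(tikv_labels z1 r1 h1)

# Form 1: command line parameter (printed, not executed).
echo "tikv-server --labels $labels"

# Form 2: configuration file entry under [server].
printf '[server]\nlabels = "%s"\n' "$labels"
```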
PD schedules replicas according to this topology information, so you only need to decide which topology achieves the isolation you want.
Assume that you have 4 data zones, each zone has 2 racks, and each rack has 2 hosts. You can start 2 TiKV instances on each host, 32 instances in total, as follows:
Start TiKV:
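One way to enumerate the start commands for this topology is the loop below. It is a sketch under stated assumptions: the label values (`z1`..`z4`, `r1`..`r2`, `h1`..`h2`) and the helper `tikv_start_commands` are hypothetical, other flags a real deployment needs (addresses, data directories, PD endpoints) are omitted, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Print one tikv-server command per instance for the assumed topology:
# 4 zones x 2 racks x 2 hosts x 2 instances per host = 32 commands.
tikv_start_commands() {
  for z in 1 2 3 4; do
    for r in 1 2; do
      for h in 1 2; do
        # Two instances on the same host share identical labels.
        for i in 1 2; do
          echo "tikv-server --labels zone=z$z,rack=r$r,host=h$h"
        done
      done
    done
  done
}

tikv_start_commands
```

Because the labels are hierarchical, the rack and host names only need to be unique within their parent layer, so `r1` can be reused in every zone.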
Configure PD:
$ pd-ctl
>> config set location-labels zone,rack,host
Now, PD schedules replicas of the same Region to different data zones.
- Even if one data zone goes down, the TiKV cluster is still highly available.
- If the data zone cannot recover within a period of time, PD removes the replica from this data zone.
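
The first point can be checked with simple arithmetic. The sketch below assumes 3 replicas per Region (a `max-replicas` value chosen for illustration), each placed in a different zone, and relies on the fact that a Raft group stays available while a majority of its replicas is alive. The helper `has_quorum` is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit status 0) if a Region still has a Raft majority
# after losing `lost` of its `replicas` replicas.
has_quorum() {
  replicas=$1
  lost=$2
  quorum=$(( replicas / 2 + 1 ))
  [ $(( replicas - lost )) -ge "$quorum" ]
}

# With one replica per zone, a single zone outage costs each Region
# at most one replica.
if has_quorum 3 1; then
  echo "3 replicas, 1 zone down: still available"
fi
```

If two replicas of the same Region shared a zone, the same outage could remove two of three replicas and the Region would lose its majority, which is why PD spreads them across zones.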