Deploy TiDB on Alibaba Cloud Kubernetes

    To deploy TiDB Operator and the TiDB cluster in a self-managed Kubernetes environment, refer to Deploy TiDB Operator and Deploy TiDB on General Kubernetes.

    Prerequisites

    • aliyun-cli >= 3.0.15 and configure aliyun-cli

      Note

      The access key must be granted permissions to control the corresponding resources.

    • kubectl >= 1.12

    • jq >= 1.6

    • terraform 0.12.*

    You can use the Cloud Shell of Alibaba Cloud to perform these operations; all the tools above are pre-installed and configured in Cloud Shell.

    To deploy a TiDB cluster, make sure you have the following privileges:

    • AliyunECSFullAccess
    • AliyunESSFullAccess
    • AliyunVPCFullAccess
    • AliyunSLBFullAccess
    • AliyunCSFullAccess
    • AliyunEIPFullAccess
    • AliyunECIFullAccess
    • AliyunVPNGatewayFullAccess
    • AliyunNATGatewayFullAccess

    Overview of things to create

    In the default configuration, you will create:

    • A new VPC

    • An ECS instance as the bastion machine

    • A managed ACK (Alibaba Cloud Kubernetes) cluster with the following ECS instance worker nodes:

      • An auto-scaling group of 2 * instances (2 cores, 2 GB RAM). The default auto-scaling group of managed Kubernetes must have at least two instances to host system services such as CoreDNS
      • An auto-scaling group of 3 * instances for deploying the PD cluster
      • An auto-scaling group of 3 * ecs.i2.2xlarge instances for deploying the TiKV cluster
      • An auto-scaling group of 2 * ecs.c5.4xlarge instances for deploying the TiDB cluster
      • An auto-scaling group of 1 * ecs.c5.xlarge instance for deploying monitoring components
      • A 100 GB cloud disk used to store monitoring data

    All the instances except ACK mandatory workers are deployed across availability zones (AZs) to provide cross-AZ high availability. The auto-scaling group ensures the desired number of healthy instances, so the cluster can auto-recover from node failure or even AZ failure.

    Deploy

    1. Configure the target region and Alibaba Cloud key (you can also set these variables in the terraform command prompt):

      The variables.tf file contains default settings of the variables used for deploying the cluster. You can change it or use the -var option to override a specific variable to fit your needs.
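
      For example, you might export the region and access key as environment variables that Terraform picks up before running any terraform command (a sketch; the TF_VAR_* names assume matching variable declarations in the deploy/aliyun scripts, and the ${} placeholders are your own values):

      # Replace the ${} placeholders with your region and Alibaba Cloud access key.
      export TF_VAR_ALICLOUD_REGION=${REGION} && \
      export TF_VAR_ALICLOUD_ACCESS_KEY=${ACCESS_KEY} && \
      export TF_VAR_ALICLOUD_SECRET_KEY=${SECRET_KEY}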

    2. Use Terraform to set up the cluster.

      git clone --depth=1 https://github.com/pingcap/tidb-operator && \
      cd tidb-operator/deploy/aliyun

      You can create or modify terraform.tfvars to set the values of the variables, and configure the cluster to fit your needs. You can view the configurable variables and their descriptions in variables.tf. The following is an example of how to configure the ACK cluster name, the TiDB cluster name, the TiDB Operator version, and the number of PD, TiKV, and TiDB nodes.

      cluster_name = "testack"
      tidb_cluster_name = "testdb"
      tikv_count = 3
      tidb_count = 2
      pd_count = 3
      operator_version = "v1.3.2"

      • To deploy TiFlash in the cluster, set create_tiflash_node_pool = true in terraform.tfvars. You can also configure the node count and instance type of the TiFlash node pool by modifying tiflash_count and tiflash_instance_type. By default, the value of tiflash_count is 2, and the value of tiflash_instance_type is ecs.i2.2xlarge.

      • To deploy TiCDC in the cluster, set create_cdc_node_pool = true in terraform.tfvars. You can also configure the node count and instance type of the TiCDC node pool by modifying cdc_count and cdc_instance_type. By default, the value of cdc_count is 3, and the value of cdc_instance_type is ecs.c5.2xlarge.
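
      For example, a terraform.tfvars sketch that enables both optional node pools with the default sizes and instance types described above:

      # Optional TiFlash and TiCDC node pools; the values shown are the documented defaults.
      create_tiflash_node_pool = true
      tiflash_count = 2
      tiflash_instance_type = "ecs.i2.2xlarge"
      create_cdc_node_pool = true
      cdc_count = 3
      cdc_instance_type = "ecs.c5.2xlarge"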

      Note

      Check the operator_version in the variables.tf file for the default TiDB Operator version of the current scripts. If the default version is not your desired one, configure operator_version in terraform.tfvars.

      After the configuration, execute the following commands to initialize and deploy the cluster:

      terraform init

      Input “yes” to confirm execution when you run the following apply command:

      terraform apply

      If you get an error while running terraform apply, fix the error (for example, lack of permission) according to the error description and run terraform apply again.

      It takes 5 to 10 minutes to create the whole stack using terraform apply. Once the installation is complete, the basic cluster information is printed:

      Apply complete! Resources: 3 added, 0 changed, 1 destroyed.

      Outputs:

      bastion_ip = 47.96.174.214
      cluster_id = c2d9b20854a194f158ef2bc8ea946f20e
      kubeconfig_file = /tidb-operator/deploy/aliyun/credentials/kubeconfig
      monitor_endpoint = not_created
      region = cn-hangzhou
      ssh_key_file = /tidb-operator/deploy/aliyun/credentials/my-cluster-keyZ.pem
      tidb_endpoint = not_created
      tidb_version = v3.0.0
      vpc_id = vpc-bp1v8i5rwsc7yh8dwyep5

      You can use the terraform output command to get the output again.
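
      For example, to print a single value again (standard Terraform CLI usage; the name must match one of the outputs above):

      terraform output bastion_ip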

    3. You can then interact with the ACK cluster using kubectl or helm:

      export KUBECONFIG=$PWD/credentials/kubeconfig
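
      For example, you can run a couple of read-only commands to confirm that the kubeconfig works (a sketch; any read-only command serves the purpose, and the helm command assumes Helm 3):

      kubectl get nodes
      helm list --all-namespaces
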
      1. Prepare the TidbCluster and TidbMonitor CR files:

        cp manifests/db.yaml.example db.yaml && cp manifests/db-monitor.yaml.example db-monitor.yaml

        To complete the CR file configuration, refer to the API documentation and Configure a TiDB Cluster.

        • To deploy TiFlash, configure spec.tiflash in db.yaml as follows:

          spec:
            ...
            tiflash:
              baseImage: pingcap/tiflash
              maxFailoverCount: 0
              nodeSelector:
                dedicated: TIDB_CLUSTER_NAME-tiflash
              replicas: 1
              storageClaims:
                - resources:
                    requests:
                      storage: 100Gi
                  storageClassName: local-volume
              tolerations:
                - effect: NoSchedule
                  key: dedicated
                  operator: Equal
                  value: TIDB_CLUSTER_NAME-tiflash

          To configure other parameters, refer to Configure a TiDB Cluster.

          Modify replicas, storageClaims[].resources.requests.storage, and storageClassName according to your needs.

          Warning

          Since TiDB Operator will mount PVs automatically in the order of the items in the storageClaims list, if you need to add more disks to TiFlash, make sure to append the new item only to the end of the original items, and DO NOT modify the order of the original items.

        • To deploy TiCDC, configure spec.ticdc in db.yaml as follows:

          spec:
            ...
            ticdc:
              baseImage: pingcap/ticdc
              replicas: 3
              nodeSelector:
                dedicated: TIDB_CLUSTER_NAME-cdc
              tolerations:
                - effect: NoSchedule
                  key: dedicated
                  operator: Equal
                  value: TIDB_CLUSTER_NAME-cdc

          Modify replicas according to your needs.

        To deploy Enterprise Edition of TiDB/PD/TiKV/TiFlash/TiCDC, edit the db.yaml file to set spec.<tidb/pd/tikv/tiflash/ticdc>.baseImage to the enterprise image (pingcap/<tidb/pd/tikv/tiflash/ticdc>-enterprise).

        For example:

        spec:
          ...
          pd:
            baseImage: pingcap/pd-enterprise
          ...
          tikv:
            baseImage: pingcap/tikv-enterprise

        Note

        • Replace all occurrences of TIDB_CLUSTER_NAME in the db.yaml and db-monitor.yaml files with the tidb_cluster_name value configured in the ACK deployment (see the sketch after this note).
        • Make sure the number of PD, TiKV, TiFlash, TiCDC, or TiDB nodes is >= the replicas value of the corresponding component in db.yaml.
        • Make sure spec.initializer.version in db-monitor.yaml is the same as spec.version in db.yaml. Otherwise, the monitor might not display correctly.
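
        For example, you might do the replacement with sed (a sketch; assumes GNU sed and the tidb_cluster_name value "testdb" from the earlier terraform.tfvars example):

        # Replace the TIDB_CLUSTER_NAME placeholder in both CR files in place.
        sed -i 's/TIDB_CLUSTER_NAME/testdb/g' db.yaml db-monitor.yaml
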
      2. Create Namespace:

        kubectl --kubeconfig credentials/kubeconfig create namespace ${namespace}

        Note

        You can give the namespace a name that is easy to memorize, such as the same name as tidb_cluster_name.

      3. Deploy the TiDB cluster:

        kubectl --kubeconfig credentials/kubeconfig create -f db.yaml -n ${namespace} &&
        kubectl --kubeconfig credentials/kubeconfig create -f db-monitor.yaml -n ${namespace}

      Note

      If you need to deploy a TiDB cluster on ARM64 machines, refer to Deploy a TiDB Cluster on ARM64 Machines.

      Access the database

      You can connect to the TiDB cluster via the bastion instance. All the necessary information is in the output printed after the installation is finished (replace the ${} parts with values from the output):

      ssh -i credentials/${cluster_name}-key.pem root@${bastion_ip}
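
      From the bastion machine, you can then connect to TiDB with a MySQL client (a sketch; 4000 is the default TiDB service port, and ${tidb_lb_ip} is described below):

      mysql --comments -h ${tidb_lb_ip} -P 4000 -u root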

      tidb_lb_ip is the LoadBalancer IP of the TiDB service.

      Note

      • The default authentication plugin of MySQL 8.0 is updated from mysql_native_password to caching_sha2_password. Therefore, if you use a MySQL client from MySQL 8.0 to access the TiDB service (TiDB version < v4.0.7), and if the user account has a password, you need to explicitly specify the --default-auth=mysql_native_password parameter.
      • By default, TiDB (starting from v4.0.2) periodically shares usage details with PingCAP to help understand how to improve the product. For details about what is shared and how to disable the sharing, see Telemetry.

      Visit <monitor-lb>:3000 to view the Grafana dashboards. monitor-lb is the LoadBalancer IP of the Monitor service.

      The initial login user account and password:

      • User: admin
      • Password: admin

      Warning

      If you already have a VPN connecting to your VPC or plan to set up one, it is strongly recommended that you go to the spec.grafana.service.annotations section in the db-monitor.yaml file and set service.beta.kubernetes.io/alicloud-loadbalancer-address-type to intranet for security.
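
      A minimal sketch of that setting in db-monitor.yaml (only the annotation key and value come from this section; the surrounding fields follow the spec.grafana.service.annotations path mentioned above):

      spec:
        grafana:
          service:
            annotations:
              # Expose Grafana only inside the VPC.
              service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet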

      Upgrade

      To upgrade the TiDB cluster, modify the spec.version field by running kubectl --kubeconfig credentials/kubeconfig edit tc ${tidb_cluster_name} -n ${namespace}. This may take a while to complete. You can watch the process using the following command:

      kubectl get pods --namespace ${namespace} -o wide --watch

      Scale out the TiDB cluster

      To scale out the TiDB cluster, modify tikv_count, tiflash_count, cdc_count, or tidb_count in the terraform.tfvars file, and then run terraform apply to scale out the number of nodes for the corresponding components.
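
      For example, a terraform.tfvars sketch that adds one TiKV node and one TiDB node to the defaults shown earlier (the values are illustrative):

      tikv_count = 4
      tidb_count = 3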

      After the nodes scale out, modify the replicas of the corresponding components by running kubectl --kubeconfig credentials/kubeconfig edit tc ${tidb_cluster_name} -n ${namespace}.

      Note

      • Because it is impossible to determine which node will be taken offline during the scale-in process, the scale-in of TiDB clusters is currently not supported.
      • The scale-out process takes a few minutes. You can watch the status by running kubectl --kubeconfig credentials/kubeconfig get po -n ${namespace} --watch.

      Configure

      You can set the variables in terraform.tfvars to configure TiDB Operator. Most configuration items can be modified after you understand the semantics based on the comments of the variable. Note that the operator_helm_values configuration item can provide a customized values.yaml configuration file for TiDB Operator. For example:

      • Set operator_helm_values in terraform.tfvars:

        operator_helm_values = "./my-operator-values.yaml"
      • Set operator_helm_values in main.tf:

        operator_helm_values = file("./my-operator-values.yaml")

      In the default configuration, the Terraform script creates a new VPC. To use an existing VPC, set vpc_id in variables.tf. In this case, Kubernetes nodes are not deployed in AZs where no vSwitch is configured.
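
      For example, a sketch of reusing an existing VPC by overriding that variable in terraform.tfvars (the VPC ID is illustrative; check variables.tf for the exact variable definition and any related vSwitch settings):

      vpc_id = "vpc-bp1v8i5rwsc7yh8dwyep5"   # replace with your own VPC ID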

      See TiDB Operator API Documentation and Configure a TiDB Cluster.

      Manage multiple TiDB clusters

      To manage multiple TiDB clusters in a single Kubernetes cluster, edit ./main.tf and add a tidb-cluster module declaration for each cluster based on your needs. For example:

      module "tidb-cluster-dev" {
        source = "../modules/aliyun/tidb-cluster"
        providers = {
          helm = helm.default
        }
        cluster_name = "dev-cluster"
        ack          = module.tidb-operator

        tikv_count = 1
        tidb_count = 1
      }

      module "tidb-cluster-staging" {
        source = "../modules/aliyun/tidb-cluster"
        providers = {
          helm = helm.default
        }
        cluster_name = "staging-cluster"
        ack          = module.tidb-operator

        pd_count   = 3
        tikv_count = 3
        tidb_count = 2
      }

      Note

      You need to set a unique cluster_name for each TiDB cluster.

      For all the configurable parameters of the tidb-cluster module, see the variable definitions in deploy/modules/aliyun/tidb-cluster.

      Manage multiple Kubernetes clusters

      It is recommended to use a separate Terraform module to manage a specific Kubernetes cluster. (A Terraform module is a directory that contains the .tf script.)

      deploy/aliyun combines multiple reusable Terraform scripts in deploy/modules. To manage multiple clusters, perform the following operations in the root directory of the tidb-operator project:

      1. Create a directory for each cluster. For example:

        mkdir -p deploy/aliyun-staging
      2. Refer to main.tf in deploy/aliyun and write your own script. For example:

        provider "alicloud" {
          region     = ${REGION}
          access_key = ${ACCESS_KEY}
          secret_key = ${SECRET_KEY}
        }

        module "tidb-operator" {
          source          = "../modules/aliyun/tidb-operator"
          region          = ${REGION}
          access_key      = ${ACCESS_KEY}
          secret_key      = ${SECRET_KEY}
          cluster_name    = "example-cluster"
          key_file        = "ssh-key.pem"
          kubeconfig_file = "kubeconfig"
        }

        provider "helm" {
          alias          = "default"
          insecure       = true
          install_tiller = false
          kubernetes {
            config_path = module.tidb-operator.kubeconfig_filename
          }
        }

        module "tidb-cluster" {
          source       = "../modules/aliyun/tidb-cluster"
          providers = {
            helm = helm.default
          }
          cluster_name = "example-cluster"
          ack          = module.tidb-operator
        }

        module "bastion" {
          source                   = "../modules/aliyun/bastion"
          bastion_name             = "example-bastion"
          key_name                 = module.tidb-operator.key_name
          vpc_id                   = module.tidb-operator.vpc_id
          vswitch_id               = module.tidb-operator.vswitch_ids[0]
          enable_ssh_to_worker     = true
          worker_security_group_id = module.tidb-operator.security_group_id
        }

      You can customize this script. For example, you can remove the module "bastion" declaration if you do not need the bastion machine.
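
      After the script is ready, initialize and apply it from the new directory, just as in the Deploy section (a sketch; the directory name follows the example above):

      cd deploy/aliyun-staging && \
      terraform init && \
      terraform apply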

      Note

      You can copy the deploy/aliyun directory, but do not copy a directory in which a terraform apply operation is currently running. In that case, it is recommended to clone the repository again and then copy it.

      Destroy

      1. Refer to Destroy a TiDB cluster to delete the cluster.

      2. Destroy the ACK cluster by running the following command:

        terraform destroy

      If the Kubernetes cluster is not successfully created, the destroy operation might return an error and fail. In such cases, manually remove the Kubernetes resources from the local state:

      terraform state rm module.ack.alicloud_cs_managed_kubernetes.k8s

      It may take a long time to finish destroying the cluster.

      Note

      You have to manually delete the cloud disk used by the components in the Alibaba Cloud console.

      Limitation

      You cannot change the Pod CIDR, the Service CIDR, or the worker instance types once the cluster is created.