Deploy TiDB on Azure AKS

    To deploy TiDB Operator and the TiDB cluster in a self-managed Kubernetes environment, refer to Deploy TiDB Operator and Deploy TiDB on General Kubernetes.

    Before deploying a TiDB cluster on Azure AKS, perform the following operations:

    • Install Helm 3 for deploying TiDB Operator.

    • Install and configure the Azure CLI (az).

    • Refer to Use Ultra disks to create a new cluster that can use Ultra disks, or enable Ultra disks in an existing cluster.

    • Acquire AKS service permissions.

    • If the Kubernetes version of the cluster is earlier than 1.21, install the aks-preview CLI extension to use Ultra disks, and register the EnableAzureDiskFileCSIDriver feature in your subscription.

      • Install the aks-preview CLI extension:
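
        A minimal sketch, assuming the Azure CLI is already installed and signed in:

        az extension add --name aks-preview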

      • Register EnableAzureDiskFileCSIDriver:

        az feature register --name EnableAzureDiskFileCSIDriver --namespace Microsoft.ContainerService --subscription ${your-subscription-id}

    Create an AKS cluster and a node pool

    Most of the TiDB cluster components use Azure disk as storage. According to AKS Best Practices, when creating an AKS cluster, it is recommended to ensure that each node pool uses one availability zone (at least 3 in total).

    To create an AKS cluster, run the following command:

    Note

    If the Kubernetes version of the cluster is earlier than 1.21, you need to append an --aks-custom-headers flag to enable the EnableAzureDiskFileCSIDriver feature by running the following command:

    # create AKS cluster
    az aks create \
        --resource-group ${resourceGroup} \
        --name ${clusterName} \
        --location ${location} \
        --generate-ssh-keys \
        --vm-set-type VirtualMachineScaleSets \
        --load-balancer-sku standard \
        --node-count 3 \
        --zones 1 2 3 \
        --aks-custom-headers EnableAzureDiskFileCSIDriver=true

    Create component node pools

    After creating an AKS cluster, run the following commands to create component node pools. Each node pool may take two to five minutes to create. It is recommended to enable Ultra disks (--enable-ultra-ssd) in the TiKV node pool. For more details about cluster configuration, refer to the az aks documentation and related Azure documentation.

    1. Create a node pool for TiDB Operator and the monitor:

      az aks nodepool add --name admin \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 1 \
          --labels dedicated=admin
    2. Create a PD node pool with nodeType being Standard_F4s_v2 or higher:

      az aks nodepool add --name pd \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 3 \
          --labels dedicated=pd \
          --node-taints dedicated=pd:NoSchedule
    3. Create a TiDB node pool with nodeType being Standard_F8s_v2 or higher. You can set --node-count to 2 because only two TiDB nodes are required by default. You can also scale out this node pool by modifying this parameter at any time if necessary.

      az aks nodepool add --name tidb \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 2 \
          --labels dedicated=tidb \
          --node-taints dedicated=tidb:NoSchedule
    4. Create a TiKV node pool with nodeType being Standard_E8s_v4 or higher:

      az aks nodepool add --name tikv \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 3 \
          --labels dedicated=tikv \
          --node-taints dedicated=tikv:NoSchedule \
          --enable-ultra-ssd

    Deploy component node pools in availability zones

    The Azure AKS cluster deploys nodes across multiple zones using “best effort zone balance”. If you want to apply “strict zone balance” (not supported in AKS now), you can deploy one node pool in one zone. For example:

    1. Create TiKV node pool 1 in zone 1:

      az aks nodepool add --name tikv1 \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 1 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 1 \
          --labels dedicated=tikv \
          --node-taints dedicated=tikv:NoSchedule \
          --enable-ultra-ssd
    2. Create TiKV node pool 2 in zone 2:

      az aks nodepool add --name tikv2 \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 2 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 1 \
          --labels dedicated=tikv \
          --node-taints dedicated=tikv:NoSchedule \
          --enable-ultra-ssd
    3. Create TiKV node pool 3 in zone 3:

      az aks nodepool add --name tikv3 \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 1 \
          --labels dedicated=tikv \
          --node-taints dedicated=tikv:NoSchedule \
          --enable-ultra-ssd

    Warning

    About node pool scale-in:

    Configure StorageClass

    To improve disk IO performance, it is recommended to add mountOptions in StorageClass to configure nodelalloc and noatime. Refer to Mount the data disk ext4 filesystem with options on the target machines that deploy TiKV.
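
    For reference, the following is a minimal sketch of such a StorageClass (the class name and the Premium_LRS parameters are illustrative assumptions; adjust them for your environment):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: azure-disk-premium   # illustrative name
    provisioner: disk.csi.azure.com
    parameters:
      skuname: Premium_LRS       # assumption: pick the disk SKU you actually use
    reclaimPolicy: Delete
    allowVolumeExpansion: true
    volumeBindingMode: WaitForFirstConsumer
    mountOptions:
    - nodelalloc,noatime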

    Deploy TiDB Operator

    Deploy TiDB Operator in the AKS cluster by referring to the Deploy TiDB Operator section.
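
    For reference, a minimal sketch of the usual steps (the Operator version below is an assumption; follow the linked document for the authoritative procedure and version):

    # Install the TiDB Operator CRDs (replace the version as appropriate)
    kubectl create -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.2/manifests/crd.yaml
    # Install TiDB Operator with Helm 3
    helm repo add pingcap https://charts.pingcap.org/
    kubectl create namespace tidb-admin
    helm install --namespace tidb-admin tidb-operator pingcap/tidb-operator --version v1.3.2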

    Deploy a TiDB cluster and the monitoring component

    This section describes how to deploy a TiDB cluster and its monitoring component in Azure AKS.

    Create namespace

    To create a namespace to deploy the TiDB cluster, run the following command:

    kubectl create namespace tidb-cluster

    Note

    A namespace is a virtual cluster backed by the same physical cluster. This document takes tidb-cluster as an example. If you want to use other namespaces, modify the corresponding arguments of -n or --namespace.

    First, download the sample TidbCluster and TidbMonitor configuration files:

    curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/aks/tidb-cluster.yaml && \
    curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/aks/tidb-monitor.yaml

    Note

    By default, the TiDB LoadBalancer in tidb-cluster.yaml is set to "internal", meaning that the LoadBalancer is only accessible within the cluster virtual network, not externally. To access TiDB over the MySQL protocol, you need to use a bastion host to access the internal host of the cluster or use kubectl port-forward. You can remove the "internal" setting in the tidb-cluster.yaml file to expose the LoadBalancer publicly. However, note that this practice may expose TiDB to security risks.
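
    For reference, the internal setting generally corresponds to an Azure internal load balancer annotation on the TiDB service, roughly like the following sketch (this is an assumption for illustration, not a copy of the sample; check the downloaded tidb-cluster.yaml for the exact fields):

    spec:
      tidb:
        service:
          type: LoadBalancer
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"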

    To deploy the TidbCluster and TidbMonitor CR in the AKS cluster, run the following command:

    kubectl apply -f tidb-cluster.yaml -n tidb-cluster && \
    kubectl apply -f tidb-monitor.yaml -n tidb-cluster

    After the preceding YAML files are applied to the Kubernetes cluster, TiDB Operator creates the desired TiDB cluster and its monitoring component according to those files.

    View the cluster status

    To view the status of the TiDB cluster, run the following command:

    kubectl get pods -n tidb-cluster

    When all the pods are in the Running or Ready state, the TiDB cluster is successfully started. For example:

    NAME                              READY   STATUS    RESTARTS   AGE
    tidb-discovery-5cb8474d89-n8cxk   1/1     Running   0          47h
    tidb-monitor-6fbcc68669-dsjlc     3/3     Running   0          47h
    tidb-pd-0                         1/1     Running   0          47h
    tidb-pd-1                         1/1     Running   0          46h
    tidb-pd-2                         1/1     Running   0          46h
    tidb-tidb-0                       2/2     Running   0          47h
    tidb-tidb-1                       2/2     Running   0          46h
    tidb-tikv-0                       1/1     Running   0          47h
    tidb-tikv-1                       1/1     Running   0          47h
    tidb-tikv-2                       1/1     Running   0          47h

    Access the database

    After deploying a TiDB cluster, you can access the TiDB database to test or develop applications.

    Access method

    • Access via Bastion

    The LoadBalancer created for your TiDB cluster resides in an intranet. You can create a Bastion in the cluster virtual network to connect to an internal host and then access the database.
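
    For reference, a hedged sketch of creating a bastion host with the Azure CLI (all names are placeholders; the virtual network needs a subnet named AzureBastionSubnet and a Standard public IP):

    # Create a public IP for the bastion host (placeholder names)
    az network public-ip create --resource-group ${resourceGroup} --name ${bastionPublicIpName} --sku Standard
    # Create the bastion host in the cluster virtual network
    az network bastion create --resource-group ${resourceGroup} --name ${bastionName} \
        --public-ip-address ${bastionPublicIpName} --vnet-name ${vnetName} --location ${location}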

    Note

    In addition to the bastion host, you can also connect an existing host to the cluster virtual network through virtual network peering. If the AKS cluster is created in an existing virtual network, you can use hosts in that virtual network to access the database.

    • Access via SSH

    You can create the SSH connection to a Linux node to access the database.

    • Access via node-shell

    You can use tools such as node-shell to connect to nodes in the cluster and then access the database.

    Access via the MySQL client

    After connecting to the internal host via SSH, you can access the TiDB cluster through the MySQL client.

    1. Install the MySQL client on the host:

      sudo yum install mysql -y
    2. Connect the client to the TiDB cluster:

      mysql --comments -h ${tidb-lb-ip} -P 4000 -u root

      ${tidb-lb-ip} is the LoadBalancer IP address of the TiDB service. To obtain it, run the kubectl get svc basic-tidb -n tidb-cluster command. The EXTERNAL-IP field returned is the IP address.

      For example:

      $ mysql --comments -h 20.240.0.7 -P 4000 -u root
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MySQL connection id is 1189
      Server version: 5.7.25-TiDB-v4.0.2 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
      MySQL [(none)]> show status;
      +--------------------+--------------------------------------+
      | Variable_name      | Value                                |
      +--------------------+--------------------------------------+
      | Ssl_cipher         |                                      |
      | Ssl_cipher_list    |                                      |
      | Ssl_verify_mode    | 0                                    |
      | Ssl_version        |                                      |
      | ddl_schema_version | 22                                   |
      | server_id          | ed4ba88b-436a-424d-9087-977e897cf5ec |
      +--------------------+--------------------------------------+
      6 rows in set (0.00 sec)

    Note

    • The default authentication plugin of MySQL 8.0 is updated from mysql_native_password to caching_sha2_password. Therefore, if you use a MySQL 8.0 client to access the TiDB service (earlier than v4.0.7) via password authentication, you need to specify the --default-auth=mysql_native_password parameter.
    • By default, TiDB (starting from v4.0.2) periodically shares usage details with PingCAP to help understand how to improve the product. For details about what is shared and how to disable the sharing, see Telemetry.

    Access the Grafana monitoring dashboard

    Obtain the LoadBalancer IP address of Grafana:

    kubectl -n tidb-cluster get svc basic-grafana

    In the command output, the EXTERNAL-IP column is the LoadBalancer IP address.

    You can access the ${grafana-lb}:3000 address using your web browser to view monitoring metrics. Replace ${grafana-lb} with the LoadBalancer IP address.

    Note

    The default Grafana username and password are both admin.

    Access TiDB Dashboard

    See Access TiDB Dashboard for instructions about how to securely allow access to TiDB Dashboard.

    Upgrade

    To upgrade the TiDB cluster, run the following command:

    kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"version":"${version}"}}'

    The upgrade process does not finish immediately. You can view the upgrade progress by running the kubectl get pods -n tidb-cluster --watch command.

    Scale out

    Before scaling out the cluster, you need to scale out the corresponding node pool so that the new instances have enough resources for operation.

    This section describes how to scale out the AKS node pool and TiDB components.

    When scaling out TiKV, the node pools must be scaled out evenly among availability zones. The following example shows how to scale out the TiKV node pool of the ${clusterName} cluster to 6 nodes:

    az aks nodepool scale \
        --resource-group ${resourceGroup} \
        --cluster-name ${clusterName} \
        --name ${nodePoolName} \
        --node-count 6

    For more information on node pool management, refer to the az aks nodepool documentation.

    Scale out TiDB components

    After scaling out the AKS node pool, run the kubectl edit tc basic -n tidb-cluster command and set the replicas of each component to the desired value. The scaling-out process is then completed.
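
    For example, a sketch of setting the TiKV replica count non-interactively with kubectl patch (the replica value is an assumption):

    kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"tikv":{"replicas":6}}}'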

    Deploy TiFlash/TiCDC

    TiFlash is the columnar storage extension of TiKV.

    TiCDC is a tool for replicating the incremental data of TiDB by pulling TiKV change logs.

    The two components are not required in the deployment. This section shows a quick start example.

    Add node pools

    Add a node pool for TiFlash and TiCDC respectively. You can set --node-count as required.

    1. Create a TiFlash node pool with nodeType being Standard_E8s_v4 or higher:

      az aks nodepool add --name tiflash \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 3 \
          --labels dedicated=tiflash \
          --node-taints dedicated=tiflash:NoSchedule
    2. Create a TiCDC node pool with nodeType being Standard_E16s_v4 or higher:

      az aks nodepool add --name ticdc \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size ${nodeType} \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 3 \
          --labels dedicated=ticdc \
          --node-taints dedicated=ticdc:NoSchedule

    Configure and deploy

    • To deploy TiFlash, configure spec.tiflash in tidb-cluster.yaml. The following is an example:

      spec:
        ...
        tiflash:
          baseImage: pingcap/tiflash
          maxFailoverCount: 0
          replicas: 1
          storageClaims:
          - resources:
              requests:
                storage: 100Gi
          tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: tiflash

      For other parameters, refer to Configure a TiDB Cluster.

      Warning

      TiDB Operator automatically mounts PVs in the order of the configuration in the storageClaims list. Therefore, if you need to add disks for TiFlash, make sure that you add the disks only to the end of the original configuration in the list. In addition, you must not alter the order of the original configuration.

    • To deploy TiCDC, configure spec.ticdc in tidb-cluster.yaml. The following is an example:

      spec:
        ...
        ticdc:
          baseImage: pingcap/ticdc
          replicas: 1
          tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: ticdc

      Modify replicas as required.

    Finally, run the kubectl -n tidb-cluster apply -f tidb-cluster.yaml command to update the TiDB cluster configuration.

    For detailed CR configuration, refer to API references and Configure a TiDB Cluster.

    Deploy TiDB Enterprise Edition

    To deploy TiDB/PD/TiKV/TiFlash/TiCDC Enterprise Edition, configure spec.[tidb|pd|tikv|tiflash|ticdc].baseImage in tidb-cluster.yaml as the enterprise image. The enterprise image format is pingcap/[tidb|pd|tikv|tiflash|ticdc]-enterprise.

    For example:

    spec:
      ...
      pd:
        baseImage: pingcap/pd-enterprise
      ...
      tikv:
        baseImage: pingcap/tikv-enterprise

    Use other disk volume types

    Azure disks support multiple volume types. Among them, UltraSSD delivers low latency and high throughput. You can enable UltraSSD by performing the following steps:

    1. Enable Ultra disks (see the prerequisites above) and create a storage class for UltraSSD:

      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: ultra
      provisioner: disk.csi.azure.com
      parameters:
        skuname: UltraSSD_LRS # alias: storageaccounttype, available values: Standard_LRS, Premium_LRS, StandardSSD_LRS, UltraSSD_LRS
        cachingMode: None
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      volumeBindingMode: WaitForFirstConsumer
      mountOptions:
      - nodelalloc,noatime

      You can add more Driver Parameters as required.

    2. In tidb-cluster.yaml, specify the ultra storage class to apply for the UltraSSD volume type through the storageClassName field.

      The following is a TiKV configuration example you can refer to:

      spec:
        tikv:
          ...
          storageClassName: ultra

    You can use any supported Azure disk type. It is recommended to use Premium_LRS or UltraSSD_LRS.

    For more information about the storage class configuration and Azure disk types, refer to the StorageClass documentation and Azure disk types.

    Use local storage

    Use Azure LRS disks for storage in a production environment. To simulate bare-metal performance, some Azure instance types provide additional NVMe SSD local store volumes. You can choose such instances for the TiKV node pool to achieve higher IOPS and lower latency.

    Note

    • You cannot dynamically change the storage class of a running TiDB cluster. In this case, create a new cluster for testing.
    • Local NVMe Disks are ephemeral. Data will be lost on these disks if you stop/deallocate your node. When the node is reconstructed, you need to migrate data in TiKV. If you do not want to migrate data, it is recommended not to use the local disk in a production environment.

    For instance types that provide local disks, refer to the Azure documentation on storage-optimized VM sizes. The following takes Standard_L8s_v2 as an example:

    1. Create a node pool with local storage for TiKV.

      Modify the instance type of the TiKV node pool in the az aks nodepool add command to Standard_L8s_v2:
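
      The following sketch carries over the flags from the earlier TiKV node pool example and only changes the VM size (adjust the pool name and node count as needed):

      az aks nodepool add --name tikv \
          --cluster-name ${clusterName} \
          --resource-group ${resourceGroup} \
          --node-vm-size Standard_L8s_v2 \
          --zones 1 2 3 \
          --aks-custom-headers EnableAzureDiskFileCSIDriver=true \
          --node-count 3 \
          --labels dedicated=tikv \
          --node-taints dedicated=tikv:NoSchedule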

      If the TiKV node pool already exists, you can either delete the old pool and then create a new one, or change the pool name to avoid conflicts.

    2. Deploy the local volume provisioner.

      You need to use the local-volume-provisioner to discover and manage the local storage. Run the following command to deploy and create a local-storage storage class:

      kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.2/manifests/eks/local-volume-provisioner.yaml
    3. Use local storage.

      After the steps above, the local volume provisioner can discover all the local NVMe SSD disks in the cluster.

      Add the tikv.storageClassName field to the tidb-cluster.yaml file and set the value of the field to local-storage.
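
      For example, mirroring the UltraSSD configuration shown earlier:

      spec:
        tikv:
          ...
          storageClassName: local-storage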

      For more information, refer to the local PV configuration documentation.