Tips for troubleshooting TiDB in Kubernetes

    When a Pod is in the CrashLoopBackoff state, the containers in the Pod exit continually. As a result, you cannot use kubectl exec normally, which makes it inconvenient to diagnose issues.

    To solve this problem, TiDB Operator provides the Pod debug mode for PD, TiKV, and TiDB components. In this mode, the containers in the Pod hang directly after they are started, and will not repeatedly crash. Then you can use kubectl exec to connect to the Pod containers for diagnosis.

    To use the debug mode for troubleshooting:

    1. Add an annotation to the Pod to be diagnosed:

      When the container in the Pod is restarted again, it will detect this annotation and enter the debug mode.

      Note

      If the Pod is running, you can force the container to restart by running the following command:

      kubectl exec ${pod_name} -n ${namespace} -c ${container} -- kill -SIGTERM 1
    2. After finishing the diagnosis and resolving the problem, delete the Pod.

      kubectl delete pod ${pod_name} -n ${namespace}

    After the Pod is rebuilt, it automatically returns to the normal mode.
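The debug-mode steps above can be sketched as one short shell function that prints the commands for review instead of running them, so nothing here contacts a cluster. The annotation `runmode=debug` is an assumption, since the annotate command itself is not shown in the text above. The function name and the Pod, namespace, and container names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: print the kubectl commands for one debug-mode session.
# ASSUMPTION: the debug annotation is runmode=debug (not shown in the text above).

debug_mode_commands() {
    pod_name=$1
    namespace=$2
    container=$3

    # Step 1: annotate the Pod so its containers enter debug mode on the next restart.
    echo "kubectl annotate pod ${pod_name} -n ${namespace} runmode=debug"
    # Optional: force a restart if the container is currently running.
    echo "kubectl exec ${pod_name} -n ${namespace} -c ${container} -- kill -SIGTERM 1"
    # Step 2: after the diagnosis, delete the Pod so it is rebuilt in normal mode.
    echo "kubectl delete pod ${pod_name} -n ${namespace}"
}

# Hypothetical names, for illustration only.
debug_mode_commands basic-tikv-0 tidb-cluster tikv
```

Printing the commands first makes it easy to check the Pod name and namespace before anything is sent to the cluster. Copy each line into the terminal once it looks right.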

    Refer to the document and use SQL to modify the configuration of a single TiKV instance online.

    Note

    Modifications made this way are temporary. After the Pod is restarted, the original configuration is used again.

    Modify manually in debug mode

    After the TiKV Pod enters debug mode, you can modify the TiKV configuration file and then manually start the TiKV process using the modified configuration file.

    The steps are as follows:

    1. Get the start command from the TiKV log, which will be used in a subsequent step.

      The output is similar to the following, which is the start command of TiKV.

      /tikv-server --pd=http://${tc_name}-pd:2379 --advertise-addr=${pod_name}.${tc_name}-tikv-peer.default.svc:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml

      Note

      If the TiKV Pod is in the CrashLoopBackoff state, you cannot get the start command from the log. In that case, you can assemble the start command yourself by following the command format above.

    2. Turn on debug mode for the Pod and restart the Pod.

      Add an annotation to the Pod and wait for the Pod to restart.

        kubectl exec ${pod_name} -n ${namespace} -c tikv -- kill -SIGTERM 1

        Check the log of TiKV to ensure that the Pod is in debug mode.

        The output is similar to the following:

      1. Enter the TiKV container by running the following command:

        kubectl exec -it ${pod_name} -n ${namespace} -c tikv -- sh
      2. In the TiKV container, copy the configuration file of TiKV to a new file, and modify the new file.

        cp /etc/tikv/tikv.toml /tmp/tikv.toml && vi /tmp/tikv.toml
      3. In the TiKV container, modify the start command obtained from the TiKV log in Step 1 so that the --config flag points to the new configuration file. Run the modified start command to start the TiKV process:

        After the test is complete, to recover the TiKV Pod, delete it and wait for it to be started again automatically.

        kubectl delete pod ${pod_name} -n ${namespace}
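The procedure above has two text-handling parts: assembling the start command by hand when the log is unavailable, and changing only its --config flag. A minimal sketch of both follows, assuming the flag layout shown in Step 1. The function names `build_tikv_start_cmd` and `use_config` and the cluster name `basic` are hypothetical. Treating the namespace as a parameter is also an assumption, since the sample output shows `default`.

```shell
#!/bin/sh
# Sketch only: rebuild the TiKV start command from the format shown in Step 1,
# then point --config at the edited copy. Nothing here starts TiKV.

build_tikv_start_cmd() {
    tc_name=$1      # TidbCluster name
    pod_name=$2     # TiKV Pod name
    namespace=$3    # ASSUMPTION: the "default" in the sample output is the namespace
    echo "/tikv-server --pd=http://${tc_name}-pd:2379 --advertise-addr=${pod_name}.${tc_name}-tikv-peer.${namespace}.svc:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml"
}

use_config() {
    # $1: start command, $2: path of the new configuration file.
    # Replace only the value of the --config flag; all other flags stay unchanged.
    echo "$1" | sed "s#--config=[^ ]*#--config=$2#"
}

# Hypothetical cluster and Pod names, for illustration only.
cmd=$(build_tikv_start_cmd basic basic-tikv-0 default)
use_config "$cmd" /tmp/tikv.toml
```

The printed line is the command to run inside the TiKV container. Compare it with the log output from Step 1 when that output is available, because the flags can differ between versions.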

        Normally, during a TiKV rolling update, TiDB Operator evicts all Region leaders from a TiKV Pod before restarting it. This minimizes the impact of the rolling update on user requests.

        In some test scenarios, if you do not need to wait for the Region leaders to migrate during a TiKV rolling upgrade, or if you want to speed up the rolling upgrade, you can configure the field in the spec of TidbCluster to a small value.

        For more information about this field, refer to Configure graceful upgrade.

        Warning

        Configuring forceful upgrade causes some user requests to fail. It is not recommended for a production environment.