Backup and Restore Overview

    Dumpling is a data export tool that exports data stored in TiDB or MySQL as SQL or CSV data files. You can use Dumpling to make a logical full backup or export.

    TiDB Lightning is a tool for fast full data import into a TiDB cluster. It accepts Dumpling output or CSV files as its data source. You can use TiDB Lightning to make a logical full data restore or import.

    BR is a command-line tool for distributed backup and restoration of TiDB cluster data. Compared with Dumpling and Mydumper, BR is more suitable for huge data volumes. BR only supports TiDB v3.1 and later versions. For incremental backup that is insensitive to latency, refer to the BR documentation. For real-time incremental backup, refer to TiCDC.

    If you have the following backup needs, you can use BR to make a backup of your TiDB cluster data:

    • To back up a large volume of data at a fast speed
    • To get a direct backup of data as SST files (key-value pairs)
    • To perform incremental backup that is insensitive to latency

    If you have the following backup needs, you can use Dumpling to make a backup of the TiDB cluster data:

    • To export SQL or CSV files
    • To limit the memory usage of a single SQL statement
    • To export the historical data snapshot of TiDB

    To restore data from SQL or CSV files exported by Dumpling or other compatible data sources to a TiDB cluster, use TiDB Lightning.

    To back up the TiDB cluster in Kubernetes, create a Backup CR (custom resource) object to describe the backup, or create a BackupSchedule CR object to describe a scheduled backup.

    To restore data to the TiDB cluster in Kubernetes, create a Restore CR object to describe the restore.

    After you create the CR object, TiDB Operator chooses the corresponding tool according to your configuration and performs the backup or restore.

    You can delete the Backup CR or BackupSchedule CR by running the following commands:
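
    The commands themselves are not shown above. A minimal sketch, assuming the CRs are exposed under the resource names backup and backupschedule: the first matches the kubectl patch command used later in this document, while the second is an assumption. The helper name delete_backup_crs is hypothetical.

```shell
# Hypothetical helper: delete one Backup CR and one BackupSchedule CR.
# Arguments: <namespace> <backup-name> <backup-schedule-name>
delete_backup_crs() {
  # "backup" is the resource name used by the patch command in this document.
  kubectl delete backup "$2" -n "$1"
  # "backupschedule" is an assumed resource name for the BackupSchedule CR.
  kubectl delete backupschedule "$3" -n "$1"
}
```

    For example, delete_backup_crs "${namespace}" "${name}" "${schedule_name}" runs both deletions in the given namespace. Depending on spec.cleanPolicy, deleting a Backup CR may also remove the backup data, as described below.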

    If you use TiDB Operator v1.1.2 or an earlier version, or if you use TiDB Operator v1.1.3 or a later version and set the value of spec.cleanPolicy to Delete, TiDB Operator cleans the backup data when it deletes the CR.

    In such cases, if you need to delete the namespace, first delete all the Backup/BackupSchedule CRs and then delete the namespace. Otherwise, CRs that still carry finalizers can leave the namespace stuck in the Terminating state.

    To address this issue, delete finalizers by running the following command:

    kubectl patch -n ${namespace} backup ${name} --type merge -p '{"metadata":{"finalizers":[]}}'
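
    Before clearing the finalizers, it can help to check which ones are currently set on the CR. A minimal sketch using kubectl's JSONPath output; the helper name show_backup_finalizers is hypothetical.

```shell
# Hypothetical helper: print the finalizers currently set on a Backup CR.
# Arguments: <namespace> <backup-name>
show_backup_finalizers() {
  # .metadata.finalizers is the standard Kubernetes field that the patch above empties.
  kubectl get backup "$2" -n "$1" -o jsonpath='{.metadata.finalizers}'
}
```

    An empty result after the patch suggests that the finalizers have been removed and no longer block deletion.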

    For TiDB Operator v1.2.3 and earlier versions, TiDB Operator cleans the backup data by deleting the backup files one by one.

    For TiDB Operator v1.2.4 and later versions, TiDB Operator cleans the backup data by deleting the backup files in batches. The batch deletion method depends on the type of backend storage used for backups.

    • For S3-compatible backend storage, TiDB Operator uses the concurrent batch deletion method, which deletes files in batches concurrently. TiDB Operator starts multiple goroutines, and each goroutine uses the batch delete API DeleteObjects to delete multiple files per request.
    • For other types of backend storage, TiDB Operator uses the concurrent deletion method, which deletes files concurrently. TiDB Operator starts multiple goroutines, and each goroutine deletes one file at a time.

    For TiDB Operator v1.2.4 and later versions, you can configure the following fields in the Backup CR to control the clean behavior:

    • .spec.cleanOption.pageSize: Specifies the number of files to delete in each batch. The default value is 10000.

    • : If the value of this field is true, TiDB Operator disables the concurrent batch deletion method and uses the concurrent deletion method.

      If your S3-compatible backend storage does not support the DeleteObjects API, the default concurrent batch deletion method fails. In that case, set this field to true to use the concurrent deletion method.

    • : Specifies the number of goroutines to start for the concurrent deletion method. The default value is 100.
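
    As an illustration of tuning the clean behavior, the following sketch sets .spec.cleanOption.pageSize on an existing Backup CR with a merge patch, in the same style as the finalizer command above. It assumes that the Backup CR accepts updates to this field after creation. If it does not, set the field in the CR manifest instead. The helper name set_clean_page_size is hypothetical.

```shell
# Hypothetical helper: set .spec.cleanOption.pageSize on an existing Backup CR.
# Arguments: <namespace> <backup-name> <page-size>
set_clean_page_size() {
  # Merge patch: only the pageSize field is changed, other spec fields are kept.
  kubectl patch -n "$1" backup "$2" --type merge \
    -p "{\"spec\":{\"cleanOption\":{\"pageSize\":$3}}}"
}
```

    For example, set_clean_page_size "${namespace}" "${name}" 1000 lowers the batch size from the default of 10000 to 1000 files per batch.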