Parallel Backup with gpbackup and gprestore

    The backup metadata files contain all of the information that gprestore needs to restore a full backup set in parallel. Backup metadata also provides the framework for restoring only individual objects in the data set, along with any dependent objects, in future versions of gprestore. (See Understanding Backup Files for more information.) Storing the table data in CSV files also provides opportunities for using other restore utilities, such as gpload, to load the data either in the same cluster or another cluster. By default, one file is created for each table on the segment. You can specify the --leaf-partition-data option with gpbackup to create one data file per leaf partition of a partitioned table, instead of a single file. This option also enables you to filter backup sets by leaf partitions.

    Each gpbackup task uses a single transaction in Greenplum Database. During this transaction, metadata is backed up on the master host, and data for each table on each segment host is written to CSV backup files using COPY ... ON SEGMENT commands in parallel. The backup process acquires an ACCESS SHARE lock on each table that is backed up.

    For information about the gpbackup and gprestore utility options, see the gpbackup and gprestore reference documentation.


    The gpbackup and gprestore utilities are available with Greenplum Database 5.5.0 and later.

    gpbackup and gprestore have the following limitations:

    • If you create an index on a parent partitioned table, gpbackup does not back up that same index on child partitioned tables of the parent, as creating the same index on a child would cause an error. However, if you exchange a partition, gpbackup does not detect that the index on the exchanged partition is inherited from the new parent table. In this case, gpbackup backs up conflicting CREATE INDEX statements, which causes an error when you restore the backup set.

    • You can execute multiple instances of gpbackup, but each execution requires a distinct timestamp.

    • Database object filtering is currently limited to schemas and tables.

    • If you use the gpbackup --single-data-file option to combine table backups into a single file per segment, you cannot perform a parallel restore operation with gprestore (you cannot set --jobs to a value higher than 1).

    • You cannot use the --exclude-table-file option with --leaf-partition-data. Although you can specify leaf partition names in a file specified with --exclude-table-file, gpbackup ignores the partition names.

    • Backing up a database with gpbackup while simultaneously running DDL commands might cause gpbackup to fail, in order to ensure consistency within the backup set. For example, if a table is dropped after the start of the backup operation, gpbackup exits and displays the error message ERROR: relation <schema.table> does not exist.

      gpbackup might fail when a table is dropped during a backup operation due to table locking issues. gpbackup generates a list of tables to back up and acquires an ACCESS SHARE lock on the tables. If an EXCLUSIVE lock is held on a table, gpbackup acquires the ACCESS SHARE lock after the existing lock is released. If the table no longer exists when gpbackup attempts to acquire a lock on the table, gpbackup exits with the error message.

      For tables that might be dropped during a backup, you can exclude the tables from a backup with a gpbackup table filtering option such as --exclude-table or --exclude-schema.


    Objects Included in a Backup or Restore

    The following information describes the objects that are backed up and restored with gpbackup and gprestore.

    Database objects are backed up for the database you specify with the --dbname option.

    Global objects (Greenplum Database system objects) are also backed up by default, but they are restored only if you include the --with-globals option to gprestore.

    Note: These schemas are not included in a backup.

    • gp_toolkit
    • information_schema
    • pg_aoseg
    • pg_bitmapindex
    • pg_catalog
    • pg_toast*
    • pg_temp*
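    The last two entries in this list are prefix wildcards: any schema whose name begins with pg_toast or pg_temp is skipped. The matching behavior can be sketched as follows (illustrative only; the function and list names are ours, not gpbackup internals):

```python
# Sketch of the default schema-exclusion rule described above.
# The names here are hypothetical, not part of the gpbackup source.
EXCLUDED_SCHEMAS = ["gp_toolkit", "information_schema", "pg_aoseg",
                    "pg_bitmapindex", "pg_catalog"]
EXCLUDED_PREFIXES = ["pg_toast", "pg_temp"]  # the wildcard entries

def is_schema_backed_up(schema: str) -> bool:
    """Return False for schemas that are not included in a backup."""
    if schema in EXCLUDED_SCHEMAS:
        return False
    return not any(schema.startswith(p) for p in EXCLUDED_PREFIXES)
```

    For example, is_schema_backed_up("public") returns True, while is_schema_backed_up("pg_toast_1234") returns False because of the pg_toast* wildcard.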

    When restoring to an existing database, gprestore assumes the public schema exists. When restoring to a new database (with the --create-db option), gprestore creates the public schema automatically; the CREATE DATABASE command uses the template0 database, which contains the public schema.


    To perform a complete backup of a database, as well as Greenplum Database system metadata, run the gpbackup command with the --dbname option to specify the database to back up.

    For example:

    $ gpbackup --dbname demo
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Starting backup of database demo
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup Timestamp = 20180105112754
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup Database = demo
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup Type = Unfiltered Compressed Full Backup
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Gathering list of tables for backup
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Acquiring ACCESS SHARE locks on tables
    Locks acquired: 6 / 6 [================================================================] 100.00% 0s
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Gathering additional table metadata
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing global database metadata
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Global database metadata backup complete
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing pre-data metadata
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Pre-data metadata backup complete
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing post-data metadata
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Post-data metadata backup complete
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Writing data to file
    Tables backed up: 3 / 3 [==============================================================] 100.00% 0s
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Data backup complete
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Found neither /usr/local/greenplum-db/./bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
    20180105:11:27:54 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Email containing gpbackup report /gpmaster/seg-1/backups/20180105/20180105112754/gpbackup_20180105112754_report will not be sent
    20180105:11:27:55 gpbackup:gpadmin:centos6.localdomain:002182-[INFO]:-Backup completed successfully

    The above command creates metadata and supporting files that contain global and database-specific metadata on the Greenplum Database master host, in the default directory $MASTER_DATA_DIRECTORY/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/. For example:

    $ ls /gpmaster/gpsne-1/backups/20180105/20180105112754
    gpbackup_20180105112754_config.yaml gpbackup_20180105112754_report
    gpbackup_20180105112754_metadata.sql gpbackup_20180105112754_toc.yaml
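    Because the directory layout is derived entirely from the backup timestamp, the default location can be computed mechanically. A small sketch (the helper name is ours, not a gpbackup API):

```python
def default_backup_dir(master_data_dir: str, timestamp: str) -> str:
    """Build the default path <MASTER_DATA_DIRECTORY>/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/.

    timestamp is the 14-character YYYYMMDDHHMMSS backup key.
    """
    date_part = timestamp[:8]  # the YYYYMMDD prefix of the timestamp
    return f"{master_data_dir}/backups/{date_part}/{timestamp}/"

print(default_backup_dir("/gpmaster/gpsne-1", "20180105112754"))
# /gpmaster/gpsne-1/backups/20180105/20180105112754/
```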

    By default, each segment stores each table’s data for the backup in a separate compressed CSV file in <seg_dir>/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/:

    $ ls /gpdata1/gpsne0/backups/20180105/20180105112754/
    gpbackup_0_20180105112754_17166.gz gpbackup_0_20180105112754_26303.gz
    gpbackup_0_20180105112754_21816.gz

    To consolidate all backup files into a single directory, include the --backup-dir option. Note that you must specify an absolute path with this option:

    $ gpbackup --dbname demo --backup-dir /home/gpadmin/backups
    ...
    20171103:15:31:58 gpbackup:gpadmin:0ee2f5fb02c9:017586-[INFO]:-Backup completed successfully
    $ find /home/gpadmin/backups/ -type f
    /home/gpadmin/backups/gpseg0/backups/20171103/20171103153156/gpbackup_0_20171103153156_16543.gz
    /home/gpadmin/backups/gpseg0/backups/20171103/20171103153156/gpbackup_0_20171103153156_16524.gz
    /home/gpadmin/backups/gpseg1/backups/20171103/20171103153156/gpbackup_1_20171103153156_16543.gz
    /home/gpadmin/backups/gpseg1/backups/20171103/20171103153156/gpbackup_1_20171103153156_16524.gz
    /home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_config.yaml
    /home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_predata.sql
    /home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_global.sql
    /home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_postdata.sql
    /home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_report
    /home/gpadmin/backups/gpseg-1/backups/20171103/20171103153156/gpbackup_20171103153156_toc.yaml

    When performing a backup operation, you can use the --single-data-file option in situations where the additional overhead of multiple files might be prohibitive, for example, when you use a third-party storage solution such as Data Domain for backups.

    To use gprestore to restore from a backup set, you must use the --timestamp option to specify the exact timestamp value (YYYYMMDDHHMMSS) to restore. Include the --create-db option if the database does not exist in the cluster. For example:

    $ gprestore --timestamp 20171103152558 --create-db
    20171103:15:45:30 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restore Key = 20171103152558
    20171103:15:45:31 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Creating database
    20171103:15:45:44 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Database creation complete
    20171103:15:45:44 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring pre-data metadata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_predata.sql
    20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Pre-data metadata restore complete
    20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring data
    20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Data restore complete
    20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring post-data metadata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_postdata.sql
    20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Post-data metadata restore complete

    If you specified a custom --backup-dir to consolidate the backup files, include the same --backup-dir option when using gprestore to locate the backup files:

    $ dropdb demo
    $ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db
    20171103:15:51:02 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Restore Key = 20171103153156
    ...
    20171103:15:51:17 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Post-data metadata restore complete

    By default, gprestore does not attempt to restore global metadata for the Greenplum Database system. If this is required, include the --with-globals option.

    By default, gprestore uses 1 connection to restore table data and metadata. If you have a large backup set, you can improve performance of the restore by increasing the number of parallel connections with the --jobs option. For example:

    $ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db --jobs 8

    Test the number of parallel connections with your backup set to determine the ideal number for fast data recovery.

    Note: You cannot perform a parallel restore operation with gprestore if the backup combined table backups into a single file per segment with the gpbackup option --single-data-file.

    Report Files

    When performing a backup or restore operation, gpbackup and gprestore generate a report file. When email notification is configured, the email sent contains the contents of the report file. For information about email notification, see Configuring Email Notifications.

    The report file is placed in the Greenplum Database master backup directory. The report file name contains the timestamp of the operation. These are the formats of the gpbackup and gprestore report file names:

    gpbackup_<backup-timestamp>_report
    gprestore_<backup-timestamp>_<restore-timestamp>_report

    For these example report file names, 20180213114446 is the timestamp of the backup and 20180213115426 is the timestamp of the restore operation.

    gpbackup_20180213114446_report
    gprestore_20180213114446_20180213115426_report

    This backup directory on a Greenplum Database master host contains both a gpbackup and gprestore report file.

    $ ls -l /gpmaster/seg-1/backups/20180213/20180213114446
    total 36
    -r--r--r--. 1 gpadmin gpadmin 295 Feb 13 11:44 gpbackup_20180213114446_config.yaml
    -r--r--r--. 1 gpadmin gpadmin 1855 Feb 13 11:44 gpbackup_20180213114446_metadata.sql
    -r--r--r--. 1 gpadmin gpadmin 1402 Feb 13 11:44 gpbackup_20180213114446_report
    -r--r--r--. 1 gpadmin gpadmin 2199 Feb 13 11:44 gpbackup_20180213114446_toc.yaml
    -r--r--r--. 1 gpadmin gpadmin 404 Feb 13 11:54 gprestore_20180213114446_20180213115426_report

    The contents of the report files are similar. This is an example of the contents of a gprestore report file.

    Greenplum Database Restore Report
    Timestamp Key: 20180213114446
    GPDB Version: 5.4.1+dev.8.g9f83645 build commit:9f836456b00f855959d52749d5790ed1c6efc042
    gprestore Version: 1.0.0-alpha.3+dev.73.g0406681
    Database Name: test
    Command Line: gprestore --timestamp 20180213114446 --with-globals --createdb
    Start Time: 2018-02-13 11:54:26
    End Time: 2018-02-13 11:54:31
    Duration: 0:00:05
    Restore Status: Success

    When performing a backup operation, gpbackup appends backup information in the gpbackup history file, gpbackup_history.yaml, in the Greenplum Database master data directory. The file contains the backup timestamp, information about the backup options, and backup set information for incremental backups. This file is not backed up by gpbackup.

    gpbackup uses the information in the file to find a matching backup for an incremental backup when you run gpbackup with the --incremental option and do not specify the --from-timestamp option to indicate the backup that you want to use as the latest backup in the incremental backup set.

    Return Codes

    One of these codes is returned after gpbackup or gprestore completes.

    • 0 – Backup or restore completed with no problems
    • 1 – Backup or restore completed with non-fatal errors. See log file for more information.
    • 2 – Backup or restore failed with a fatal error. See log file for more information.
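    In a scheduling or wrapper script, the exit status can drive alerting. A minimal sketch of the mapping above (the function name is illustrative; in a shell wrapper you would capture $? after running gpbackup or gprestore and branch on it):

```python
# Map gpbackup/gprestore exit codes to outcomes, e.g. for use in a
# monitoring wrapper. The function name is ours, not a gpbackup API.
def describe_exit_code(rc: int) -> str:
    return {
        0: "completed with no problems",
        1: "completed with non-fatal errors (check the log file)",
        2: "failed with a fatal error (check the log file)",
    }.get(rc, "unknown exit code")
```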


    Filtering the Contents of a Backup or Restore

    gpbackup backs up all schemas and tables in the specified database, unless you exclude or include individual schema or table objects with schema level or table level filter options.

    The schema level options are --include-schema or --exclude-schema command-line options to gpbackup. For example, if the “demo” database includes only two schemas, “wikipedia” and “twitter,” both of the following commands back up only the “wikipedia” schema:

    $ gpbackup --dbname demo --include-schema wikipedia
    $ gpbackup --dbname demo --exclude-schema twitter

    You can specify the --include-schema option multiple times in a gpbackup command, or specify --exclude-schema multiple times. For example:

    $ gpbackup --dbname demo --include-schema wikipedia --include-schema twitter

    To filter the individual tables that are included in a backup set, or excluded from a backup set, specify individual tables with the --include-table option or the --exclude-table option. The table must be schema qualified, <schema-name>.<table-name>. The individual table filtering options can be specified multiple times. However, --include-table and --exclude-table cannot both be used in the same command.

    You can create a list of qualified table names in a text file. When listing tables in a file, each line in the text file must define a single table using the format <schema-name>.<table-name>. The file must not include trailing blank lines. For example:

    wikipedia.articles
    twitter.message

    If a table or schema name uses any character other than a lowercase letter, number, or an underscore character, then you must include that name in double quotes. For example:

    beer."IPA"
    "Wine".riesling
    "Wine"."sauvignon blanc"
    water.tonic
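    The quoting rule can be stated precisely: a name consisting only of lowercase letters, digits, and underscores may appear bare; anything else must be wrapped in double quotes. A small sketch of that check (illustrative only; not part of gpbackup):

```python
import re

def needs_quotes(name: str) -> bool:
    """True if a schema or table name must be double-quoted in a filter file,
    per the rule above: any character other than a lowercase letter, digit,
    or underscore forces quoting."""
    return re.fullmatch(r"[a-z0-9_]+", name) is None
```

    For example, needs_quotes("riesling") is False, while needs_quotes("IPA") and needs_quotes("sauvignon blanc") are both True, matching the quoted entries above.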

    After creating the file, you can use it either to include or exclude tables with the gpbackup option --include-table-file or --exclude-table-file.

    You can combine --include-schema with --exclude-table or --exclude-table-file for a backup. This example uses --include-schema with --exclude-table to back up a schema except for a single table.

    $ gpbackup --dbname demo --include-schema mydata --exclude-table mydata.addresses

    You cannot combine --include-schema with --include-table or --include-table-file, and you cannot combine --exclude-schema with any table filtering option such as --exclude-table or --include-table.

    When you use --include-table or --include-table-file, dependent objects are not automatically backed up or restored; you must explicitly specify the dependent objects that are required. For example, if you back up or restore a view, you must also specify the tables that the view uses. If you back up or restore a table that uses a sequence, you must also specify the sequence.

    The following example creates a partitioned table, sales; later commands in this section use it to demonstrate backing up individual leaf partitions:

    demo=# CREATE TABLE sales (id int, date date, amt decimal(10,2))
    PARTITION BY RANGE (date)
    ( PARTITION Jan17 START (date '2017-01-01') INCLUSIVE ,
    PARTITION Feb17 START (date '2017-02-01') INCLUSIVE ,
    PARTITION Mar17 START (date '2017-03-01') INCLUSIVE ,
    PARTITION Apr17 START (date '2017-04-01') INCLUSIVE ,
    PARTITION May17 START (date '2017-05-01') INCLUSIVE ,
    PARTITION Jun17 START (date '2017-06-01') INCLUSIVE ,
    PARTITION Jul17 START (date '2017-07-01') INCLUSIVE ,
    PARTITION Aug17 START (date '2017-08-01') INCLUSIVE ,
    PARTITION Sep17 START (date '2017-09-01') INCLUSIVE ,
    PARTITION Oct17 START (date '2017-10-01') INCLUSIVE ,
    PARTITION Nov17 START (date '2017-11-01') INCLUSIVE ,
    PARTITION Dec17 START (date '2017-12-01') INCLUSIVE
    END (date '2018-01-01') EXCLUSIVE );
    NOTICE: CREATE TABLE will create partition "sales_1_prt_jan17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_feb17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_mar17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_apr17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_may17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_jun17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_jul17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_aug17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_sep17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_oct17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_nov17" for table "sales"
    NOTICE: CREATE TABLE will create partition "sales_1_prt_dec17" for table "sales"
    CREATE TABLE

    To back up only data for the last quarter of the year, first create a text file that lists those leaf partition names instead of the full table name:

    public.sales_1_prt_oct17
    public.sales_1_prt_nov17
    public.sales_1_prt_dec17

    Then specify the file with the --include-table-file option to generate one data file per leaf partition:

    $ gpbackup --dbname demo --include-table-file last-quarter.txt --leaf-partition-data

    When you specify --leaf-partition-data, gpbackup generates one data file per leaf partition when backing up a partitioned table. For example, this command generates one data file for each leaf partition:

    $ gpbackup --dbname demo --include-table public.sales --leaf-partition-data

    When leaf partitions are backed up, the leaf partition data is backed up along with the metadata for the entire partitioned table.

    Note: You cannot use the --exclude-table-file option with --leaf-partition-data. Although you can specify leaf partition names in a file specified with --exclude-table-file, gpbackup ignores the partition names.

    Filtering with gprestore

    After creating a backup set with gpbackup, you can filter the schemas and tables that you want to restore from the backup set using the gprestore --include-schema and --include-table-file options. These options work in the same way as their gpbackup counterparts, but have the following restrictions:

    • The tables that you attempt to restore must not already exist in the database.

    • If you attempt to restore a schema or table that does not exist in the backup set, gprestore does not execute.

    • If you use the --include-schema option, gprestore cannot restore objects that have dependencies on multiple schemas.

    • If you use the --include-table-file option, gprestore does not create roles or set the owner of the tables. The utility restores table indexes and rules. Triggers are also restored but are not supported in Greenplum Database.

    • The file that you specify with --include-table-file cannot include a leaf partition name, as it can when you specify this option with gpbackup. If you specified leaf partitions in the backup set, specify the partitioned table to restore the leaf partition data.

      When restoring a backup set that contains data from some leaf partitions of a partitioned table, the partitioned table is restored along with the data for the leaf partitions. For example, you create a backup with the gpbackup option --include-table-file and the text file lists some leaf partitions of a partitioned table. Restoring the backup creates the partitioned table and restores the data only for the leaf partitions listed in the file.


    gpbackup and gprestore can send email notifications after a backup or restore operation completes.

    To have gpbackup or gprestore send out status email notifications, you must place a file named gp_email_contacts.yaml in the home directory of the user running gpbackup or gprestore, or in the same directory as the utilities ($GPHOME/bin). A utility issues a message if it cannot locate a gp_email_contacts.yaml file in either location. If both locations contain a .yaml file, the utility uses the file in the user's home directory.

    The email subject line includes the utility name, timestamp, status, and the name of the Greenplum Database master. This is an example subject line for a gpbackup email.

    gpbackup 20180202133601 on gp-master completed

    The email contains summary information about the operation including options, duration, and number of objects backed up or restored. For information about the contents of a notification email, see Report Files.

    Note: The UNIX mail utility must be running on the Greenplum Database host and must be configured to allow the Greenplum superuser (gpadmin) to send email. Also ensure that the mail program executable is locatable via the gpadmin user’s $PATH.


    The gpbackup and gprestore email notification YAML file gp_email_contacts.yaml uses indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. The use of white space is significant. White space should not be used simply for formatting purposes, and tabs should not be used at all.

    Note: If the status parameters are not specified correctly, the utility does not issue a warning. For example, if the success parameter is misspelled and is set to true, a warning is not issued and an email is not sent to the email address after a successful operation. To ensure email notification is configured correctly, run tests with email notifications configured.

    This is the format of the gp_email_contacts.yaml YAML file for gpbackup email notifications:

    contacts:
      gpbackup:
      - address: user@domain
        status:
          success: [true | false]
          success_with_errors: [true | false]
          failure: [true | false]
      gprestore:
      - address: user@domain
        status:
          success: [true | false]
          success_with_errors: [true | false]
          failure: [true | false]

    Email YAML File Sections

    Examples

    This example YAML file specifies sending email to email addresses depending on the success or failure of an operation. For a backup operation, an email is sent to a different address depending on the success or failure of the backup operation. For a restore operation, an email is sent to gpadmin@example.com only when the operation succeeds or completes with errors.

    contacts:
      gpbackup:
      - address: gpadmin@example.com
        status:
          success: true
      - address: my_dba@example.com
        status:
          success_with_errors: true
          failure: true
      gprestore:
      - address: gpadmin@example.com
        status:
          success: true
          success_with_errors: true
    Understanding Backup Files

    Warning: All gpbackup metadata files are created with read-only permissions. Never delete or modify the metadata files for a gpbackup backup set. Doing so will render the backup files non-functional.

    A complete backup set for gpbackup includes multiple metadata files, supporting files, and CSV data files, each designated with the timestamp at which the backup was created.

    By default, metadata and supporting files are stored on the Greenplum Database master host in the directory $MASTER_DATA_DIRECTORY/backups/YYYYMMDD/YYYYMMDDHHMMSS/. If you specify a custom backup directory, this same file path is created as a subdirectory of the backup directory. The following list describes the names and contents of the metadata and supporting files.

    File name: gpbackup_<YYYYMMDDHHMMSS>_metadata.sql

    Contains global and database-specific metadata:

    - DDL for objects that are global to the Greenplum Database cluster, and not owned by a specific database within the cluster.

    - DDL for objects in the backed-up database (specified with --dbname) that must be created before restoring the actual data, and DDL for objects that must be created after restoring the data.

    Global objects include:

    - Tablespaces
    - Databases
    - Database-wide configuration parameter settings (GUCs)
    - Resource group definitions
    - Resource queue definitions
    - Roles
    - GRANT assignments of roles to databases

    Note: Global metadata is not restored by default. You must include the --with-globals option to the gprestore command to restore global metadata.

    Database-specific objects that must be created before restoring the actual data include:

    - Session-level configuration parameter settings (GUCs)
    - Schemas
    - Procedural language extensions
    - Types
    - Sequences
    - Functions
    - Tables
    - Protocols
    - Operators and operator classes
    - Conversions
    - Aggregates
    - Casts
    - Views
    - Constraints

    Database-specific objects that must be created after restoring the actual data include:

    - Indexes
    - Rules
    - Triggers. (While Greenplum Database does not support triggers, any trigger definitions that are present are backed up and restored.)

    File name: gpbackup_<YYYYMMDDHHMMSS>_toc.yaml

    Contains metadata for locating object DDL in the _predata.sql and _postdata.sql files. This file also contains the table names and OIDs used for locating the corresponding table data in the CSV data files that are created on each segment.

    File name: gpbackup_<YYYYMMDDHHMMSS>_report

    Contains information about the backup operation that is used to populate the email notice (if configured) that is sent after the backup completes. This file contains information such as:

    - Command-line options that were provided
    - Database that was backed up
    - Database version
    - Backup type

    See Configuring Email Notifications.

    File name: gpbackup_<YYYYMMDDHHMMSS>_config.yaml

    Contains metadata about the execution of the particular backup task, including:

    - gpbackup version
    - Database name
    - Greenplum Database version
    - Additional option settings such as --no-compression, --compression-level, --metadata-only, --data-only, and --with-stats.

    File name: gpbackup_history.yaml

    Contains information about options that were used when creating a backup with gpbackup, and information about incremental backups. This file is stored on the Greenplum Database master host in the Greenplum Database master data directory. It is not backed up by gpbackup.

    Segment Data Files

    By default, each segment creates one compressed CSV file for each table that is backed up on the segment. You can optionally specify the --single-data-file option to create a single data file on each segment. The files are stored in <seg_dir>/backups/YYYYMMDD/YYYYMMDDHHMMSS/.

    If you specify a custom backup directory, segment data files are copied to this same file path as a subdirectory of the backup directory. If you include the --leaf-partition-data option, gpbackup creates one data file for each leaf partition of a partitioned table, instead of one file per table.

    Each data file uses the file name format gpbackup_<content_id>_<YYYYMMDDHHMMSS>_<oid>.gz where:

    • <content_id> is the content ID of the segment.
    • <YYYYMMDDHHMMSS> is the timestamp of the gpbackup operation.
    • <oid> is the object identifier (OID) of the table.
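    Given that fixed naming scheme, the segment content ID, timestamp, and table OID can be recovered from a data file name. A small sketch (the parser is ours, not a gpbackup tool):

```python
import re

# Parse names of the form gpbackup_<content_id>_<YYYYMMDDHHMMSS>_<oid>.gz.
# Illustrative only; content IDs may be negative (e.g. -1 for the master).
DATA_FILE_RE = re.compile(r"gpbackup_(-?\d+)_(\d{14})_(\d+)\.gz")

def parse_data_file(name: str):
    m = DATA_FILE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a gpbackup data file name: {name}")
    content_id, timestamp, oid = m.groups()
    return int(content_id), timestamp, int(oid)

print(parse_data_file("gpbackup_0_20180105112754_17166.gz"))
# (0, '20180105112754', 17166)
```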

    You can optionally specify the gzip compression level (from 1-9) using the --compression-level option, or deactivate compression entirely with --no-compression. If you do not specify a compression level, gpbackup uses compression level 1 by default.
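    The data files are ordinary gzip streams, so the effect of the compression level can be observed with any gzip implementation. A quick illustration using Python's gzip module (not gpbackup itself; the sample data is invented):

```python
import gzip

# Repetitive CSV-like data, similar in character to table data files.
sample = b"2017-10-01,ORD-1001,42.50\n" * 5000

fast = gzip.compress(sample, compresslevel=1)  # the default level gpbackup uses
best = gzip.compress(sample, compresslevel=9)  # slower, usually smaller output

print(len(sample), len(fast), len(best))
```

    Higher levels trade backup CPU time for smaller data files; for highly compressible table data the difference in size is often modest, which is why a fast level is a reasonable default.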
