About the Segment Recovery Process

Although gprecoverseg is an online operation, there are two brief periods during which all IO is paused. First, when recovery is initiated, IO is suspended while empty data files are created on the mirror. Second, after data files are synchronized, IO is suspended while system files such as transaction logs are copied from the primary to the mirror. The duration of these pauses is affected primarily by the number of file system files that must be created on the mirror and the sizes of the system flat files that must be copied. The system is online and available while database tables are replicated from the primary to the mirror, and any changes made to the database during recovery are replicated directly to the mirror.

To initiate recovery, the administrator runs the gprecoverseg utility. gprecoverseg prepares segments for recovery and initiates synchronization. When synchronization is complete and the segment status is updated in the system catalog, the segments are recovered. If the recovered segments are not running in their preferred roles, gprecoverseg -r can be used to bring the system back into balance.

Without the -F option, gprecoverseg recovers segments incrementally, copying only the changes since the mirror entered down status. The -F option fully recovers mirrors by deleting their data directories and then synchronizing all persistent data files from the primary to the mirror.

You can run gpstate -e to view the mirroring status of the segments before and during the recovery process. The primary and mirror segment statuses are updated as the recovery process proceeds.

Consider a single primary-mirror segment pair where the primary is active and the mirror is down. The following table shows the segment status before beginning recovery of the mirror.

The segments are in their preferred roles, but the mirror is down. The primary is up and is in change tracking mode because it is unable to send changes to its mirror.

The gprecoverseg utility prepares the segments for recovery and then exits, allowing the Greenplum file replication processes to copy data from the primary to the mirror.

During the gprecoverseg execution the following recovery process steps are completed.

The mirror segment processes are initialized.
For full recovery (-aF):
- The data directories of the down segments are deleted.
- A new data directory structure is created.
The segment mode in the gp_segment_configuration system table is updated to ‘r’ (resynchronization mode).
The gprecoverseg script completes.

Once gprecoverseg has completed, the segments are in the states shown in the following table.

Data file resynchronization is performed in the background by file replication processes. Run gpstate -e to check the process of resynchronization. The Greenplum system is fully available for workloads while this process completes.

Data copy (full and incremental recovery):

After the file system objects are created, data copy is initiated for the affected segments. The ResyncManager process scans the persistent table system catalogs to find the file objects to be synchronized. ResyncWorker processes sync the file objects from the primary to the mirror.
Any changes or new data created with database transactions during the data copy are mirrored directly to the mirror.
Once data copy has finished synchronizing persistent data files, file replication updates the shared memory state on the current primary segment to ‘insync’.

During this phase, system files in the primary segment’s data directory are copied to the segment data directory. IO is suspended while the following flat files are copied from the primary data directory to the segment data directory:

pg_clog/*
pg_distributedlog/*
pg_multixact/members
pg_multixact/offsets
pg_twophase/*
global/pg_database
global/pg_auth
global/pg_auth_time_constraint

IOSUSPEND ends after these files are copied.

The next time the fault tolerance server (ftsprobe) process on the master wakes, it will set the primary and mirror states to synchronized (mode=s, state=u). A distributed query will also trigger the ftsprobe process to update the state.

When all segment recovery and file replication processes are complete, the segment status in the gp_segment_configuration system table and gp_state -e output is as shown in the following table.

The number of database objects, mainly tables and indexes.
The number of data files in segment data directories.
The types of workloads updating data during resynchronization – both DDL and DML (insert, update, delete, and truncate).
The size of the data.

Parent topic: Recovering From Segment Failures