Data Migration
This implementation has the following advantages:
- No impact on the original data during migration.
- No risk in case of migration failure.
- Freedom from sharding strategy limitations.
The implementation has the following disadvantages:
- Redundant servers may be needed for a period of time.
A single data migration mainly consists of the following phases:
- Preparation.
- Stock data migration.
- Incremental data synchronization.
- Traffic switching.
In the preparation stage, the data migration module verifies data source connectivity and permissions, gathers statistics on the stock data, records the log position, and finally shards the tasks according to the data volume and the parallelism set by the user.
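The task sharding step can be sketched as follows. This is a minimal illustration, not ShardingSphere's actual implementation: it assumes the table has a numeric primary key whose minimum and maximum were collected with the statistics, and `MigrationTask` and `shard_tasks` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationTask:
    """One unit of stock-data work: an inclusive primary-key range (hypothetical type)."""
    table: str
    start_id: int
    end_id: int


def shard_tasks(table: str, min_id: int, max_id: int, parallelism: int) -> list[MigrationTask]:
    """Split [min_id, max_id] into at most `parallelism` contiguous key ranges."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if max_id < min_id:
        return []  # empty table: nothing to migrate

    total = max_id - min_id + 1
    task_count = min(parallelism, total)  # never create empty tasks
    base, extra = divmod(total, task_count)

    tasks = []
    start = min_id
    for i in range(task_count):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        tasks.append(MigrationTask(table, start, start + size - 1))
        start += size
    return tasks


tasks = shard_tasks("t_order", 1, 10, 3)
```

Here the ten keys are divided into three ranges that together cover every key exactly once, so the tasks can run in parallel without overlapping.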
In the stock data migration stage, the tasks that were sharded during the preparation stage are executed. This stage uses JDBC queries to read data directly from the source and writes it into the target according to the sharding rules and other configurations.
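The read-and-write loop of one stock migration task might look like the sketch below. It uses Python's built-in `sqlite3` in place of JDBC so that it runs without external services. The `t_order` table, the batch size, and the modulo sharding rule are assumptions made for illustration only.

```python
import sqlite3


def new_db():
    """Create an in-memory database holding an empty t_order table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t_order (order_id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
    )
    return conn


def migrate_range(source, targets, start_id, end_id, batch_size=2):
    """Copy rows with start_id <= order_id <= end_id from source into the sharded targets."""
    copied = 0
    last_id = start_id - 1
    while True:
        # Read the next batch in primary-key order; the source is only queried, never modified.
        rows = source.execute(
            "SELECT order_id, user_id, status FROM t_order "
            "WHERE order_id > ? AND order_id <= ? ORDER BY order_id LIMIT ?",
            (last_id, end_id, batch_size),
        ).fetchall()
        if not rows:
            return copied
        for row in rows:
            # Assumed sharding rule: order_id modulo the number of target shards.
            target = targets[row[0] % len(targets)]
            # INSERT OR REPLACE keeps the write idempotent if a batch is retried.
            target.execute("INSERT OR REPLACE INTO t_order VALUES (?, ?, ?)", row)
        for target in targets:
            target.commit()
        last_id = rows[-1][0]
        copied += len(rows)


source = new_db()
source.executemany(
    "INSERT INTO t_order VALUES (?, ?, ?)", [(i, i * 10, "NEW") for i in range(1, 8)]
)
source.commit()

targets = [new_db(), new_db()]
copied = migrate_range(source, targets, 1, 7)
```

Paging by the last seen key rather than by offset keeps each query cheap on large tables, and the source rows are left untouched throughout.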
Since the duration of stock data migration depends on factors such as data volume and parallelism, the data that business operations add or change during this period must also be synchronized. Databases differ in technical details, but in general they all capture changed data through replication protocols or write-ahead logs (WAL).
- MySQL: subscribe and parse binlog.
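Applying captured changes to the target can be sketched as follows. The example does not parse a real binlog. It assumes a parser has already turned log entries into `ChangeEvent` records, and both `ChangeEvent` and `apply_events` are hypothetical names. The target is modeled as a plain dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """A simplified change record, as a log parser might emit it (hypothetical shape)."""
    position: int               # position of the change in the source log
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_events(target: dict, events, last_applied: int) -> int:
    """Apply events in log order and return the last applied position."""
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= last_applied:
            continue  # already applied, for example before a restart
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = dict(event.row)  # upsert the new row image
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_applied = event.position
    return last_applied


target = {1: {"status": "NEW"}}
events = [
    ChangeEvent(101, "insert", 2, {"status": "NEW"}),
    ChangeEvent(102, "update", 1, {"status": "PAID"}),
    ChangeEvent(103, "delete", 2),
]
position = apply_events(target, events, last_applied=100)
```

Tracking the last applied position is what lets synchronization start from the log position recorded before the stock migration and skip events that were already applied.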
During this stage, a short read-only period may be needed, in which data on the source data nodes is kept static so that incremental synchronization can fully catch up. Users can arrange this by setting the database to read-only or by controlling the traffic at its origin.
The length of this read-only window depends on whether users need to run a consistency check on the data and on the amount of data involved. The consistency check is an independent task that can be started and stopped separately and can resume from a breakpoint.
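One way to picture a resumable consistency check is a chunk-by-chunk digest comparison, as in the sketch below. It is an illustration under simplifying assumptions rather than the algorithm ShardingSphere uses: source and target are modeled as dictionaries keyed by primary key, and `check_consistency` and its `resume_after` parameter are hypothetical.

```python
import hashlib


def chunk_digest(rows):
    """Hash a sequence of (key, value) pairs into one hex digest."""
    digest = hashlib.sha256()
    for key, value in rows:
        digest.update(repr((key, value)).encode("utf-8"))
    return digest.hexdigest()


def check_consistency(source: dict, target: dict, chunk_size: int, resume_after=None):
    """Compare source and target chunk by chunk in key order.

    Returns (consistent, last_checked_key). Passing last_checked_key back in
    as resume_after continues the check from that breakpoint.
    """
    keys = sorted(set(source) | set(target))
    if resume_after is not None:
        keys = [k for k in keys if k > resume_after]

    last_checked = resume_after
    for i in range(0, len(keys), chunk_size):
        chunk = keys[i:i + chunk_size]
        # A key missing on one side shortens that side's sequence, so the digests differ.
        src = chunk_digest((k, source[k]) for k in chunk if k in source)
        dst = chunk_digest((k, target[k]) for k in chunk if k in target)
        if src != dst:
            return False, last_checked  # stop at the first inconsistent chunk
        last_checked = chunk[-1]
    return True, last_checked


source_rows = {1: "a", 2: "b", 3: "c", 4: "d"}
target_rows = {1: "a", 2: "b", 3: "x", 4: "d"}
result = check_consistency(source_rows, target_rows, chunk_size=2)
```

In this run the first chunk matches and the second does not, so the check reports the inconsistency together with the last key that was verified. After the target is repaired, the check can continue from that key instead of starting over.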
Once consistency is confirmed, the data migration is complete. Users can then switch read traffic or write traffic to Apache ShardingSphere.