本文档介绍了数据迁移工具 mxshift。
mxshift 工具目前支持以下功能:
WHERE
条件实现数据的增量迁移。此部分主要介绍 mxshift 工具相关参数信息。
使用 mxshift 工具需要先准备一个 TOML 文件。以下是配置文件中的详细参数介绍:
Parameter Name | Description | Required |
---|---|---|
[database.source] Category | ||
--db-host | The host name (Host) or IP address of the Master node in the source cluster. The source database cluster refers to the database cluster from which data needs to be migrated | Yes |
--db-port | Port number of the source database Master | Yes |
--db-database | Source database name | Yes |
--db-user | Source database username | Yes |
--db-password | Source database user password | Yes |
--install-dir | Database installation path (this parameter is supported starting from v4.8.3; for v4.8.2 and earlier versions, use the gphome parameter for configuration) |
Yes |
--hostname-to-ip | Specify the IP address corresponding to the hostname. When the hostnames of the source and target databases are the same, use this configuration to ensure routing accuracy (supported since v4.8.3) | No |
[database.target] category | ||
--db-host | The hostname (Host) or IP address of the Master node in the target cluster. The target cluster refers to the database cluster into which data is being migrated | Yes |
--db-port | The port number of the target database Master | Yes |
--db-database | The name of the target database | Yes |
--db-user | The username of the target database | Yes |
--db-password | The password of the target database user | Yes |
[scope]Category | ||
--mode | Run mode. Allows the following values: normal , dryrun , fetch , motion .normal :Default value, indicating the normal data migration process;dryrun :Verify the connectivity between the two databases, use simple Data Definition Language (DDL) for verification operations, but no data reading or writing;fetch :Based on dryrun , read data from the source and send it to the target data node (Segment), but discard it directly without processing. This option is used to verify the source read efficiency and network transfer efficiency from the source to the target;motion :Based on fetch , parse data on the target database and exchange it to the real target Segment. This option is used to further test the data exchange rate on the target database |
No |
--compress-method | Compression method: gzip / zstd / lz4 / 0 , default value is 0 , which means no compression.Currently, enabling any compression algorithm requires that the mxadmin user on each target Segment can log in to each source Segment without a password. It is recommended to test with dryrun after configuration |
No |
--gphome | The absolute path of GPHOME for the source database (This option has been removed from v4.8.3 onwards) | Yes |
--table-list | List of schemas and table names for source tables. If migrating an entire database, leave this parameter empty and mxshift will migrate all non-zero size tables from largest to smallest | No |
--exclude-table-list | List of schemas and table names to be excluded | No |
--schema-list | Source database schema. Priority is given to table-list if it contains specific tables. If not, check if schema-list is configured. If configured, migrate all tables under the schema. If not, migrate all tables under [database.source] db-database |
No |
--exclude-schema-list | Exclude schemas of the source database. Tables under these schemas will not be migrated. If a schema is configured in both schema-list and exclude-schema-list , it will be excluded |
No |
[log]Category | ||
--log-level | Default value is info , which prints normal progress, error, and warning information. If set to debug , more debugging information will be output, only used for development and troubleshooting |
No |
--no-color | Default is false . Set to true to disable color tags |
No |
[controller]Category | ||
--both-way | Default is false , which starts with the largest table for fastest overall migration. If set to true , migration proceeds from both largest and smallest tables simultaneously |
No |
--concurrency | Number of tables migrated in parallel, default is 5, maximum 10 | No |
[transfer]Category | ||
--verify | Check if row counts match before and after data migration, default is false |
No |
--with-index | Whether to migrate indexes, default is false . If disabled, mxshift saves index SQL statements to a file for manual creation. Note: For tables using MARS2 storage engine, indexes must be created for the table to take effect, so DDL migration for MARS2 tables automatically includes indexes |
No |
[ddl]Category (Supported from 4.8.2 onwards) | ||
--enabled | Whether to migrate table DDL, default is false . Before migrating data, mxshift prioritizes migrating DDL by database.Supported database versions for DDL migration:Greenplum6 -> YMatrix4 / YMatrix5;YMatrix4 -> YMatrix4 / YMatrix5;YMatrix5 -> YMatrix5 |
No |
--file-path | File path for importing/exporting DDL SQL. Automatically created if non-existent. Exported files can be viewed or modified, but do not alter or delete comments starting with -- DO NOT EDIT. , as this may affect import results or cause errors |
No |
--mode | DDL processing mode. Options are exec , output , and input . exec is default, executing directly on the target database;output outputs to file and exits without data migration;input reads from a file. For input mode, --file-path must point to a file exported by mxshift |
No |
--only-ddl | Whether to migrate only DDL. Default is false . If false , data migration starts after DDL migration. If true , the program exits after DDL migration without moving data |
No |
--skip-resource-queue-and-group | Whether to skip resource queue/group migration. Set to true to exclude them during DDL migration |
No |
--skip-table-space | Whether to skip tablespace migration. Set to true to exclude tablespace during DDL migration |
No |
If you want to implement incremental data migration through WHERE conditions, you may use the following parameters:
Category | Subcategory | Subparameters | Sub-subcategory | Sub-Subparameters | Description | Required |
[transfer]Category | [transfer.table-data-where-sql]Category | --enabled | If `true`, the source database data is filtered using the WHERE condition. Only data that meets the WHERE condition is migrated to achieve incremental migration. This function is turned off by default. Manual identification is required when enabled. `true` | No | ||
--global | This parameter refers to the `WHERE` expression of SQL. mxshift will append this conditional expression to all tables that are migrated. If the `WHERE` keyword is not included in the table, it is prompted that `SKIPPED` will automatically skip the table | No | ||||
[[transfer.table-data-where-sql.override]] Category | --where | represents a WHERE subexpression with a higher priority than the WHERE expression in global and apply only to a table level that filters a specific table. | No | |||
[transfer.table-data-where-sql.override.table]Category | --schema | Schema name to be migrated | No | |||
--name | Table name to be migrated | No |
See the configuration file template below for examples of the above parameters usage.
The mxshift command line parameters are as follows:
Parameter name | Description |
---|---|
--config | Print the default content of the configuration file |
-a or --all | Perform migration directly without confirmation. Completed tables and empty tables are skipped by default |
-c or --config-path | mxshift tool global configuration file path |
-v or --version | mxshift tool version |
-l or --show-progress | Print the progress of the last migration. If it is not currently migrating, the results of the last migration will be displayed; if it is currently migrating, the progress of the current migration will be displayed. Among them, the value of "STATUS" and its meaning are: Not start: not within the migration range; Skip: skip (migrated or empty table); Processing/Interrupted: is being migrated or interrupted during the migration process; Complete: complete migration; Failed: migration failed |
-R or --redo | If the table has been migrated, do not skip, remigrate (this option will clear all data from the table in the target library and remigrate) |
The complete template file is named config.toml
. This template can also be obtained by executing mxshift --config
. You can execute the vim config.toml
command to start editing the file, then directly copy all the contents into the file, and then modify the response configuration items. Or use the sample template we gave below for different usage scenarios. Just choose what you need.
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
## The installation directory of matrixdb(this option is supported starting from version 4.8.3, 4.8.2 and earlier versions need to be specified by gphone)
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
[scope]
compress_method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
mode="normal"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
log-level="info"
## Print log without color.
# no-color=false
[controller]
both-way=true
concurrency=5
[transfer]
verify=true
# with-index=true
[transfer.table-data-where-sql]
enabled=false
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
where="abc > 10"
[transfer.table-data-where-sql.override.table]
schema="test_schema_1"
name="table_001"
[[transfer.table-data-where-sql.override]]
where="tag != 'aabbcc' AND ts > '2022-01-01'"
[transfer.table-data-where-sql.override.table]
schema="test_schema_2"
name="another_table"
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
After you have prepared the TOML file, execute the following command to start mxshift for migration.
$ mxshift -c config_path.toml
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
## The installation directory of matrixdb(this option is supported starting from version 4.8.3, 4.8.2 and earlier versions need to be specified by gphone)
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5
[transfer]
## Verity the number of record of every table
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
After preparing the TOML file, execute the following command to start mxshift for full migration.
$ mxshift -c config_path_full.toml
Implement incremental data migration through WHERE conditions:
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## The installation directory of matrixdb
gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema=......
name=......
[[scope.table-list]]
......
[[scope.exclude-table-list]]
......
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5
[transfer]
## Verity the number of record of every table
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
where="abc > 10"
[transfer.table-data-where-sql.override.table]
schema="test_schema_1"
name="table_001"
[[transfer.table-data-where-sql.override]]
where="tag != 'aabbcc' AND ts > '2022-01-01'"
[transfer.table-data-where-sql.override.table]
schema="test_schema_2"
name="another_table"
After you have prepared the TOML file, execute the following command to start mxshift for incremental migration.
$ mxshift -c config_path_incremental.toml
If you are migrating a database or all replication tables under a schema, you do not need to make additional configurations in config_path.toml
. mxshfit will automatically determine which tables are replication tables and execute corresponding migration strategies to prevent repeated migration of data. You only need to configure schema
and name
in scope.table-list
.
[[scope.table-list]]
schema=“public1”
name=“table_replicated1”
[[scope.table-list]]
schema=“public2”
name=“table_replicated2”
After you have prepared the TOML file, execute the following command to start mxshift for copy table migration.
$ mxshift -c config_path_replicated.toml
Notes! For an introduction to the data migration ideas and complete steps, please see [Data Migration] (/doc/4.8/maintain/migrate/overview)