This document introduces the data migration tool mxshift.
The mxshift tool currently supports the following features:
WHERE
conditions.This section mainly introduces the parameter information related to the mxshift tool.
To use the mxshift tool, you must first prepare a TOML file. The following is a detailed description of the parameters in the configuration file:
Parameter Name | Description | Required |
---|---|---|
[database.source] category | ||
--db-host | Source cluster Master node host name (Host) or IP address. The source database cluster refers to the database cluster from which data needs to be migrated | Yes |
--db-port | Port number of the source database Master | Yes |
--db-database | Source database name | Yes |
--db-user | Source database username | Yes |
--db-password | Source database user password | Yes |
--install-dir | Database installation path | Yes |
--hostname-to-ip | Specify the IP address corresponding to the hostname. When the hostnames of the source and target databases are the same, use this configuration to ensure routing accuracy | No |
[database.target] category | ||
--db-host | The hostname (Host) or IP address of the Master node in the target cluster. The target cluster refers to the database cluster into which data is migrated | Yes |
--db-port | The port number of the target database Master | Yes |
--db-database | The name of the target database | Yes |
--db-user | The username of the target database | Yes |
--db-password | The password of the target database user | Yes |
[scope]Category | ||
--mode | Running mode. The following values are allowed: normal , dryrun , fetch , motion . normal : Default value, refers to the normal data migration process; dryrun : Verify the connectivity of both databases, use a simple database definition language (DDL) for verification operations, but does not read and write data; fetch : On the basis of dryrun , the data on the source side is read and sent to the target data node (Segment), but it is discarded directly and does not process it. This option is used to verify the source side reading efficiency and the network transmission efficiency of the source side to the target side; motion : On the basis of fetch , the data is parsed on the target side database and exchanged to the real target Segment. This option is used to further test the data exchange rate of the target side database |
No |
--compress-method | Compression method: gzip / zstd / lz4 / 0 , the default value is 0 , that is, it is not compressed. At present, any compression algorithm is enabled, the mxadmin user of each Segment on the target side needs to log in to each Segment on the source side without password ssh. It is recommended to use dryrun to test it after configuration |
No |
--gphome | GPHOME absolute path to the source database (v5.0.0 is available, v5.0.1 and above must use the install-dir configuration) |
Yes |
--table-list | Schema (Schema) and name list of source data table. If it is a partition table, you need to fill in the partition table. If you need to migrate the entire database, the parameter is configured as empty. mxshift will migrate all tables of size not 0 from large to small | No |
--exclude-table-list | Schema and name list of tables to be excluded | No |
--schema-list | Schema of the source database. The priority is that if table-list has a specific table configured, only the tables in table-list are transferred. If not, check whether schema-list is configured. If configured, all tables under the Schema. If not, transmit all tables under [database.source]`` db-database` |
No |
--exclude-schema-list | Exclude Schema of the source database and do not migrate data to the tables under this Schema. If schema-list and exclude-schema-list are configured, then the Schema will be excluded |
No |
[log]Category | ||
--log-level | The default value is info , which will print normal progress information, error and warning information. If changed to debug , more debug information will be output, only for development and fault location |
No |
--no-color | Default is false . true will close the color tag |
No |
[controller]Category | ||
--both-way | Default is false , and the largest table will be migrated first to achieve the shortest time to migrate the complete data. If both-way is set to true , it will be migrated from both the largest and smallest ends at the same time |
No |
--concurrency | The number of tables that support parallel migration, default is 5, no more than 10 | No |
[transfer]Category | ||
--verify | Check whether the number of rows before and after data migration matches, the default is false |
No |
--with-index | Do not migrate the index? The default is false . Choose not to perform automatic migration. At this time, mxshift will save the SQL statements for migrating the index in a file under the current mxshift execution path, so that the index can be created manually later. Note: If you are using the MARS2 storage engine to build tables, you must create an index table to take effect. Therefore, while migrating DDL in MARS2 table, the index will be automatically migrated, and will not be affected by this option |
No |
[ddl]Category | ||
--enabled | Whether to migrate data table DDL, default is false . After starting, mxshift will prioritize its DDL migration by database before migrating data from a data table. DDL migration supports database version: Greenplum6 -> YMatrix4 / YMatrix5; YMatrix4 -> YMatrix4 / YMatrix5; YMatrix5 -> YMatrix5 |
No |
--only-ddl | Whether only DDL is migrated. The default is false . When configured as false , the data of the relevant table will be automatically migrated after the DDL migration is completed; when configured as true , the program will be exited after the DDL migration is completed, and the table data will not be migrated |
No |
--skip-resource-queue-and-group | Configure whether to migrate resource queues (groups). When skip-resource-queue-and-group=true , the resource group and resource queue will be skipped during the DDL migration |
No |
--skip-table-space | Configure whether to migrate tablespaces. When skip-table-space=true , tablespace will be skipped during DDL migration |
No |
If you want to implement conditional migration through WHERE conditions, you may use the following parameters:
Category | Subcategory | Subparameters | Sub-subcategory | Sub-Subparameters | Description | Required |
[transfer]Category | [transfer.table-data-where-sql]Category | --enabled | If `true`, the source database data is filtered using the WHERE condition. Only data that meets the WHERE condition is migrated to achieve incremental migration. This function is turned off by default. Manual identification is required when enabled. `true` | No | ||
--global | This parameter refers to the `WHERE` expression of SQL. mxshift will append this conditional expression to all tables that are migrated. If the `WHERE` keyword is not included in the table, it is prompted that `SKIPPED` will automatically skip the table | No | ||||
[[transfer.table-data-where-sql.override]] Category | --where | represents a WHERE subexpression with a higher priority than the WHERE expression in global and apply only to a table level that filters a specific table. | No | |||
[transfer.table-data-where-sql.override.table]Category | --schema | Schema name to be migrated | No | |||
--name | Table name to be migrated | No |
See the configuration file template below for examples of the above parameters usage.
The mxshift command line parameters are as follows:
Parameter name | Description |
---|---|
--config | Print the default content of the configuration file |
-a or --all | Perform migration directly without confirmation. Completed tables and empty tables are skipped by default |
-c or --config-path | mxshift tool global configuration file path |
-v or --version | mxshift tool version |
-l or --show-progress | Print the progress of the last migration. If it is not currently migrating, the results of the last migration will be displayed; if it is currently migrating, the progress of the current migration will be displayed. Among them, the value of "STATUS" and its meaning are: Not start: not within the migration range; Skip: skip (migrated or empty table); Processing/Interrupted: is being migrated or interrupted during the migration process; Complete: complete migration; Failed: migration failed |
-R or --redo | If the table has been migrated, do not skip, remigrate (this option will clear all data from the table in the target library and remigrate) |
The complete template file is named config.toml
. This template is available by executing mxshift --config
.
We give an example template for different usage scenarios below. Please use this as a reference and use the vim config.toml
command to modify the file to complete the configuration according to the specific situation.
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
## The installation directory of database
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
## Whether to disable data incremental migration, by default, it is true.
# disable-data-increment=true
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3
[transfer]
## Verity the number of record of every table
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
# disable-ddl-increment=true
After preparing the TOML file, execute the following command to start mxshift for full migration.
$ mxshift -c config_path_full.toml
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
## The installation directory of database
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
## Whether to disable data incremental migration, by default, it is true.
disable-data-increment=false
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3
[transfer]
## Verity the number of record of every table
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
disable-ddl-increment=false
After you have prepared the TOML file, execute the following command to start mxshift for incremental migration.
$ mxshift -c config_path_incremental.toml
Through the WHERE
condition, implement data migration that meets certain conditions or implement expired data not to be migrated:
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## The installation directory of database
install-dir="/usr/local/greenplum-db-6.7.1"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema=......
name=......
[[scope.table-list]]
......
[[scope.exclude-table-list]]
......
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3
[transfer]
## Verity the number of record of every table
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
where="abc > 10"
[transfer.table-data-where-sql.override.table]
schema="test_schema_1"
name="table_001"
[[transfer.table-data-where-sql.override]]
where="tag != 'aabbcc' AND ts > '2022-01-01'"
[transfer.table-data-where-sql.override.table]
schema="test_schema_2"
name="another_table"
After you have prepared the TOML file, execute the following command to start mxshift for conditional migration.
$ mxshift -c config_path_conditional.toml
If you are migrating a database or all replication tables under a schema, you do not need to make additional configurations in config_path.toml
. mxshfit will automatically determine which tables are replication tables and execute corresponding migration strategies to prevent repeated migration of data. You only need to configure schema
and name
in scope.table-list
.
[[scope.table-list]]
schema=“public1”
name=“table_replicated1”
[[scope.table-list]]
schema=“public2”
name=“table_replicated2”
After you have prepared the TOML file, execute the following command to start mxshift for copy table migration.
$ mxshift -c config_path_replicated.toml
Notes!
For an introduction to the data migration ideas and complete steps, please refer to [Data Migration] (/doc/5.1/maintain/migrate/guidebook)