mxshift

This document introduces the data migration tool mxshift.

1 Description

The mxshift tool currently supports the following features:

  • Full migration: In terms of databases, implement parallel migration of data table data definition language (DDL) and data from Greenplum 5 / Greenplum 6 / YMatrix clusters to the new YMatrix cluster. Currently, only the full migration mode supports migration data table DDL.
  • Reverse migration: Reverse migration from YMatrix to Greenplum 5 / Greenplum 6.
  • Incremental migration: Incremental migration of data is achieved through the WHERE condition.
  • Supported data distribution strategies: hash distribution, random distribution, replication table (REPLICATED), Master-only.

2 Parameter information

This section mainly introduces the parameter information related to the mxshift tool.

2.1 Configuration File

Using the mxshift tool requires you to prepare a TOML file first. The following is a detailed parameter description in the configuration file:

| Parameter name | Description | Required | | --- | --- | |[database.source]Category| | |--db-host|Source cluster Master node hostname (Host) or IP address. Source database cluster refers to the database cluster that needs to be migrated out|Yes |--db-port|Source Database Master's port number|Yes |--db-database|source database name|Yes |--db-user|source database username|Yes |--db-password|source database user password|Yes |--install-dir|Database installation path (v5.0.1 and above are available, v5.0.0 also requires gphome configuration)|Yes |--hostname-to-ip|Specify the IP address corresponding to the host name. Use this configuration to ensure routing accuracy when the source and destination libraries host names are the same (supported as v5.0.1) | No |[database.target]Category| | |--db-host|Target cluster Master node hostname (Host) or IP address. The target cluster refers to the database cluster in which data is migrated|Yes |--db-port|Target database Master's port number|Yes |--db-database|target database name|Yes |--db-user|Target database username|Yes |--db-password|Target database user password|Yes |[scope]Category| | |--mode|Running mode. The following values ​​are allowed: normal, dryrun, fetch, motion.
normal: Default value, refers to the normal data migration process;
dryrun: Verify the connectivity of both databases, use a simple database definition language (DDL) for verification operations, but does not read and write data;
fetch: On the basis of dryrun, the data on the source side is read and sent to the target data node (Segment), but it is discarded directly and does not process it. This option is used to verify the source side reading efficiency and the network transmission efficiency of the source side to the target side;
motion: On the basis of fetch, the data is parsed on the target side database and exchanged to the real target Segment. This option is used to further test the data exchange rate of the target side database|No |--compress-method|Compression method: gzip / zstd / lz4 / 0, the default value is 0, that is, it is not compressed.
At present, any compression algorithm is enabled, the mxadmin user of each Segment on the target side needs to log in to each Segment on the source side without password ssh. It is recommended to use dryrun to test it after configuration|No |--gphome|GPHOME absolute path to the source database (v5.0.0 is available, v5.0.1 and above must use the install-dir configuration)|Yes |--table-list|Schema (Schema) and name list of source data table. If it is a partition table, you need to fill in the partition table. If you need to migrate the entire database, the parameter is configured as empty. mxshift will migrate all tables of size not 0 from large to small | No |--exclude-table-list|Schema and name list of tables to be excluded|No |--schema-list|Schema of the source database. The priority is that if table-list has a specific table configured, only the tables in table-list are transferred. If not, check whether schema-list is configured. If configured, all tables under the Schema. If not, transmit all tables under [database.source]``db-database|No |--exclude-schema-list|Exclude Schema of the source database and do not migrate data to the tables under this Schema. Ifschema-listandexclude-schema-listare configured, then the Schema will be excluded|No |**[log]Category**| | |--log-level|The default value isinfo, which will print normal progress information, error and warning information. If changed todebug, more debug information will be output, only for development and fault location | No |--no-color|Default isfalse.truewill close the color tag|No |**[controller]Category**| | |--both-way|Default isfalse, and the largest table will be migrated first to achieve the shortest time to migrate the complete data. Ifboth-wayis set totrue, it will be migrated from both the largest and smallest ends at the same time | No |--concurrency|The number of tables that support parallel migration, default is 5, no more than 10|No |**[transfer]Category**| | |--verify|Check whether the number of rows before and after data migration matches, the default isfalse|No |--with-index|Do not migrate the index? The default isfalse. Choose not to perform automatic migration. At this time, mxshift will save the SQL statements for migrating the index in a file under the current mxshift execution path, so that the index can be created manually later. Note: If you are using the MARS2 storage engine to build tables, you must create an index table to take effect. Therefore, while migrating DDL in MARS2 table, the index will be automatically migrated, and will not be affected by this option|No |**[ddl]Category**| | |--enabled|Whether to migrate data table DDL, default isfalse. After starting, mxshift will prioritize its DDL migration by database before migrating data from a data table. </br>DDL migration supports database version: </br>Greenplum6 -> YMatrix4 / YMatrix5; </br>YMatrix4 -> YMatrix4 / YMatrix5; </br>YMatrix5 -> YMatrix5|No |--only-ddl|Whether only DDL is migrated. The default isfalse. When configured asfalse, the data of the relevant table will be automatically migrated after the DDL migration is completed; when configured astrue, the program will be exited after the DDL migration is completed, and the table data will not be migrated|No |--skip-resource-queue-and-group|Configure whether to migrate resource queues (groups). Whenskip-resource-queue-and-group=true, the resource group and resource queue will be skipped during the DDL migration|No |--skip-table-space|Configure whether to migrate tablespaces. Whenskip-table-space=true`, tablespace will be skipped during DDL migration|No

If you want to implement incremental data migration through WHERE conditions, you may use the following parameters:

Category Subcategory Subparameters Sub-subcategory Sub-Subparameters Description Required
[transfer]Category [transfer.table-data-where-sql]Category --enabled If `true`, the source database data is filtered using the WHERE condition. Only data that meets the WHERE condition is migrated to achieve incremental migration. This function is turned off by default. Manual identification is required when enabled. `true` No
--global This parameter refers to the `WHERE` expression of SQL. mxshift will append this conditional expression to all tables that are migrated. If the `WHERE` keyword is not included in the table, it is prompted that `SKIPPED` will automatically skip the table No
[[transfer.table-data-where-sql.override]] Category --where represents a WHERE subexpression with a higher priority than the WHERE expression in global and applies only to a table level that filters a specific table. No
[transfer.table-data-where-sql.override.table]Category --schema Schema name to be migrated No
--name Table name to be migrated No

See the configuration file template below for examples of the above parameters usage.

2.2 Command line parameters

The mxshift command line parameters are as follows:

Parameter name Description
--config Print the default content of the configuration file
-a or --all Perform migration directly without confirmation. Completed tables and empty tables are skipped by default
-c or --config-path mxshift tool global configuration file path
-v or --version mxshift tool version
-l or --show-progress Print the progress of the last migration. If it is not currently migrating, the results of the last migration will be displayed; if it is currently migrating, the progress of the current migration will be displayed. Among them, the value of "STATUS" and its meaning are: Not start: not within the migration range; Skip: skip (migrated or empty table); Processing/Interrupted: is being migrated or interrupted during the migration process; Complete: complete migration; Failed: migration failed
-R or --redo If the table has been migrated, do not skip, remigrate (this option will clear all data from the table in the target library and remigrate)

3 Examples

The complete template file is named config.toml. This template can also be obtained by executing mxshift --config. You can execute the vim config.toml command to start editing the file, then directly copy all the contents into the file, and then modify the response configuration items. Or use the sample template we gave below for different usage scenarios. Just choose what you need.

[database]
    [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="mxadmin"
        ## The installation directory of database (v5.0.1 and above)
        # install-dir="/usr/local/greenplum-db-6.7.1"

    [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
[scope]
compress_method="lz4"
gphome
mode="normal"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"  
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
log-level="info"
## Print log without color.
# no-color=false

[controller]
both-way=true
concurrency=5

[transfer]
verify=true
# with-index=true
[transfer.table-data-where-sql]
enabled=false
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

[ddl]
enabled=true
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"      

After you have prepared the TOML file, execute the following command to start mxshift for migration.

$ mxshift -c config_path.toml

3.1 Full migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## The installation directory of database (v5.0.1 and above)
        # install-dir="/usr/local/greenplum-db-6.7.1"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## The installation directory of database (v5.0.0)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"       
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
with-index=true

[ddl]
enabled=true
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"

After preparing the TOML file, execute the following command to start mxshift for full migration.

$ mxshift -c config_path_full.toml

3.2 Incremental Migration

Implement incremental data migration through WHERE conditions:

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## The installation directory of database (v5.0.1 and above)
        # install-dir="/usr/local/greenplum-db-6.7.1"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## The installation directory of database (v5.0.0)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema=......
                name=......
        [[scope.table-list]]
        ......     
        [[scope.exclude-table-list]]
        ......        
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

After you have prepared the TOML file, execute the following command to start mxshift for incremental migration.

$ mxshift -c config_path_incremental.toml

3.3 Replicated Table

If you are migrating a database or all replication tables under a schema, you do not need to make additional configurations in config_path.toml. mxshfit will automatically determine which tables are replication tables and execute corresponding migration strategies to prevent repeated migration of data. You only need to configure schema and name in scope.table-list.

   [[scope.table-list]]
                schema=“public1”
                name=“table_replicated1”
   [[scope.table-list]]
                schema=“public2”
                name=“table_replicated2”

After you have prepared the TOML file, execute the following command to start mxshift for copy table migration.

$ mxshift -c config_path_replicated.toml

Notes!
For an introduction to the data migration ideas and complete steps, please refer to [Data Migration] (/doc/5.0/maintain/migrate/guidebook).