mxshift

本文档介绍了数据迁移工具 mxshift。

1 描述

mxshift 工具目前支持以下功能:

  • 全量迁移:以数据库为单位,实现从 Greenplum 5 / Greenplum 6/ MatrixDB 集群并行迁移数据表数据定义语言(DDL)及数据到新的 MatrixDB 集群。目前只有全量迁移模式支持迁移数据表 DDL。
  • 反向迁移:从 MatrixDB 到 Greenplum 5 / Greenplum 6 反向迁移。
  • 增量迁移:通过 WHERE 条件实现数据的增量迁移。
  • 支持的数据分布策略:哈希分布、随机分布、复制表(REPLICATED)、Master-only。

2 参数信息

此部分主要介绍 mxshift 工具相关参数信息。

2.1 配置文件

使用 mxshift 工具需要先准备一个 TOML 文件。以下是配置文件中的详细参数介绍:

Parameter Name Description Required
[database.source] Category
--db-host The host name (Host) or IP address of the Master node in the source cluster. The source database cluster refers to the database cluster from which data needs to be migrated Yes
--db-port Port number of the source database Master Yes
--db-database Source database name Yes
--db-user Source database username Yes
--db-password Source database user password Yes
--install-dir Database installation path (this parameter is supported starting from v4.8.3; for v4.8.2 and earlier versions, use the gphome parameter for configuration) Yes
--hostname-to-ip Specify the IP address corresponding to the hostname. When the hostnames of the source and target databases are the same, use this configuration to ensure routing accuracy (supported since v4.8.3) No
[database.target] category
--db-host The hostname (Host) or IP address of the Master node in the target cluster. The target cluster refers to the database cluster into which data is being migrated Yes
--db-port The port number of the target database Master Yes
--db-database The name of the target database Yes
--db-user The username of the target database Yes
--db-password The password of the target database user Yes
[scope]Category
--mode Run mode. Allows the following values: normal, dryrun, fetch, motion.
normal:Default value, indicating the normal data migration process;
dryrun:Verify the connectivity between the two databases, use simple Data Definition Language (DDL) for verification operations, but no data reading or writing;
fetch:Based on dryrun, read data from the source and send it to the target data node (Segment), but discard it directly without processing. This option is used to verify the source read efficiency and network transfer efficiency from the source to the target;
motion:Based on fetch, parse data on the target database and exchange it to the real target Segment. This option is used to further test the data exchange rate on the target database
No
--compress-method Compression method: gzip / zstd / lz4 / 0, default value is 0, which means no compression.
Currently, enabling any compression algorithm requires that the mxadmin user on each target Segment can log in to each source Segment without a password. It is recommended to test with dryrun after configuration
No
--gphome The absolute path of GPHOME for the source database (This option has been removed from v4.8.3 onwards) Yes
--table-list List of schemas and table names for source tables. If migrating an entire database, leave this parameter empty and mxshift will migrate all non-zero size tables from largest to smallest No
--exclude-table-list List of schemas and table names to be excluded No
--schema-list Source database schema. Priority is given to table-list if it contains specific tables. If not, check if schema-list is configured. If configured, migrate all tables under the schema. If not, migrate all tables under [database.source] db-database No
--exclude-schema-list Exclude schemas of the source database. Tables under these schemas will not be migrated. If a schema is configured in both schema-list and exclude-schema-list, it will be excluded No
[log]Category
--log-level Default value is info, which prints normal progress, error, and warning information. If set to debug, more debugging information will be output, only used for development and troubleshooting No
--no-color Default is false. Set to true to disable color tags No
[controller]Category
--both-way Default is false, which starts with the largest table for fastest overall migration. If set to true, migration proceeds from both largest and smallest tables simultaneously No
--concurrency Number of tables migrated in parallel, default is 5, maximum 10 No
[transfer]Category
--verify Check if row counts match before and after data migration, default is false No
--with-index Whether to migrate indexes, default is false. If disabled, mxshift saves index SQL statements to a file for manual creation. Note: For tables using MARS2 storage engine, indexes must be created for the table to take effect, so DDL migration for MARS2 tables automatically includes indexes No
[ddl]Category (Supported from 4.8.2 onwards)
--enabled Whether to migrate table DDL, default is false. Before migrating data, mxshift prioritizes migrating DDL by database.
Supported database versions for DDL migration:
Greenplum6 -> YMatrix4 / YMatrix5;
YMatrix4 -> YMatrix4 / YMatrix5;
YMatrix5 -> YMatrix5
No
--file-path File path for importing/exporting DDL SQL. Automatically created if non-existent. Exported files can be viewed or modified, but do not alter or delete comments starting with -- DO NOT EDIT., as this may affect import results or cause errors No
--mode DDL processing mode. Options are exec, output, and input. exec is default, executing directly on the target database;output outputs to file and exits without data migration;input reads from a file. For input mode, --file-path must point to a file exported by mxshift No
--only-ddl Whether to migrate only DDL. Default is false. If false, data migration starts after DDL migration. If true, the program exits after DDL migration without moving data No
--skip-resource-queue-and-group Whether to skip resource queue/group migration. Set to true to exclude them during DDL migration No
--skip-table-space Whether to skip tablespace migration. Set to true to exclude tablespace during DDL migration No

If you want to implement incremental data migration through WHERE conditions, you may use the following parameters:

Category Subcategory Subparameters Sub-subcategory Sub-Subparameters Description Required
[transfer]Category [transfer.table-data-where-sql]Category --enabled If `true`, the source database data is filtered using the WHERE condition. Only data that meets the WHERE condition is migrated to achieve incremental migration. This function is turned off by default. Manual identification is required when enabled. `true` No
--global This parameter refers to the `WHERE` expression of SQL. mxshift will append this conditional expression to all tables that are migrated. If the `WHERE` keyword is not included in the table, it is prompted that `SKIPPED` will automatically skip the table No
[[transfer.table-data-where-sql.override]] Category --where represents a WHERE subexpression with a higher priority than the WHERE expression in global and apply only to a table level that filters a specific table. No
[transfer.table-data-where-sql.override.table]Category --schema Schema name to be migrated No
--name Table name to be migrated No

See the configuration file template below for examples of the above parameters usage.

2.2 Command line parameters

The mxshift command line parameters are as follows:

Parameter name Description
--config Print the default content of the configuration file
-a or --all Perform migration directly without confirmation. Completed tables and empty tables are skipped by default
-c or --config-path mxshift tool global configuration file path
-v or --version mxshift tool version
-l or --show-progress Print the progress of the last migration. If it is not currently migrating, the results of the last migration will be displayed; if it is currently migrating, the progress of the current migration will be displayed. Among them, the value of "STATUS" and its meaning are: Not start: not within the migration range; Skip: skip (migrated or empty table); Processing/Interrupted: is being migrated or interrupted during the migration process; Complete: complete migration; Failed: migration failed
-R or --redo If the table has been migrated, do not skip, remigrate (this option will clear all data from the table in the target library and remigrate)

3 Examples

The complete template file is named config.toml. This template can also be obtained by executing mxshift --config. You can execute the vim config.toml command to start editing the file, then directly copy all the contents into the file, and then modify the response configuration items. Or use the sample template we gave below for different usage scenarios. Just choose what you need.

[database]
    [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
        ## The installation directory of matrixdb(this option is supported starting from version 4.8.3, 4.8.2 and earlier versions need to be specified by gphone)
        install-dir="/usr/local/greenplum-db-6.7.1"
         [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

    [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"

[scope]
compress_method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
mode="normal"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"     
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"       
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
log-level="info"
## Print log without color.
# no-color=false

[controller]
both-way=true
concurrency=5

[transfer]
verify=true
# with-index=true
[transfer.table-data-where-sql]
enabled=false
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"      

After you have prepared the TOML file, execute the following command to start mxshift for migration.

$ mxshift -c config_path.toml

3.1 Full migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
        ## The installation directory of matrixdb(this option is supported starting from version 4.8.3, 4.8.2 and earlier versions need to be specified by gphone)
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
            node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"   
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"  
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
with-index=true

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"

After preparing the TOML file, execute the following command to start mxshift for full migration.

$ mxshift -c config_path_full.toml

3.2 Incremental Migration

Implement incremental data migration through WHERE conditions:

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## The installation directory of matrixdb
gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema=......
                name=......
        [[scope.table-list]]
        ......   
        [[scope.exclude-table-list]]
        ......      
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

After you have prepared the TOML file, execute the following command to start mxshift for incremental migration.

$ mxshift -c config_path_incremental.toml

3.3 Replicated Table

If you are migrating a database or all replication tables under a schema, you do not need to make additional configurations in config_path.toml. mxshfit will automatically determine which tables are replication tables and execute corresponding migration strategies to prevent repeated migration of data. You only need to configure schema and name in scope.table-list.

   [[scope.table-list]]
                schema=“public1”
                name=“table_replicated1”
   [[scope.table-list]]
                schema=“public2”
                name=“table_replicated2”

After you have prepared the TOML file, execute the following command to start mxshift for copy table migration.

$ mxshift -c config_path_replicated.toml

Notes! For an introduction to the data migration ideas and complete steps, please see [Data Migration] (/doc/4.8/maintain/migrate/overview)