mxshift

This document describes the data migration tool mxshift.

1 Description

The mxshift tool currently supports the following features:

  • Full migration: Migrate DDL and data in parallel from a Greenplum 4.3.X/5/6 or YMatrix cluster to a new YMatrix cluster, at the database level.
  • Incremental migration: Migrate DDL and data of newly added tables in parallel from a Greenplum 5/6 or YMatrix cluster to a new YMatrix cluster, at the table level.
  • Conditional migration: Perform data migration based on specific conditions using the WHERE clause.
  • Supported data distribution strategies: Hash distribution, random distribution, replicated tables (REPLICATED), and Master-only tables.

2 Quick Start

The complete template file is named config_template.toml. This template can be obtained by running mxshift --dump-config.

Below are example templates for different use cases. Use them as references and adjust the configuration for your environment before running the mxshift command.

2.1 Full Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
        ## The installation directory of database
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
## Whether to disable data incremental migration, by default, it is true.
# disable-data-increment=true
disable-connector=false
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
## 'dryrun' only executes DDL, without transferring data
## 'fetch' fetches data from the source and then discards it
## 'motion' fetches data from the source, redistributes it, and then discards it
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work starts from the largest table. If set to true, it starts from both the largest and the smallest table
both-way=true
## The number of tables transferred at the same time
concurrency=3

[transfer]
## Verify the number of records of every table
verify=true
with-index=true
## Whether to disable data transfer, by default, it is false.
# disabled-data-transfer=false
## Tables with size under this threshold are categorized as "small" and
## transferred between masters instead of segments to avoid creating external tables.
# small-table-mb=32

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
# disable-ddl-increment=true
## DDL transfer will stop and exit when executing SQL encounters errors, unless the type of DDL is listed in 'ignore-error'. Types of DDL are:
## "all","collation","default acl","extension","external table","function","index","language",
## "namespace","operator","resource","role","table","tablespace","type","view"
# ignore-error=[]

[verify]
## Whether to enable data verify, by default, it is false.
# enabled=false
## mode for verifying data consistency between source and target database,
## and value restricted to simple-count/column-count/length-sum/hash-bit-bloom.
## 'simple-count' for counting number of rows
## 'column-count' for counting not-null value of a certain column
## 'length-sum' for summing the length of some columns(e.g. column of type text, xml, json, varchar)
## 'hash-bit-bloom' for calculating the hash of one or more columns and comparing its binary content
##  (except for columns of floating-point type, the result of which can be affected by software/hardware)
# mode="simple-count"
## Columns used in verification, will randomly select columns if not set
# columns=[]
## Maximum number of randomly selected columns used in verification
# max-column-count=3
# [[verify.table-list]]
#    schema="test_schema_1"
#    name="table_001"
# [[verify.exclude-table-list]]
#    schema="test_schema_3"
#    name="table_003"
# verify.schema-list=["test_schema_1"]
# verify.exclude-schema-list=["test_schema_5"]

# [[verify.override]]
#    mode="length-sum"
#    max-column-count=3
#    columns=[]
#    [[verify.override.table-list]]
#      schema="test_schema_3"
#      name="table_003"
#    verify.override.schema-list=[]

# [[verify.override]]
#    mode="column-count"
#    max-column-count=2
#    columns=[]
#    [[verify.override.table-list]]
#      schema="test_schema_1"
#      name="table_001"
#    verify.override.schema-list=[]

After preparing the TOML file, run the following command to start full migration with mxshift:

$ mxshift -c config_path_full.toml

2.2 Incremental Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
        ## The installation directory of database
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
disable-connector=false
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
## 'dryrun' only executes DDL, without transferring data
## 'fetch' fetches data from the source and then discards it
## 'motion' fetches data from the source, redistributes it, and then discards it
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Whether to disable data incremental migration, by default, it is true.
disable-data-increment=false
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"

        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"        
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work starts from the largest table. If set to true, it starts from both the largest and the smallest table
both-way=true
## The number of tables transferred at the same time
concurrency=3

[transfer]
## Verify the number of records of every table
verify=true
with-index=true
## Whether to disable data transfer, by default, it is false.
# disabled-data-transfer=false
## Tables with size under this threshold are categorized as "small" and
## transferred between masters instead of segments to avoid creating external tables.
# small-table-mb=32

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
disable-ddl-increment=false
## DDL transfer will stop and exit when executing SQL encounters errors, unless the type of DDL is listed in 'ignore-error'. Types of DDL are:
## "all","collation","default acl","extension","external table","function","index","language",
## "namespace","operator","resource","role","table","tablespace","type","view"
# ignore-error=[]

[verify]
## Whether to enable data verify, by default, it is false.
# enabled=false
## mode for verifying data consistency between source and target database,
## and value restricted to simple-count/column-count/length-sum/hash-bit-bloom.
## 'simple-count' for counting number of rows
## 'column-count' for counting not-null value of a certain column
## 'length-sum' for summing the length of some columns(e.g. column of type text, xml, json, varchar)
## 'hash-bit-bloom' for calculating the hash of one or more columns and comparing its binary content
##  (except for columns of floating-point type, the result of which can be affected by software/hardware)
# mode="simple-count"
## Columns used in verification, will randomly select columns if not set
# columns=[]
## Maximum number of randomly selected columns used in verification
# max-column-count=3
# [[verify.table-list]]
#    schema="test_schema_1"
#    name="table_001"
# [[verify.exclude-table-list]]
#    schema="test_schema_3"
#    name="table_003"
# verify.schema-list=["test_schema_1"]
# verify.exclude-schema-list=["test_schema_5"]

# [[verify.override]]
#    mode="length-sum"
#    max-column-count=3
#    columns=[]
#    [[verify.override.table-list]]
#      schema="test_schema_3"
#      name="table_003"
#    verify.override.schema-list=[]

# [[verify.override]]
#    mode="column-count"
#    max-column-count=2
#    columns=[]
#    [[verify.override.table-list]]
#      schema="test_schema_1"
#      name="table_001"
#    verify.override.schema-list=[]

After preparing the TOML file, run the following command to start incremental migration with mxshift:

$ mxshift -c config_path_incremental.toml

2.3 Conditional Migration

Use the WHERE clause to migrate only data that meets specific conditions, or to exclude expired data from migration:

[database]
        [database.source]
        ## Name of database
db-database= "testdb"  
## Hostname of database master  
db-host="sdw3"  
## password of database  
db-password="xxxx"  
## Port of database master  
db-port=54322  
## user name of database  
db-user="gpadmin"  
## The installation directory of database  
install-dir="/usr/local/greenplum-db-6.7.1"

[database.target]  
## Name of database  
db-database="destdb"  
## Hostname of database master  
db-host="172.16.100.32"  
## password of database  
db-password="yyyy"  
## Port of database master  
db-port=5432  
## user name of database  
db-user="mxadmin"

[scope]  
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd  
compress-method="lz4"  
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.  
## 'dryrun' only executes DDL, without transferring data  
## 'fetch' fetches data from the source and then discards it  
## 'motion' fetches data from the source, redistributes it, and finally discards it  
mode="normal"  
## Sql for select segment information from source database  
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"  
## Sql for select segment information from target database  
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"  
        [[scope.table-list]]  
                schema=......  
                name=......  
        [[scope.table-list]]  
        ......  

        [[scope.exclude-table-list]]  
        ......      
schema-list=["test_schema_1", "test_schema_2"]  
exclude-schema-list=["test_schema_5", "test_schema_8"]  
disable-connector=false

[log]  
## The log level, value restricted to: debug/verbose/info.  
log-level="info"  
## Print log without color.  
# no-color=false

[controller]  
## By default, transfer work starts from the largest table. If set to true, it starts from both the largest and the smallest table  
both-way=true  
## The number of tables transferred at the same time  
concurrency=3

[transfer]  
## Verify the number of records of every table  
verify=true  
## Tables with size under this threshold are categorized as "small" and  
## transferred between masters instead of segments to avoid creating external tables.  
# small-table-mb=32  
[transfer.table-data-where-sql]  
enabled=true  
global="txdate >= '2022-10-01' AND batchnum >= 100000000"  
[[transfer.table-data-where-sql.override]]  
        where="abc > 10"  
        [transfer.table-data-where-sql.override.table]  
        schema="test_schema_1"  
        name="table_001"  
[[transfer.table-data-where-sql.override]]  
        where="tag != 'aabbcc' AND ts > '2022-01-01'"  
        [transfer.table-data-where-sql.override.table]  
        schema="test_schema_2"  
        name="another_table"

[verify]  
## Whether to enable data verify, by default, it is false.  
# enabled=false  
## mode for verifying data consistency between source and target database,  
## and value restricted to simple-count/column-count/length-sum/hash-bit-bloom.  
## 'simple-count' for counting number of rows  
## 'column-count' for counting not-null value of a certain column  
## 'length-sum' for summing the length of some columns(e.g. column of type text, xml, json, varchar)  
## 'hash-bit-bloom' for calculating the hash of one or more columns and comparing its binary content  
##  (except for columns of floating-point type, the result of which can be affected by software/hardware)  
# mode="simple-count"  
## Columns used in verification, will randomly select columns if not set  
# columns=[]  
## Maximum number of randomly selected columns used in verification  
# max-column-count=3  
# [[verify.table-list]]  
#    schema="test_schema_1"  
#    name="table_001"  
# [[verify.exclude-table-list]]  
#    schema="test_schema_3"  
#    name="table_003"  
# verify.schema-list=["test_schema_1"]  
# verify.exclude-schema-list=["test_schema_5"]

# [[verify.override]]  
#    mode="length-sum"  
#    max-column-count=3  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_3"  
#      name="table_003"  
#    verify.override.schema-list=[]

# [[verify.override]]  
#    mode="column-count"  
#    max-column-count=2  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_1"  
#      name="table_001"  
#    verify.override.schema-list=[]  

After preparing the TOML file, run the following command to start conditional migration with mxshift:

$ mxshift -c config_path_conditional.toml  
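Relative to a full migration, the only new piece in the template above is the [transfer.table-data-where-sql] block. A minimal sketch of just that block (the schema, table names, and predicates are examples):

```toml
[transfer.table-data-where-sql]
enabled=true
## Fallback WHERE clause applied to every migrated table
global="txdate >= '2022-10-01'"

## Per-table override of the global clause
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
```

Tables with a matching override are filtered by the override's predicate instead of the global one.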

2.4 Migrating Replicated Tables

To migrate all replicated tables in a database or a schema, no additional configuration is required in config_path.toml. mxshift automatically identifies replicated tables and applies the appropriate migration strategy to prevent data duplication. You only need to configure scope.table-list with schema and name.

   [[scope.table-list]]  
                schema="public1"  
                name="table_replicated1"  
   [[scope.table-list]]  
                schema="public2"  
                name="table_replicated2"  

After preparing the TOML file, run the following command to start mxshift for replicated table migration.

$ mxshift -c config_path_replicated.toml  

Note!
For an overview of data migration strategies and complete steps, see Data Migration.

3 Feature Tips

3.1 Import/Export Table Metadata

For data migration, mxshift first queries the catalog to obtain metadata of the tables to be migrated, then proceeds with the data transfer.

As database usage increases, catalog bloat can significantly increase query latency. To speed up debugging and reduce service downtime during migration, mxshift provides command-line options to export table metadata and reuse it in subsequent migrations:

Export table metadata to mxshift_meta_file and exit

$ mxshift -c <migration_config_file> --write-table-meta-file mxshift_meta_file  

Import table metadata from mxshift_meta_file and proceed with data migration

$ mxshift -c <migration_config_file> --read-table-meta-file mxshift_meta_file  

Note!
Exported metadata should not be used long-term. Regenerate the metadata whenever the set of tables to migrate changes or their schema is altered.

3.2 Import/Export Table DDL

Configure the mxshift configuration file as follows:

[ddl]  
enabled=true  
file-path="/tmp/mxshift.sql"  
mode="output"  
  • file-path: File path for importing/exporting DDL SQL; see the appendix below. Do not modify or delete any comment starting with "-- DO NOT EDIT.".
  • mode: DDL processing mode, one of exec, output, or input, meaning:
    • exec: Default. Execute directly on the target database.
    • output: Output to file. After exporting DDL, mxshift exits without performing data migration.
    • input: Read from file.
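For example, a typical round trip first exports DDL with mode="output", then replays the file on the target with mode="input". A minimal sketch of the import side (the file path is illustrative):

```toml
[ddl]
enabled=true
## Read DDL from the file previously produced with mode="output"
file-path="/tmp/mxshift.sql"
mode="input"
```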

If the target database is unreachable when using output, configure database.target.db-version as shown below to ensure successful DDL export:

[database]

    [database.target]  
    ...  
    ## Version of database (Use output of 'SELECT version();'). Required only when  
    ##         1. Source database is unreachable, 'ddl.only-ddl' is enabled, and 'ddl.mode' is 'input'  
    ##         2. Target database is unreachable, and 'ddl.mode' is 'output'  
    db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"  

If the source database is unreachable when using input, configure database.source.db-version and ddl.only-ddl as shown below to ensure successful import:

[database]

    [database.source]  
    ## Version of database (Use output of 'SELECT version();'). Required only when  
    ##         1. Source database is unreachable, 'ddl.only-ddl' is enabled, and 'ddl.mode' is 'input'  
    ##         2. Target database is unreachable, and 'ddl.mode' is 'output'  
    db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"

[ddl]  
only-ddl=true  

3.3 Dynamically Adjusting Concurrency

mxshift supports runtime concurrency adjustment. Follow these steps:

  1. Modify the controller.concurrency parameter in the configuration file (used when starting mxshift):
[controller]  
## The number of tables transferred at the same time  
concurrency=xxx  
  2. Obtain the current process ID of mxshift

    Run ps aux | grep mxshift in the terminal and note the number in the second column ({{mxshift PID}})

  3. Send SIGUSR2 signal to the mxshift process to reload the configuration and update concurrency

    Run kill -SIGUSR2 {{mxshift PID}} in the terminal

3.4 Data Consistency Verification

3.4.1 Decoupling Legacy Verification from Data Migration

In versions prior to v5.2.1, the [transfer] section contained with-index and verify to control whether indexes were migrated and whether row counts were verified. However, index migration and legacy verification (i.e., row count check only) were tightly coupled with data migration—they could not run independently.

Starting from v5.2.1, index migration and legacy verification can be executed separately. A new configuration parameter disabled-data-transfer has been added under the [transfer] section to skip data transfer. Examples:

  1. Migrate data and indexes:
[transfer]  
## Whether to disable data transfer, by default, it is false.  
disabled-data-transfer=false  
## Transfer indexes of the tables, by default, it is false.  
with-index=true  
  2. Migrate data and verify row counts:
[transfer]  
## Whether to disable data transfer, by default, it is false.  
disabled-data-transfer=false  
## Verify the number of records of every table, by default, it is disabled.  
verify=true  
  3. Migrate data, indexes, and verify row counts:
[transfer]  
## Whether to disable data transfer, by default, it is false.  
disabled-data-transfer=false  
## Transfer indexes of the tables, by default, it is false.  
with-index=true  
## Verify the number of records of every table, by default, it is disabled.  
verify=true  
  4. Migrate data only (no indexes, no verification):
[transfer]  
## Whether to disable data transfer, by default, it is false.  
disabled-data-transfer=false  
## Transfer indexes of the tables, by default, it is false.  
with-index=false  
## Verify the number of records of every table, by default, it is disabled.  
verify=false  
  5. Migrate indexes only (no data transfer):
[transfer]  
## Whether to disable data transfer, by default, it is false.  
disabled-data-transfer=true  
## Transfer indexes of the tables, by default, it is false.  
with-index=true  
  6. Verify row counts only (no data transfer):
[transfer]  
## Whether to disable data transfer, by default, it is false.  
disabled-data-transfer=true  
## Verify the number of records of every table, by default, it is disabled.  
verify=true  

3.4.2 v5.2.1 Data Verification

Starting from v5.2.1, YMatrix mxshift supports a new data verification algorithm configurable via the [verify] section.

See the Appendix for detailed global configuration parameters, and refer to the Full Migration Configuration File above for a complete example of the [verify] section. Below we focus on scope configuration.

[verify]  
# [[verify.table-list]]  
#    schema="test_schema_1"  
#    name="table_001"  
# [[verify.exclude-table-list]]  
#    schema="test_schema_3"  
#    name="table_003"  
# verify.schema-list=["test_schema_1"]  
# verify.exclude-schema-list=["test_schema_5"]

# [[verify.override]]  
#    mode="length-sum"  
#    max-column-count=3  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_3"  
#      name="table_003"  
#    verify.override.schema-list=[]

# [[verify.override]]  
#    mode="column-count"  
#    max-column-count=2  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_1"  
#      name="table_001"  
#    verify.override.schema-list=[]  

Scope-related configurations are defined in four subsections: [verify.table-list], [verify.exclude-table-list], [verify.schema-list], and [verify.exclude-schema-list].

Verification priority order:

  1. If specific tables are listed in table-list, only those tables in table-list are verified.

  2. If no tables are specified, check whether schemas are listed in schema-list.

  3. If schemas are specified, verify all tables within those schemas.

  4. If neither tables nor schemas are specified, verify all tables from the most recent transfer.

  • verify.table-list: List of tables included in v5.2.1 data verification.
  • verify.exclude-table-list: List of tables excluded from v5.2.1 data verification.
  • verify.schema-list: List of schemas included in v5.2.1 data verification.
  • verify.exclude-schema-list: List of schemas excluded from v5.2.1 data verification.
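As an illustration of the priority rules above, the following sketch (schema and table names are placeholders) would verify only test_schema_1.table_001, because verify.table-list takes priority over any schema-level settings:

```toml
[verify]
enabled=true
mode="simple-count"
## table-list has the highest priority: only this table is verified
[[verify.table-list]]
schema="test_schema_1"
name="table_001"
```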

The [verify.override] subsection overrides global verification settings. Configurations under this block apply to tables within the specified scope.

Appendix: Parameter Reference

This section describes parameters supported by the mxshift tool.

Configuration File

To use the mxshift tool, you must first prepare a TOML configuration file. Below is a detailed description of the parameters in the configuration file:

Parameter Description Required
[database.source]
--db-host Hostname (Host) or IP address of the Master node in the source cluster. The source database cluster refers to the cluster from which data will be migrated out. Yes
--db-port Port number of the Master node in the source database. Yes
--db-database Name of the source database. Yes
--db-user Username for the source database. Yes
--db-password Password for the source database user. Yes
--install-dir Database installation directory. Yes
--hostname-to-ip Maps hostnames to IP addresses. Use this setting when the source and target clusters have identical hostnames to ensure accurate routing. No
[database.target]
--db-host Hostname (Host) or IP address of the Master node in the target cluster. The target cluster is the destination for data migration. Yes
--db-port Port number of the Master node in the target database. Yes
--db-database Name of the target database. Yes
--db-user Username for the target database. Yes
--db-password Password for the target database user. Yes
[scope]
--mode Operation mode. Allowed values: normal, dryrun, fetch, motion.
normal: Default value; performs normal data migration.
dryrun: Verifies connectivity between source and target databases using simple DDL operations, without reading or writing data.
fetch: Based on dryrun, reads data from the source and sends it to the target Segment nodes but discards it. Used to test source read performance and network transfer speed.
motion: Extends fetch by parsing data at the target and redistributing it to the correct target Segments. Tests the target database's data redistribution rate.
No
--compress-method Compression method: gzip / zstd / lz4 / 0. Default is 0 (no compression).
If scope.disable-connector=true, enabling any compression requires that the mxadmin user on each target Segment can SSH passwordlessly to every source Segment. It is recommended to test with dryrun after configuration.
No
--table-list List of schema and table names from the source. For partitioned tables, specify the parent partitioned table. If empty, mxshift migrates all non-zero-sized tables in descending order of size. No
--exclude-table-list List of schema and table names to exclude from migration. No
--schema-list Source database schemas. Priority: if table-list specifies specific tables, only those are migrated. Otherwise, if schema-list specifies schemas, all tables under those schemas are migrated. If neither is set, all tables in the source database are migrated. No
--exclude-schema-list Schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both schema-list and exclude-schema-list, it will be excluded. No
--disable-data-increment Whether to disable incremental data migration. Default is true (disabled). When enabled, mxshift compares the last modification time of tables in the source with the previous run time and re-migrates only modified tables. Note: On first enable, all tables specified in the config are migrated. No
--disable-connector Whether to add whitelists for all nodes in the target cluster. Default is true (whitelist required only on the source cluster’s Master for the target Master). Note: If the source is Greenplum, only versions 4.3.17 and above are supported. No
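A sketch of how the scope lists above might look in the configuration file. The `[scope]` section name follows the `scope.disable-connector` reference earlier; the `"schema.table"` entry format is an assumption, so verify it against `config_template.toml`:

```toml
## Hypothetical fragment -- section and entry formats are assumptions.
[scope]
## Only these tables are migrated; schema-list is then ignored.
table-list = ["public.orders", "sales.events"]
## Used only when table-list is empty.
schema-list = ["public", "sales"]
## Tables under this schema are never migrated.
exclude-schema-list = ["staging"]
```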
[log]
--log-level Default is info, printing progress, errors, and warnings. Set to debug for additional debug output, useful for development and troubleshooting. No
--no-color Default is false (colored output enabled). Set to true to disable colored output. No
[controller]
--both-way Default is false; migration starts with the largest table to complete full migration as quickly as possible. If set to true, migration proceeds simultaneously from both the largest and smallest tables. No
--concurrency Number of tables migrated in parallel. Default is 3, maximum 10. No
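A minimal `[controller]` sketch combining the two options above (key names assumed to mirror the CLI flags; verify against `config_template.toml`):

```toml
## Hypothetical fragment -- key names assumed to mirror the flags.
[controller]
both-way = true     # migrate from both the largest and smallest tables
concurrency = 5     # up to 5 tables in parallel (maximum 10)
```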
[transfer]
--verify Verify row count before and after migration. Default is true. No
--with-index Whether to migrate indexes. Default is false (indexes are not migrated automatically). Note: MARS2 storage engine tables require index tables, so DDL migration includes their indexes regardless of this option. No
--disabled-data-transfer Skip actual data migration, only migrate indexes and verify row counts. Decouples migration and verification. Default is false (data migration is not skipped). No
--small-table-mb Tables smaller than this value (in MB) are considered small tables and their transfer is optimized (small tables bypass Segment-to-Segment transfer and go directly through the Master). Default is 32MB. Set to 0 to disable this optimization. No
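The `[transfer]` options above might be combined as follows; this is a sketch with key names assumed to mirror the flags, so check them against `config_template.toml`:

```toml
## Hypothetical fragment -- key names assumed to mirror the flags.
[transfer]
verify = true                   # compare row counts before and after migration
with-index = true               # also migrate indexes
disabled-data-transfer = false  # do not skip the actual data transfer
small-table-mb = 64             # tables under 64 MB go directly through the Master
```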
[ddl]
--enabled Whether to migrate table DDLs. Default is true. mxshift migrates DDLs by database before migrating data.
Supported DDL migration versions:
Greenplum 4.3.X/5/6 → YMatrix 4/5;
YMatrix 4/5 → YMatrix 4/5.
No
--file-path File path for importing/exporting DDL SQL. If the file does not exist, it is created. Existing content is cleared before writing. You may view or edit the exported file, but do not modify or delete any comment starting with "-- DO NOT EDIT.", as this may cause import failure. No
--mode DDL handling mode. Options: exec, output, input. exec is the default: DDL is executed directly on the target database. output: DDL is exported to the file at file-path; mxshift exits after export, with no data migration. input: DDL is read from the file at file-path, which must be a file previously exported by mxshift, otherwise mxshift cannot recognize it. Note: If the target database is unreachable in output mode, add --db-version under [database.target]. If the source database is unreachable in input mode with --only-ddl enabled, add --db-version under [database.source]. No
--only-ddl Whether to migrate only DDL. Default is false. If set to false, data migration follows DDL migration. If set to true, the tool exits after DDL migration without migrating data. No
--skip-resource-queue-and-group Whether to skip resource queues and resource groups during DDL migration. When set to true, they are skipped. No
--skip-table-space Whether to skip tablespaces during DDL migration. When set to true, tablespaces are skipped. No

--disable-ddl-increment Whether to disable incremental DDL migration. Default is true (disabled). When incremental DDL migration is enabled, mxshift parses the catalogs of the source and target databases and migrates only the missing DDLs. The extra comparison step makes it slightly slower than the disabled mode. No
--ignore-error Whether to ignore specific DDL errors and continue execution. Without this option, any DDL error stops migration. With it, the listed error types are logged but do not halt execution.
Default is empty (no errors ignored).
"all" means all error types are ignored.
Multiple types are supported, comma-separated, e.g., ["extension","function"].
No
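A sketch of a `[ddl]` block configured for a DDL-export run, based on the options above; key names are assumed to mirror the flags and the file path is illustrative:

```toml
## Hypothetical fragment -- key names assumed to mirror the flags.
[ddl]
enabled = true
mode = "output"                     # export DDL to file-path, then exit
file-path = "/tmp/mxshift_ddl.sql"  # illustrative path
skip-table-space = true             # do not migrate tablespaces
ignore-error = ["extension", "function"]
```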
[verify]
--enabled Enable v5.2.1-style data verification (multiple modes available). Default is true. No
--mode Verification mode. Default is simple-count. Supported modes: simple-count, column-count, length-sum, hash-bit-bloom.
simple-count: Same as the legacy verification, checks row count consistency.
column-count: Compares non-null value counts in a column. If the configured column has NOT NULL constraints or DEFAULT values, another suitable column is randomly selected.
length-sum: Computes the sum of the lengths of one or more columns. Supported column types: string types, including varchar, text, xml, json, jsonb, jsonpath.
If the specified column is invalid, another valid one is chosen.
hash-bit-bloom: Computes hash values of one or more columns and compares binary characteristics. Columns must support casting to varbit. Invalid columns trigger random selection of valid ones.
No
--columns Column names used in v5.2.1 data verification. No
--max-column-count Maximum number of columns used in the v5.2.1 verification when columns are selected randomly. No
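A `[verify]` sketch using the options above; key names are assumed to mirror the flags and the column names are illustrative:

```toml
## Hypothetical fragment -- key and column names are illustrative.
[verify]
enabled = true
mode = "length-sum"        # sum of column lengths (string-typed columns)
columns = ["id", "note"]   # illustrative column names
max-column-count = 2
```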

For conditional migration using WHERE clauses, use the following parameters:

Category Subcategory Subparameter Sub-subcategory Sub-subparameter Description Required
[transfer] [transfer.table-data-where-sql] --enabled If set to `true`, applies WHERE conditions to filter source data, migrating only matching rows (conditional migration). Disabled by default; must be manually set to `true`. No
--global SQL `WHERE` sub-expression applied to all tables in the migration. If the expression cannot be applied to a table (for example, the table lacks the referenced columns), that table is automatically skipped and marked `SKIPPED`. No
[[transfer.table-data-where-sql.override]] --where WHERE sub-expression with higher priority than global. Applies table-level filtering for specific tables. No
[transfer.table-data-where-sql.override.table] --schema Schema name of the table to migrate. No
--name Table name to migrate. No
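Putting the parameters above together, a conditional-migration sketch following the section names in the table (schema, table, and column names are illustrative):

```toml
[transfer.table-data-where-sql]
## Enable WHERE-based filtering for the whole run.
enabled = true
## Applied to every table unless overridden below; column name is illustrative.
global = "created_at >= '2024-01-01'"

## Per-table override with higher priority than `global`.
[[transfer.table-data-where-sql.override]]
where = "order_id > 100000"

    [transfer.table-data-where-sql.override.table]
    schema = "public"
    name = "orders"
```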

Command-Line Options

The mxshift command-line options are as follows:

Option Description
--dump-config Print default configuration file content
-c / --config-path Path to the mxshift global configuration file
-a / --all Execute migration without confirmation. Skips already migrated and empty tables by default
-v / --version Show mxshift tool version
-l / --show-progress Display progress of the most recent migration. If no migration is ongoing, shows last result. If active, shows current progress. STATUS values: Not start — not in migration scope; Skip — skipped (already migrated or empty); Processing/Interrupted — currently migrating or interrupted; Complete — migration completed; Failed — migration failed
-R / --redo Re-migrate tables even if previously migrated (clears existing data in target tables before re-migration)
--write-table-meta-file Export migration table metadata to mxshift_meta_file and exit
--read-table-meta-file Import migration table metadata from mxshift_meta_file and proceed with data migration