mxshift

This document describes the data migration tool mxshift.

1 Description

The mxshift tool currently supports the following features:

  • Full migration: Migrate DDL and data in parallel from a Greenplum 4.3.X/5/6 or YMatrix cluster to a new YMatrix cluster, at the database level.
  • Incremental migration: Migrate DDL and data in parallel for newly added tables from a Greenplum 5/6 or YMatrix cluster to a new YMatrix cluster, at the table level.
  • Conditional migration: Perform data migration based on conditions specified via the WHERE clause.
  • Supported data distribution strategies: Hash distribution, random distribution, replicated tables (REPLICATED), and Master-only tables.

Note!
Using the full feature set of mxshift requires superuser privileges. Without superuser privileges, the following limitations apply: data compression, incremental migration, and fetch mode are not supported.
Additionally, during migration you must manually add the target host to the pg_hba.conf file on all source nodes.
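
For reference, the entry added to pg_hba.conf on each source node might look like the sketch below. The address reuses the target Master IP from the examples in this document and the trust method is only an assumption; use the authentication method required by your environment.

host    all    all    172.16.100.32/32    trust

After editing pg_hba.conf, reload the configuration on the source cluster (on Greenplum/YMatrix, gpstop -u reloads it without a restart).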

2 Getting Started

The complete template file is named config.toml. This template can be obtained by running mxshift --config.

Below are example templates for different use cases. Use them as references and edit config.toml (for example, with vim config.toml) to match your environment.
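
For example, a working copy of the template can be created and edited like this (redirecting the printed template into a file is simply a convenient convention, not a requirement):

$ mxshift --config > config.toml
$ vim config.toml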

2.1 Full Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
        ## The installation directory of database
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
## Whether to disable data incremental migration, by default, it is true.
# disable-data-increment=true
disable-connector=false
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3

[transfer]
## Verify the number of records of every table
verify=true
with-index=true
## Whether to disable transfer Data, by default, it is false.
# disabled-data-transfer=false
## Tables with size under this threshold are categorized as "small",
## and are transferred between masters instead of segments to avoid creating external tables.
# small-table-mb=32

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
# disable-ddl-increment=true
## DDL transfer will stop and exit when executing SQL encounters errors, unless the type of DDL is listed in 'ignore-error'. Types of DDL are:
## "all","collation","default acl","extension","external table","function","index","language",
## "namespace","operator","resource","role","table","tablespace","type","view"
# ignore-error=[]

[verify]
## Whether to enable data verify, by default, it is false.
# enabled=false
## mode for verifying data consistency between source and target database,
## and value restricted to simple-count/column-count/length-sum/hash-bit-bloom.
## 'simple-count' for counting number of rows
## 'column-count' for counting not-null value of a certain column
## 'length-sum' for summing the length of some columns (e.g. columns of type text, xml, json, varchar)
## 'hash-bit-bloom' for calculating the hash of one or more columns and comparing its binary content
##  (except for columns of floating-point type, the result of which can be affected by software/hardware)
# mode="simple-count"
## Columns used in verification, will randomly select columns if not set
# columns=[]
## Maximum number of randomly selected columns used in verification
# max-column-count=3
# [[verify.table-list]]
#    schema="test_schema_1"
#    name="table_001"
# [[verify.exclude-table-list]]
#    schema="test_schema_3"
#    name="table_003"
# verify.schema-list=["test_schema_1"]
# verify.exclude-schema-list=["test_schema_5"]

# [[verify.override]]
#    mode="length-sum"
#    max-column-count=3
#    columns=[]
#    [[verify.override.table-list]]
#      schema="test_schema_3"
#      name="table_003"
#    verify.override.schema-list=[]

# [[verify.override]]
#    mode="column-count"
#    max-column-count=2
#    columns=[]
#    [[verify.override.table-list]]
#      schema="test_schema_1"
#      name="table_001"
#    verify.override.schema-list=[]

After preparing the TOML configuration file, run the following command to start full migration:

$ mxshift -c config_path_full.toml

2.2 Incremental Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
        ## The installation directory of database
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
disable-connector=false
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Whether to disable data incremental migration, by default, it is true.
disable-data-increment=false
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"

        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"        
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3

[transfer]
## Verify the number of records of every table
verify=true
with-index=true
## Whether to disable transfer Data, by default, it is false.
# disabled-data-transfer=false
## Tables with size under this threshold are categorized as "small",
## and are transferred between masters instead of segments to avoid creating external tables.
# small-table-mb=32

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
disable-ddl-increment=false
## DDL transfer will stop and exit when executing SQL encounters errors, unless the type of DDL is listed in 'ignore-error'. Types of DDL are:
## "all","collation","default acl","extension","external table","function","index","language",
## "namespace","operator","resource","role","table","tablespace","type","view"
# ignore-error=[]

[verify]
## Whether to enable data verify, by default, it is false.
# enabled=false
## mode for verifying data consistency between source and target database,
## and value restricted to simple-count/column-count/length-sum/hash-bit-bloom.
## 'simple-count' for counting number of rows
## 'column-count' for counting not-null value of a certain column
## 'length-sum' for summing the length of some columns (e.g. columns of type text, xml, json, varchar)
## 'hash-bit-bloom' for calculating the hash of one or more columns and comparing its binary content
##  (except for columns of floating-point type, the result of which can be affected by software/hardware)
# mode="simple-count"
## Columns used in verification, will randomly select columns if not set
# columns=[]
## Maximum number of randomly selected columns used in verification
# max-column-count=3  
# [[verify.table-list]]  
#    schema="test_schema_1"  
#    name="table_001"  
# [[verify.exclude-table-list]]  
#    schema="test_schema_3"  
#    name="table_003"  
# verify.schema-list=["test_schema_1"]  
# verify.exclude-schema-list=["test_schema_5"]  

# [[verify.override]]  
#    mode="length-sum"  
#    max-column-count=3  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_3"  
#      name="table_003"  
#    verify.override.schema-list=[]  

# [[verify.override]]  
#    mode="column-count"  
#    max-column-count=2  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_1"  
#      name="table_001"  
#    verify.override.schema-list=[]  

After preparing the TOML file, run the following command to start mxshift for incremental migration.

$ mxshift -c config_path_incremental.toml  

2.3 Conditional Migration

Use the WHERE condition to migrate only data that meets specific criteria or exclude expired data from migration:

[database]  
        [database.source]  
        ## Name of database  
        db-database= "testdb"  
        ## Hostname of database master  
        db-host="sdw3"  
        ## password of database  
        db-password="xxxx"  
        ## Port of database master  
        db-port=54322  
        ## user name of database  
        db-user="gpadmin"  
        ## The installation directory of database  
        install-dir="/usr/local/greenplum-db-6.7.1"  

        [database.target]  
        ## Name of database  
        db-database="destdb"  
        ## Hostname of database master  
        db-host="172.16.100.32"  
        ## password of database  
        db-password="yyyy"  
        ## Port of database master  
        db-port=5432  
        ## user name of database  
        db-user="mxadmin"  

[scope]  
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"  
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.  
##dryrun for only executing ddl, not transferring data  
##fetch for fetching data from source and abandon  
##motion for fetching data from source, redistributing and finally abandon  
mode="normal"  
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"  
## Sql for select segment information from target database  
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"  
        [[scope.table-list]]  
                schema=......  
                name=......  
        [[scope.table-list]]  
        ......  

        [[scope.exclude-table-list]]  
        ......      
schema-list=["test_schema_1", "test_schema_2"]  
exclude-schema-list=["test_schema_5", "test_schema_8"]  
disable-connector=false  

[log]  
## The log level, value restricted to: debug/verbose/info.  
log-level="info"  
## Print log without color.  
# no-color=false  

[controller]  
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table  
both-way=true  
## The number of table transferred at the same time  
concurrency=3  

[transfer]  
## Verify the number of records of every table
verify=true  
## Tables with size under this threshold are categorized as "small",
## and are transferred between masters instead of segments to avoid creating external tables.
# small-table-mb=32  
[transfer.table-data-where-sql]  
enabled=true  
global="txdate >= '2022-10-01' AND batchnum >= 100000000"  
[[transfer.table-data-where-sql.override]]  
        where="abc > 10"  
        [transfer.table-data-where-sql.override.table]  
        schema="test_schema_1"  
        name="table_001"  
[[transfer.table-data-where-sql.override]]  
        where="tag != 'aabbcc' AND ts > '2022-01-01'"  
        [transfer.table-data-where-sql.override.table]  
        schema="test_schema_2"  
        name="another_table"  

[verify]  
## Whether to enable data verify, by default, it is false.  
# enabled=false  
## mode for verifying data consistency between source and target database,  
## and value restricted to simple-count/column-count/length-sum/hash-bit-bloom.  
## 'simple-count' for counting number of rows  
## 'column-count' for counting not-null value of a certain column  
## 'length-sum' for summing the length of some columns (e.g. columns of type text, xml, json, varchar)
## 'hash-bit-bloom' for calculating the hash of one or more columns and comparing its binary content
##  (except for columns of floating-point type, the result of which can be affected by software/hardware)
# mode="simple-count"  
## Columns used in verification, will randomly select columns if not set  
# columns=[]  
## Maximum number of randomly selected columns used in verification  
# max-column-count=3  
# [[verify.table-list]]  
#    schema="test_schema_1"  
#    name="table_001"  
# [[verify.exclude-table-list]]  
#    schema="test_schema_3"  
#    name="table_003"  
# verify.schema-list=["test_schema_1"]  
# verify.exclude-schema-list=["test_schema_5"]  

# [[verify.override]]  
#    mode="length-sum"  
#    max-column-count=3  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_3"  
#      name="table_003"  
#    verify.override.schema-list=[]  

# [[verify.override]]  
#    mode="column-count"  
#    max-column-count=2  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_1"  
#      name="table_001"  
#    verify.override.schema-list=[]  

After preparing the TOML file, run the following command to start mxshift for conditional migration.

$ mxshift -c config_path_conditional.toml  

2.4 Migrating Replicated Tables

When migrating an entire database or all replicated tables under a specific schema, no additional configuration is required in config_path.toml; mxshift automatically identifies replicated tables and applies the appropriate migration strategy to prevent data duplication. To migrate specific replicated tables only, list them in scope.table-list by schema and name, as shown below.

   [[scope.table-list]]  
                schema="public1"
                name="table_replicated1"
   [[scope.table-list]]
                schema="public2"
                name="table_replicated2"

After preparing the TOML file, run the following command to start mxshift for replicated table migration.

$ mxshift -c config_path_replicated.toml  

Note!
For a detailed explanation of data migration strategies and complete steps, refer to Data Migration.

3 Feature Tips

3.1 Import/Export Table Metadata

For data migration, mxshift first queries the catalog to obtain metadata of the tables to be migrated, then proceeds with the data transfer.

As database usage increases, catalog bloat may lead to longer query times. To speed up debugging and reduce service downtime during migration, mxshift provides command-line options to export table metadata and reuse it in subsequent migrations:

## Export table metadata to mxshift_meta_file and exit  
$ mxshift -c <migration_config_file> --write-table-meta-file mxshift_meta_file  

## Import table metadata from mxshift_meta_file and proceed with data migration  
$ mxshift -c <migration_config_file> --read-table-meta-file mxshift_meta_file  

Note!
Exported table metadata should not be used long-term. Regenerate the metadata whenever the set of tables to migrate changes or when table definitions are modified.

3.2 Import/Export Table DDL

Configure the following in the mxshift configuration file:

[ddl]  
enabled=true  
file-path="/tmp/mxshift.sql"  
mode="output"  
  • file-path: File path for importing or exporting DDL SQL statements. See the appendix for details. Do not modify or delete any comment lines starting with -- DO NOT EDIT.
  • mode: DDL processing mode, with three possible values: exec, output, and input. Their meanings are:
    • exec: Default. Execute DDL directly on the target database.
    • output: Output DDL to the specified file. After export, mxshift exits without performing data migration.
    • input: Read DDL from the specified file and apply it.
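
As a sketch of the export/import round trip, assuming two hypothetical configuration files that differ only in the [ddl] settings shown above (mode="output" in one, mode="input" in the other):

## Export DDL to the file given by file-path (e.g. /tmp/mxshift.sql); mxshift exits after the export
$ mxshift -c config_export_ddl.toml

## Later, replay the exported DDL from the same file
$ mxshift -c config_import_ddl.toml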

If the target database is unreachable under output, configure database.target.db-version as shown below to ensure successful DDL export:

[database]  

    [database.target]  
    ...  
    ## Version of database (Use output of 'SELECT version();'). Required only when  
    ##         1. Source database is unreachable, 'ddl.only-ddl' is enabled, and 'ddl.mode' is 'input'  
    ##         2. Target database is unreachable, and 'ddl.mode' is 'output'  
    db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"  

If the source database is unreachable under input, configure database.source.db-version and ddl.only-ddl as shown below to ensure successful import:

[database]  

    [database.source]
    ## Version of database (Use output of 'SELECT version();'). Required only when  
    ##         1. Source database is unreachable, 'ddl.only-ddl' is enabled, and 'ddl.mode' is 'input'  
    ##         2. Target database is unreachable, and 'ddl.mode' is 'output'  
    db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"  

[ddl]  
only-ddl=true  

3.3 Dynamically Adjusting Concurrency

mxshift supports adjusting concurrency during migration. Follow these steps:

  1. Modify the controller.concurrency parameter in the configuration file (the one used to start mxshift):
[controller]  
## The number of tables transferred at the same time  
concurrency=xxx  
  2. Obtain the current process ID of mxshift

    Run ps aux | grep mxshift in the terminal and note the number in the second column, {{mxshift PID}}.

  3. Send the SIGUSR2 signal to the mxshift process to reload the configuration and update concurrency

    Run kill -SIGUSR2 {{mxshift PID}} in the terminal.
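
Put together, the reload looks like this ({{mxshift PID}} stands for the process ID found in the second column of the ps output):

$ ps aux | grep mxshift
$ kill -SIGUSR2 {{mxshift PID}}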

3.4 Data Consistency Verification

3.4.1 Decoupling Legacy Verification from Data Migration

In versions prior to v5.2.1, the [transfer] section contained parameters with-index and verify to control whether indexes were migrated and whether row counts were verified. However, index migration and legacy data verification (i.e., row count check only) were tightly coupled with data migration—they could not run independently and had to occur within the data migration process.

Starting from v5.2.1, index migration and legacy verification can run independently. A new parameter disabled-data-transfer has been added under the [transfer] section to skip data transfer. Examples:

  1. Migrate data and indexes:
[transfer]  
## Whether to disable data transfer; default is false  
disabled-data-transfer=false  
## Transfer indexes of tables; default is false  
with-index=true  
  2. Migrate data and verify row counts:
[transfer]  
## Whether to disable data transfer; default is false  
disabled-data-transfer=false  
## Verify row count per table; default is disabled  
verify=true  
  3. Migrate data, indexes, and verify row counts:
[transfer]  
## Whether to disable data transfer; default is false  
disabled-data-transfer=false  
## Transfer indexes of tables; default is false  
with-index=true  
## Verify row count per table; default is disabled  
verify=true  
  4. Migrate data only (no indexes, no verification):
[transfer]  
## Whether to disable data transfer; default is false  
disabled-data-transfer=false  
## Transfer indexes of tables; default is false  
with-index=false  
## Verify row count per table; default is disabled  
verify=false  
  5. Migrate indexes only (no data transfer):
[transfer]  
## Whether to disable data transfer; default is false  
disabled-data-transfer=true  
## Transfer indexes of tables; default is false  
with-index=true  
  6. Verify row counts only (no data transfer):
[transfer]  
## Whether to disable data transfer; default is false  
disabled-data-transfer=true  
## Verify row count per table; default is disabled  
verify=true  

3.4.2 v5.2.1 Data Verification

Starting from version v5.2.1, YMatrix mxshift supports a new data verification algorithm configurable via the [verify] section.

For detailed global configuration parameters, see the Appendix. A full example of the [verify] section is provided in the Full Migration Configuration File above. Below we focus on scope configuration.

[verify]  
# [[verify.table-list]]  
#    schema="test_schema_1"  
#    name="table_001"  
# [[verify.exclude-table-list]]  
#    schema="test_schema_3"  
#    name="table_003"  
# verify.schema-list=["test_schema_1"]  
# verify.exclude-schema-list=["test_schema_5"]  

# [[verify.override]]  
#    mode="length-sum"  
#    max-column-count=3  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_3"  
#      name="table_003"  
#    verify.override.schema-list=[]  

# [[verify.override]]  
#    mode="column-count"  
#    max-column-count=2  
#    columns=[]  
#    [[verify.override.table-list]]  
#      schema="test_schema_1"  
#      name="table_001"  
#    verify.override.schema-list=[]  

Verification scope is controlled by four sub-sections: [verify.table-list], [verify.exclude-table-list], [verify.schema-list], and [verify.exclude-schema-list].

The priority order is as follows:

  1. If table-list lists specific tables, only those tables are verified.
  2. If no specific tables are listed, check whether schema-list specifies any schemas.
  3. If schemas are specified, verify all tables in those schemas.
  4. If no schema is specified, verify all tables from the most recent transfer.
  • verify.table-list: List of tables included in v5.2.1 data verification.
  • verify.exclude-table-list: List of tables excluded from v5.2.1 data verification.
  • verify.schema-list: List of schemas included in v5.2.1 data verification.
  • verify.exclude-schema-list: List of schemas excluded from v5.2.1 data verification.

The [verify.override] subsection overrides global verification settings and applies its configurations to tables within the specified scope.
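
For instance, a minimal uncommented override, reusing the schema and table names from the commented template above, that switches a single table to length-sum verification would look like:

[verify]
enabled=true
    [[verify.override]]
        mode="length-sum"
        [[verify.override.table-list]]
            schema="test_schema_3"
            name="table_003"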

Appendix: Parameter Reference

This section describes parameters supported by the mxshift tool.

Configuration File

To use the mxshift tool, you must first prepare a TOML configuration file. Below is a detailed description of the parameters in the configuration file:

Parameter Description Required
[database.source] category
--db-host Hostname (Host) or IP address of the Master node in the source cluster. The source database cluster refers to the cluster from which data will be migrated out. Yes
--db-port Port number of the Master node in the source database. Yes
--db-database Name of the source database. Yes
--db-user Username for the source database. Yes
--db-password Password for the source database user. Yes
--install-dir Database installation directory. Yes
--hostname-to-ip Maps hostnames to IP addresses. Use this setting when the source and target clusters have identical hostnames to ensure accurate routing. No
[database.target] category
--db-host Hostname (Host) or IP address of the Master node in the target cluster. The target cluster is the destination for data migration. Yes
--db-port Port number of the Master node in the target database. Yes
--db-database Name of the target database. Yes
--db-user Username for the target database. Yes
--db-password Password for the target database user. Yes
[scope] category
--mode Operation mode. Allowed values: normal, dryrun, fetch, motion.
normal: Default value; indicates normal data migration process.
dryrun: Verifies connectivity between source and target databases using simple DDL operations, without reading or writing actual data.
fetch: Based on dryrun, reads data from the source and sends it to the target Segment nodes, but discards it without processing. This option tests source read performance and network transfer speed.
motion: Extends fetch by parsing data at the target and redistributing it to the correct target Segments. This option further tests the target database's data redistribution rate.
No
--compress-method Compression method: gzip / zstd / lz4 / 0. Default is 0 (no compression).
If scope.disable-connector=true, enabling any compression algorithm requires that the mxadmin user on each target Segment can SSH passwordlessly to every source Segment. It is recommended to test with dryrun after configuration.
No
--table-list List of schema and table names from the source. For partitioned tables, specify the parent partitioned table. If migrating the entire database, leave empty. mxshift will migrate all non-zero-sized tables in descending order of size. No
--exclude-table-list List of schema and table names to exclude from migration. No
--schema-list Source database schemas to migrate. Priority: if table-list specifies specific tables, only those tables are transferred. Otherwise, if schema-list specifies schemas, all tables under those schemas are transferred. If neither is set, all tables in the source database ([database.source] db-database) are transferred. No
--exclude-schema-list Schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both schema-list and exclude-schema-list, it is excluded. No
--disable-data-increment Whether to disable incremental data migration. Default is true (disabled). When incremental migration is enabled, mxshift compares the last modification time of each table in the source with the time of the previous run and re-migrates only the tables modified in that interval. Note: The first time incremental migration is enabled, all tables specified in the config file are migrated. No
--disable-connector Whether to add whitelists for all nodes in the target cluster. Default is true (whitelist only required from source cluster to target Master). Note: If the source is Greenplum, only versions 4.3.17 and above are supported. No
[log] category
--log-level Log level. Allowed values: debug, verbose, info. Default is info, which prints progress, error, and warning messages. Set to debug to output additional debug information, useful for development and troubleshooting. No
--no-color Whether to print logs without color. Default is false (colored output). Set to true to disable colored output. No
[controller] category
--both-way Whether to migrate from both ends of the size-ordered table list. When disabled, migration starts from the largest table so that the largest tables finish as early as possible; when enabled, migration proceeds simultaneously from both the largest and the smallest tables. No
--concurrency Number of tables that can be migrated in parallel. Default is 3, maximum 10. No
[transfer] category
--verify Verify the row count of each table before and after migration. Default is false (disabled). No
--with-index Whether to migrate indexes. Default is false (indexes are not migrated automatically). Note: For MARS2 storage engine tables, index tables must be created separately; however, DDL migration for MARS2 tables includes indexes automatically, regardless of this setting. No
--disabled-data-transfer Skip data migration and only migrate indexes and/or verify row counts, decoupling index migration and verification from data migration. Default is false (data migration is not skipped). No
--small-table-mb Tables smaller than this threshold (in MB) are treated as small tables and optimized accordingly: small tables bypass Segment-to-Segment transfer and are transferred directly through the Master. Default is 32. Set it to 0 to disable this optimization. No
[ddl] category
--enabled Whether to migrate table DDL. Default is true. mxshift migrates DDL at the database level before starting data migration.
Supported DDL migration paths:
Greenplum 4.3.X/5/6 -> YMatrix 4/5;
YMatrix 4/5 -> YMatrix 4/5.
No
--file-path File path for importing/exporting DDL SQL. The file is created if it does not exist, and existing content is cleared before writing. Exported files can be viewed or edited, but do not modify or delete any comment lines starting with "-- DO NOT EDIT.", as this may cause import errors. No
--mode DDL handling mode. Options: exec, output, input. exec: Default; executes DDL directly on the target database. output: Writes DDL to the file given by file-path; mxshift exits after the export without migrating data. input: Reads DDL from the file given by file-path; the file must be one previously exported by mxshift, otherwise mxshift cannot recognize it. Note: If the target database is unreachable in output mode, set db-version under [database.target]. If the source database is unreachable in input mode, set db-version under [database.source] and enable only-ddl under [ddl]. No
--only-ddl Whether to migrate only DDL. Default is false. If set to false, data migration starts after the DDL migration. If set to true, the program exits after the DDL migration without migrating table data. No
--skip-resource-queue-and-group Whether to skip resource queues and resource groups during DDL migration. Default is true. When set to true, resource queues and groups are skipped. No
--skip-table-space Whether to skip tablespace migration. Default is true. When set to true, tablespaces are skipped during DDL migration. No
--disable-ddl-increment Whether to disable incremental DDL migration. Default is true (disabled). When incremental DDL migration is enabled, mxshift parses the catalogs of the source and target databases and migrates only the missing DDL objects; the extra comparison step makes it slightly slower than when disabled. No
--ignore-error Whether to ignore specific types of DDL execution errors and continue. Without this option, any DDL error stops migration and exits; with it, errors of the listed DDL types are logged but do not interrupt execution.
Default is empty (no errors ignored).
"all" covers all error types.
Multiple types are supported, comma-separated, e.g., ["extension","function"].
No
[verify] category
--enabled Enable v5.2.1-style data verification (multiple modes available). Default is false. No
--mode Verification mode. Default is simple-count. mxshift supports four modes: simple-count, column-count, length-sum, hash-bit-bloom.
simple-count: Same as the legacy verification (transfer.verify); checks row-count consistency.
column-count: Compares the count of non-null values in a column. If the configured column has a NOT NULL constraint or a DEFAULT value, another suitable column is randomly selected.
length-sum: Computes the sum of the lengths of one or more columns. Supported column types are string-like types such as text, xml, json, jsonb, jsonpath, and varchar. If the specified column is invalid, another valid column is chosen.
hash-bit-bloom: Computes hash values of one or more columns and compares their binary characteristics. Columns must support casting to the varbit type. Invalid columns trigger selection of other valid ones.
No
--columns Column names used in v5.2.1 data verification. No
--max-column-count Maximum number of columns used in v5.2.1 verification when columns are selected randomly. Default is 3. No

For conditional migration using WHERE conditions, use the following parameters:

Category Subcategory Subparameter Sub-subcategory Sub-subparameter Description Required
[transfer] category [transfer.table-data-where-sql] category --enabled If set to true, a WHERE condition is applied to filter source data so that only rows matching the condition are migrated, which can also serve incremental-style migration. This feature is disabled by default; set it to true to enable it. No
--global Global SQL WHERE expression applied to all tables in this migration. If a table does not contain the columns referenced in the WHERE expression, that table is skipped automatically (marked SKIPPED). No
[[transfer.table-data-where-sql.override]] category --where WHERE sub-expression, higher priority than global. Applies table-level filtering to specific tables. No
[transfer.table-data-where-sql.override.table] category --schema Schema name of the table to migrate. No
--name Table name to migrate. No
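
Conceptually, with the override examples from section 2.3, the rows read from test_schema_2.another_table are limited roughly as in the query below; this is only an illustration of the filtering semantics, not the literal SQL that mxshift issues:

SELECT * FROM test_schema_2.another_table
WHERE tag != 'aabbcc' AND ts > '2022-01-01';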

Command-line Options

The mxshift command-line options are as follows:

Option Description
--config Print default content of the configuration file.
-c / --config-path Path to the global mxshift configuration file.
-a / --all Execute migration without confirmation. Skips already migrated and empty tables by default.
-v / --version Display mxshift tool version.
-l / --show-progress Show progress of the most recent migration. If no migration is ongoing, shows last result. If migration is in progress, shows current status. STATUS values: Not start — not in migration scope; Skip — skipped (already migrated or empty); Processing/Interrupted — currently migrating or interrupted; Complete — migration completed; Failed — migration failed.
-R / --redo Re-migrate tables even if previously migrated (clears existing data in target table before re-migration).
--write-table-meta-file Export metadata of migration tables to mxshift_meta_file file and exit.
--read-table-meta-file Import migration table metadata from mxshift_meta_file file and proceed with data migration.
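
For example, assuming the progress option is combined with the same configuration file used to start the migration, the latest status could be checked with:

$ mxshift -c config.toml -l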