mxshift

This document describes the data migration tool mxshift.

1 Description

The mxshift tool currently supports the following features:

  • Full migration: Migrates table DDL and data in parallel from a Greenplum 4.3.X/5/6 or YMatrix cluster to a new YMatrix cluster, at the database level.
  • Incremental migration: Migrates DDL and data for selected tables in parallel from a Greenplum 5/6 or YMatrix cluster to a new YMatrix cluster, at the table level.
  • Conditional migration: Enables conditional data migration using the WHERE clause.
  • Supported distribution strategies: Hash distribution, random distribution, replicated tables (REPLICATED), and master-only tables.

2 Parameter Information

This section details the parameters of the mxshift tool.

2.1 Configuration File

Using mxshift requires preparing a TOML configuration file. Below is a detailed description of the parameters:

Parameter Description Required
[database.source] category
--db-host Hostname (Host) or IP address of the source cluster's Master node. The source cluster refers to the database cluster from which data is migrated out. Yes
--db-port Port number of the source database Master. Yes
--db-database Name of the source database. Yes
--db-user Username for the source database. Yes
--db-password Password for the source database user. Yes
--install-dir Database installation directory. Yes
--hostname-to-ip Maps hostnames to IP addresses. Use this when source and target clusters share hostnames to ensure accurate routing. No
[database.target] category
--db-host Hostname (Host) or IP address of the target cluster's Master node. The target cluster is where data is migrated into. Yes
--db-port Port number of the target database Master. Yes
--db-database Name of the target database. Yes
--db-user Username for the target database. Yes
--db-password Password for the target database user. Yes
[scope] category
--mode Operation mode. Allowed values: normal, dryrun, fetch, motion.
normal: Default value; performs standard data migration.
dryrun: Verifies connectivity between databases using simple DDL operations without reading or writing data.
fetch: On top of dryrun, reads data from the source and sends it to target segments but discards it. Used to test source read performance and network transfer speed.
motion: Builds on fetch by parsing data on the target and redistributing it to correct target Segments. Tests target-side data redistribution rate.
No
--compress-method Compression method: 0 / gzip / lz4 / zstd. Default is 0 (no compression).
When any compression is enabled, the mxadmin user on each target Segment must be able to SSH passwordlessly to each source Segment. It is recommended to test with ssh after configuration.
No
--table-list List of schema and table names from the source. For partitioned tables, specify the parent partitioned table. If empty, mxshift migrates all non-empty tables in descending order of size. No
--exclude-table-list List of schema and table names to exclude from migration. No
--schema-list Source database schemas. Priority: if table-list is specified, only those tables are migrated. Otherwise, if schema-list is set, all tables under these schemas are migrated. If neither is set, all tables in the database are migrated. No
--exclude-schema-list Schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both schema-list and exclude-schema-list, it will be excluded. No
--disable-data-increment Enables incremental data migration. Default is true (disabled). When enabled, mxshift compares the last modification time of tables in the source with the previous run time, and re-migrates only modified tables. Note: On first enable, all tables specified in the config are migrated. No
[log] category
--log-level Default is info, printing progress, error, and warning messages. Set to debug for additional debug output, useful for development and troubleshooting. No
--no-color Default is false. Set to true to disable colored output. No
[controller] category
--both-way Default is false. Migration starts from the largest table to minimize total migration time. If set to true, migration proceeds simultaneously from both the largest and smallest tables. No
--concurrency Number of tables migrated in parallel. Default is 3, maximum 10. No
[transfer] category
--verify Verifies that row counts match before and after migration. Default is true. No
--with-index Whether to migrate indexes. Default is false (not auto-migrated). When disabled, SQL statements for creating indexes are saved in a file under the current execution path for manual creation later. Note: For tables created with the MARS2 storage engine, indexes must be created for the table to function properly. Therefore, DDL migration for MARS2 tables automatically includes index migration regardless of this setting. No
[ddl] category
--enabled Whether to migrate table DDL. Default is true. Before migrating data, mxshift first migrates DDL at the database level.
Supported DDL migration versions:
Greenplum 4.3.X/5/6 → YMatrix 4 / YMatrix 5;
YMatrix 4/5 → YMatrix 4/5
No
--file-path File path for importing/exporting DDL SQL. If the file does not exist, it is created. Existing content is cleared before writing. Exported files can be reviewed or edited, but do not modify or delete any lines starting with "-- DO NOT EDIT.", as this may cause import errors. No
--mode DDL processing mode. Options: execute, output, input.
execute: Default; executes DDL directly on the target database.
output: Exports DDL to a file and exits without migrating data.
input: Reads DDL from a file. Must point file-path to an mxshift-generated export file; otherwise, parsing fails. SPEC: If the target database is unreachable in input mode, add db-version under [database.target] (see template below). If the source is unreachable in output mode, set db-version under [database.source] and set ddl.mode = "output" (see template below).
No
--only-ddl Whether to migrate only DDL. Default is false. If true, exits after DDL migration without migrating table data. If false, proceeds to migrate table data after DDL migration. No
--skip-resource-queue-and-group Skips resource queues and groups during DDL migration. Default is true. No
--skip-table-space Skips tablespaces during DDL migration. Default is true. No

--disable-ddl-increment
Enables incremental DDL migration. Default is true (disabled). When enabled, mxshift parses catalogs of source and target databases and migrates only missing objects. This adds comparison overhead, making it slightly slower than full DDL migration. No

To perform conditional migration using a WHERE clause, use the following parameters:

Category Sub-category Sub-parameter Sub-sub-category Sub-sub-parameter Description Required
[transfer] category [transfer.table-data-where-sql] category --enabled If set to `true`, filters source data using a WHERE condition, migrating only matching rows. This feature is off by default and must be manually enabled with `--enabled=true`. No
--global Global WHERE expression applied to all tables. If a table does not contain the referenced column, `mxshift` skips the table with a warning. No
[[transfer.table-data-where-sql.override]] category --where Table-specific WHERE clause, takes precedence over global. Applies filtering at the table level. No
[transfer.table-data-where-sql.override.table] category --schema Schema name of the table to migrate No
--name Table name to migrate No

See configuration templates below for usage examples.

2.2 Command-Line Parameters

Command-line options for mxshift:

Parameter Description
--config Prints default configuration file content
-a or --all Proceeds with migration without confirmation. Skips already completed and empty tables by default
-c or --config-path Path to the global configuration file for mxshift
-v or --version Displays mxshift version
-l or --show-progress Shows progress of the most recent migration. If no migration is ongoing, shows last result. If active, shows current progress. STATUS meanings: Not start — not in scope; Skip — skipped (already migrated or empty); Processing/Interrupted — in progress or interrupted; Complete — successfully migrated; Failed — migration failed
-R or --redo Re-migrates tables even if previously migrated (clears existing data in target table before re-migration)

3 Examples

A complete template file is named config_template.toml. You can generate this template by running mxshift --config.

Below are example templates for different use cases. Use them as references and customize using a text editor.

3.1 Full Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
         ## The installation directory of database
         install-dir="/usr/local/greenplum-db-6.7.1"
         [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
         [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
         [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"       
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
## Whether to disable data incremental migration, by default, it is true.
# disable-data-increment=true

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3

[transfer]
## Verity the number of record of every table
verify=true
with-index=true

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
# disable-ddl-increment=true

After preparing the TOML file, run the following command to start full migration:

$ mxshift -c config_path_full.toml

3.2 Incremental Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
         ## The installation directory of database
         install-dir="/usr/local/greenplum-db-6.7.1"
         [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
         [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
         [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"          
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
##

--- SPLIT ---

Whether to disable data incremental migration. By default, it is set to `true`.  
disable-data-increment=false

[log]  
## The log level. Valid values are: debug, verbose, info.  
log-level="info"  
## Print log without color.  
# no-color=false  

[controller]  
## By default, transfer work starts from the largest table. If set to 'bothway', it starts from both the largest and smallest tables.  
both-way=true  
## The number of tables to transfer concurrently.  
concurrency=3  

[transfer]  
## Verify the number of records in each table.  
verify=true  
with-index=true  

[ddl]  
enabled=true  
# file-path="/tmp/mxshift.sql"  
# mode="output"  
only-ddl=false  
## During DDL transfer, whether to skip resource queue or role group migration. Default is true.  
# skip-resource-queue-and-group=true  
## During DDL transfer, whether to skip tablespace migration. Default is true.  
# skip-table-space=true  
        [[ddl.replace]]  
        ## Applicable only when migrating from Greenplum to YMatrix  
                category="role"  
                [[ddl.replace.pairs]]  
                        old="gpadmin"  
                        new="mxadmin"  
## Whether to disable DDL incremental migration. By default, it is true.  
disable-ddl-increment=false  

After preparing the TOML file, run the following command to start mxshift for incremental migration.

$ mxshift -c config_path_incremental.toml  

3.3 Conditional Migration

Use WHERE to perform data migration based on specific conditions, such as excluding expired data:

[database]  
        [database.source]  
        ## Name of the source database  
        db-database= "testdb"  
        ## Hostname of the source database master  
        db-host="sdw3"  
        ## Password for the source database  
        db-password="xxxx"  
        ## Port of the source database master  
        db-port=54322  
        ## Username for the source database  
        db-user="gpadmin"  
        ## Installation directory of the source database  
        install-dir="/usr/local/greenplum-db-6.7.1"  

        [database.target]  
        ## Name of the target database  
        db-database="destdb"  
        ## Hostname of the target database master  
        db-host="172.16.100.32"  
        ## Password for the target database  
        db-password="yyyy"  
        ## Port of the target database master  
        db-port=5432  
        ## Username for the target database  
        db-user="mxadmin"  

[scope]  
## Compression method used during data transfer. Valid methods: 0 (no compression), gzip, lz4, zstd  
compress-method="lz4"  
## Data transfer mode. Valid values: normal, dryrun, fetch, motion  
## dryrun: Execute DDL only; do not transfer data  
## fetch: Extract data from source and discard  
## motion: Extract data from source, redistribute across segments, then discard  
mode="normal"  
## SQL to query segment information from the source database  
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"  
## SQL to query segment information from the target database  
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"  
        [[scope.table-list]]  
                schema=......  
                name=......  
        [[scope.table-list]]  
        ......    
        [[scope.exclude-table-list]]  
        ......    
schema-list=["test_schema_1", "test_schema_2"]  
exclude-schema-list=["test_schema_5", "test_schema_8"]  

[log]  
## The log level. Valid values are: debug, verbose, info.  
log-level="info"  
## Print log without color.  
# no-color=false  

[controller]  
## By default, transfer starts from the largest table. Set to 'bothway' to start from both largest and smallest.  
both-way=true  
## Number of tables to transfer in parallel.  
concurrency=3  

[transfer]  
## Verify row count for every table after transfer.  
verify=true  
[transfer.table-data-where-sql]  
enabled=true  
global="txdate >= '2022-10-01' AND batchnum >= 100000000"  
[[transfer.table-data-where-sql.override]]  
        where="abc > 10"  
        [transfer.table-data-where-sql.override.table]  
        schema="test_schema_1"  
        name="table_001"  
[[transfer.table-data-where-sql.override]]  
        where="tag != 'aabbcc' AND ts > '2022-01-01'"  
        [transfer.table-data-where-sql.override.table]  
        schema="test_schema_2"  
        name="another_table"  

After preparing the TOML file, run the following command to start mxshift for conditional migration.

$ mxshift -c config_path_conditional.toml  

3.4 Migrating Replicated Tables

When migrating an entire database or all replicated tables under a specific schema, no additional configuration is required in config_path.toml. mxshift automatically identifies replicated tables and applies the appropriate migration strategy to prevent data duplication. You only need to configure scope.table-list, specifically defining schema and name.

   [[scope.table-list]]  
                schema="public1"  
                name="table_replicated1"  
   [[scope.table-list]]  
                schema="public2"  
                name="table_replicated2"  

After preparing the TOML file, run the following command to start mxshift for replicated table migration.

$ mxshift -c config_path_replicated.toml  

Note!
For a complete overview of data migration strategies and procedures, refer to Data Migration