mxshift

This document describes the data migration tool mxshift.

1 Description

The mxshift tool currently supports the following features:

  • Full migration: Migrates table DDL and data in parallel from a Greenplum 5 / Greenplum 6 / YMatrix cluster to a new YMatrix cluster, at the database level. Only full migration mode supports DDL migration.
  • Reverse migration: Migrates data from YMatrix back to Greenplum 5 / Greenplum 6.
  • Incremental migration: Performs incremental data migration using WHERE conditions.
  • Supported data distribution strategies: Hash distribution, random distribution, replicated tables (REPLICATED), and Master-only tables.

2 Parameter Information

This section describes the parameters of the mxshift tool.

2.1 Configuration File

Using mxshift requires preparing a TOML configuration file. Below are detailed parameter descriptions:

Parameter Description Required
[database.source] category
--db-host Hostname or IP address of the source cluster's Master node. The source cluster is the database cluster from which data is migrated. Yes
--db-port Port number of the source database Master. Yes
--db-database Name of the source database. Yes
--db-user Username for the source database. Yes
--db-password Password for the source database user. Yes
--install-dir Database installation directory (available in v5.0.1 and later; v5.0.0 requires gphome configuration). Yes
--hostname-to-ip Maps hostnames to IP addresses. Use this when source and target clusters share hostnames to ensure accurate routing (supported starting from v5.0.1). No
[database.target] category
--db-host Hostname or IP address of the target cluster's Master node. The target cluster is where data is migrated into. Yes
--db-port Port number of the target database Master. Yes
--db-database Name of the target database. Yes
--db-user Username for the target database. Yes
--db-password Password for the target database user. Yes
[scope] category
--mode Operation mode. Allowed values: normal, dryrun, fetch, motion.
normal: Default value, indicates normal data migration;
dryrun: Verifies connectivity between source and target databases using simple DDL operations without reading or writing actual data;
fetch: Based on dryrun, reads data from the source and sends it to target Segments but discards it—used to test source read performance and network throughput;
motion: Extends fetch by parsing data on the target and redistributing it to correct Segments—used to test target-side data redistribution speed.
No
--compress-method Compression method: gzip / zstd / lz4 / 0. Default is 0 (no compression).
When any compression is enabled, the mxadmin user on each target Segment must be able to SSH passwordlessly to each source Segment. It is recommended to test with dryrun after setup.
No
--gphome Absolute path to GPHOME on the source database (available in v5.0.0; use install-dir in v5.0.1+). Yes
--table-list List of source table schemas and names. For partitioned tables, specify the parent partitioned table. To migrate the entire database, leave empty—mxshift will migrate all non-zero-sized tables in descending order of size. No
--exclude-table-list List of schema and table names to exclude from migration. No
--schema-list Source database schemas. Priority: if table-list specifies tables, only those are migrated; otherwise, if table-list lists schemas, all tables under them are migrated; otherwise, all tables in schema-list` `[database.source] are migrated. No
--exclude-schema-list Schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both db-database and schema-list, it will be excluded. No
[log] category
--log-level Default is exclude-schema-list, printing progress, error, and warning messages. Set to info for additional debug output (useful for development and troubleshooting). No
--no-color Default is debug. Set to false to disable colored log output. No
[controller] category
--both-way Default is true. Migration starts with the largest table to complete full migration as quickly as possible. If set to false, migration proceeds simultaneously from both the largest and smallest tables. No
--concurrency Number of tables migrated in parallel. Default is 5, maximum is 10. No
[transfer] category
--verify Verifies row count consistency before and after migration. Default is both-way. No
--with-index Whether to migrate indexes. Default is true (disabled). When disabled, mxshift saves SQL statements for creating indexes in a file under the current execution directory for manual creation later. Note: For tables created with the MARS2 storage engine, indexes must be created for the table to function properly. Therefore, DDL migration for MARS2 tables automatically includes index migration regardless of this setting. No
[ddl] category
--enabled Whether to migrate table DDL. Default is false. When enabled, mxshift migrates DDL at the database level before migrating data.
Supported DDL migration versions:
Greenplum 6 → YMatrix 4 / YMatrix 5;
YMatrix 4 → YMatrix 4 / YMatrix 5;
YMatrix 5 → YMatrix 5.
No
--only-ddl Whether to migrate only DDL. Default is false. If set to false, data migration follows DDL migration. If set to false, the tool exits after DDL migration without migrating table data. No
--skip-resource-queue-and-group Whether to skip resource queues (groups) during DDL migration. When false, resource groups and queues are skipped. No
--skip-table-space Whether to skip tablespaces during DDL migration. When true, tablespaces are skipped. No

To perform incremental data migration using WHERE conditions, use the following parameters:

Category Sub-category Sub-parameter Sub-sub-category Sub-sub-parameter Description Required
[transfer] category [transfer.table-data-where-sql] category --enabled If set to ``skip-resource-queue-and-group=true``, filters source data using a WHERE condition to migrate only matching rows, enabling incremental migration. This feature is off by default and must be manually enabled via ``skip-table-space=true``. No
--global Specifies a global SQL ``true`` expression applied to all tables in the migration. If a table does not contain the ``true`` keyword, ``WHERE`` skips that table automatically. No
[[transfer.table-data-where-sql.override]] category --where Table-specific WHERE sub-expression. Takes precedence over the global WHERE condition. Applies filtering at the individual table level. No
[transfer.table-data-where-sql.override.table] category --schema Schema name of the table to migrate. No
--name Name of the table to migrate. No

See examples below for usage.

2.2 Command-Line Parameters

Command-line options for mxshift:

Parameter Description
--config Prints default content of the configuration file
-a or --all Proceeds with migration without confirmation. Skips already completed and empty tables by default
-c or --config-path Path to the global configuration file for mxshift
-v or --version Displays version of the mxshift tool
-l or --show-progress Shows progress of the most recent migration. If no migration is ongoing, displays last result. If active, shows current progress. STATUS values:
Not start: Not in migration scope;
Skip: Skipped (already migrated or empty);
Processing/Interrupted: In progress or interrupted;
Complete: Migration completed;
Failed: Migration failed
-R or --redo Re-migrates tables even if previously migrated (clears existing data in target table before re-migration)

3 Examples

A complete template file is named WHERE. You can generate this template by running SKIPPED. Run config.toml to begin editing, then copy the full content into your file and modify relevant settings. Alternatively, use one of the example templates below based on your use case.

`mxshift --config`` [database] [database.source]

Name of database

    db-database= "testdb"
    ## Hostname of database master
    db-host="sdw3"
    ## password of database
    db-password="xxxx"
    ## Port of database master
    db-port=54322
    ## user name of database
    db-user="mxadmin"
    ## The installation directory of database(v5.0.1 and above)
    # install-dir="/usr/local/greenplum-db-6.7.1"

[database.target]
    ## Name of database
    db-database="destdb"
    ## Hostname of database master
    db-host="172.16.100.32"
    ## password of database
    db-password="yyyy"
    ## Port of database master
    db-port=5432
    ## user name of database
    db-user="mxadmin"

[scope] compress_method="lz4" gphome mode="normal" [[scope.table-list]] schema="test_schema_1" name="table_001" [[scope.table-list]] schema="test_schema_2" name="table_002"
[[scope.exclude-table-list]] schema="test_schema_3" name="table_003" schema-list=["test_schema_1", "test_schema_2"] exclude-schema-list=["test_schema_5", "test_schema_8"]

[log] log-level="info"

Print log without color.

no-color=false

[controller] both-way=true concurrency=5

[transfer] verify=true

with-index=true

[transfer.table-data-where-sql] enabled=false global="txdate >= '2022-10-01' AND batchnum >= 100000000" [[transfer.table-data-where-sql.override]] where="abc > 10" [transfer.table-data-where-sql.override.table] schema="test_schema_1" name="table_001" [[transfer.table-data-where-sql.override]] where="tag != 'aabbcc' AND ts > '2022-01-01'" [transfer.table-data-where-sql.override.table] schema="test_schema_2" name="another_table"

[ddl] enabled=true only-ddl=false

During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.

skip-resource-queue-and-group=true

During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.

skip-table-space=true [[ddl.replace]]

Only applicable for the case of migration from Greenplum to YMatrix

            category="role"
            [[ddl.replace.pairs]]
                    old="gpadmin"
                    new="mxadmin"      

`vim config.toml``

After preparing the TOML file, run the following command to start migration:

$ mxshift -c config_path.toml

3.1 Full Migration

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## The installation directory of database(v5.0.1 and above)
        # install-dir="/usr/local/greenplum-db-6.7.1"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## The installation directory of database(v5.0.0)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"       
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
with-index=true

[ddl]
enabled=true
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"

After preparing the TOML file, run the following command to start full migration:

$ mxshift -c config_path_full.toml

3.2 Incremental Migration

Perform incremental data migration using WHERE conditions:

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## The installation directory of database(v5.0.1 and above)
        # install-dir="/usr/local/greenplum-db-6.7.1"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## The installation directory of database(v5.0.0)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema=......
                name=......
        [[scope.table-list]]
        ......     
        [[scope.exclude-table-list]]
        ......        
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

After preparing the TOML file, run the following command to start incremental migration:

$ mxshift -c config_path_incremental.toml

--- SPLIT ---

3.3 Migrate Replicated Tables

When migrating an entire database or all replicated tables under a specific schema, you do not need to add extra configurations in config_path.toml. mxshift automatically identifies which tables are replicated and applies the appropriate migration strategy to prevent data duplication. You only need to configure schema and name in scope.table-list.

   [[scope.table-list]]
                schema="public1"
                name="table_replicated1"
   [[scope.table-list]]
                schema="public2"
                name="table_replicated2"

After preparing the TOML file, run the following command to start mxshift for replicating table migration:

$ mxshift -c config_path_replicated.toml

Note!
For a detailed explanation of the data migration approach and complete steps, see Data Migration.