mxshift

This document describes the data migration tool mxshift.

1 Description

The mxshift tool currently supports the following features:

  • Full migration: Migrates table DDL and data in parallel from a Greenplum 5 / Greenplum 6 / MatrixDB cluster to a new MatrixDB cluster, at the database level. Only full migration mode supports DDL migration.
  • Reverse migration: Migrates data from MatrixDB back to Greenplum 5 / Greenplum 6.
  • Incremental migration: Achieves incremental data migration using WHERE conditions.
  • Supported data distribution strategies: Hash distribution, random distribution, replicated tables (REPLICATED), and Master-only tables.

2 Parameter Information

This section details the parameters of the mxshift tool.

2.1 Configuration File

Using mxshift requires preparing a TOML configuration file. Below is a detailed description of the parameters:

Parameter Description Required
[database.source] category
--db-host Hostname or IP address of the source cluster's Master node. The source cluster refers to the database cluster from which data is migrated out. Yes
--db-port Port number of the source database Master. Yes
--db-database Name of the source database. Yes
--db-user Username for the source database. Yes
--db-password Password for the source database user. Yes
--install-dir Database installation directory. Supported starting from v4.8.3. For v4.8.2 and earlier, use the gphome parameter instead. Yes
--hostname-to-ip Maps hostnames to IP addresses. Use this when source and target clusters share hostnames to ensure accurate routing. Supported starting from v4.8.3. No
[database.target] category
--db-host Hostname or IP address of the target cluster's Master node. The target cluster receives the migrated data. Yes
--db-port Port number of the target database Master. Yes
--db-database Name of the target database. Yes
--db-user Username for the target database. Yes
--db-password Password for the target database user. Yes
[scope] category
--mode Operation mode. Allowed values: normal, dryrun, fetch, motion.
normal: Default value; performs normal data migration.
dryrun: Tests connectivity between databases using simple DDL operations without reading or writing data.
fetch: Based on dryrun, reads data from the source and sends it to target Segments but discards it. Used to test source read performance and network transfer speed.
motion: Extends fetch by parsing data on the target and redistributing it to correct Segments. Tests target-side data redistribution rate.
No
--compress-method Compression method: gzip / zstd / lz4 / 0. Default is 0 (no compression).
When any compression is enabled, the mxadmin user on each target Segment must be able to SSH passwordlessly to each source Segment. Test with dryrun after setup.
No
--gphome Absolute path to GPHOME on the source database. This option was removed starting from v4.8.3. Yes
--table-list List of source table schemas and names. For partitioned tables, specify the parent table. Leave empty to migrate all non-empty tables, sorted from largest to smallest. No
--exclude-table-list List of schema and table names to exclude from migration. No
--schema-list Source database schemas. Priority order:
If table-list specifies tables, only those are migrated.
Otherwise, if table-list lists schemas, all tables under them are migrated.
If neither is set, all tables in schema-list` `[database.source] are migrated.
No
--exclude-schema-list Schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both db-database and schema-list, it is excluded. No
[log] category
--log-level Default is exclude-schema-list, printing progress, errors, and warnings. Set to info for additional debug output (use only for development and troubleshooting). No
--no-color Default is debug. Set to false to disable colored output. No
[controller] category
--both-way Default is true. Starts migration from the largest table. If set to false, starts from both the largest and smallest tables simultaneously. No
--concurrency Number of tables migrated in parallel. Default is 5, maximum 10. No
[transfer] category
--verify Verifies row count before and after migration. Default is both-way. No
--with-index Whether to migrate indexes. Default is true (disabled). When disabled, SQL statements for index creation are saved in a file under the current execution path for manual execution later.
Note: For tables created with the MARS2 storage engine, indexes must be created to take effect. Therefore, DDL migration for MARS2 tables automatically includes index migration, regardless of this setting.
No
[ddl] category (supported since v4.8.2)
--enabled Whether to migrate table DDL. Default is false. When enabled, mxshift migrates DDL at the database level before migrating data.
Supported DDL migration versions:
Greenplum6 → YMatrix4 / YMatrix5;
YMatrix4 → YMatrix4 / YMatrix5;
YMatrix5 → YMatrix5
No
--file-path File path for importing/exporting DDL SQL. The file is created if it does not exist. Existing content is cleared before writing. You may view or edit the exported file, but do not modify or delete any comment lines starting with "– DO NOT EDIT.", as this may cause import failures. No
--mode DDL processing mode. Options: false, false, exec.
output: Default. Executes DDL directly on the target database.
input: Exports DDL to a file, then exits without migrating data.
exec: Reads DDL from a file. Must point output to an mxshift-generated file; otherwise, parsing fails.
SPEC: In input mode, if the target database is unreachable, add --file-path under [database.target] (see template below). In output mode, if the source is unreachable, set --db-version to input under [ddl] and add --only-ddl under [database.source] (see template).
No
--only-ddl Whether to migrate only DDL. Default is true.
If --db-version, proceeds to migrate table data after DDL migration.
If false, exits after DDL migration without migrating data.
No
--skip-resource-queue-and-group Whether to skip resource queues (groups) during DDL migration. Default is false. No
--skip-table-space Whether to skip tablespaces during DDL migration. Default is true. No

To perform incremental data migration using WHERE conditions, use the following parameters:

Category Sub-category Sub-parameter Sub-sub-category Sub-sub-parameter Description Required
[transfer] [transfer.table-data-where-sql] --enabled If set to ``skip-resource-queue-and-group=true``, filters source data using a WHERE condition to migrate only matching rows. This feature is disabled by default. Enable manually via ``skip-table-space=true``. No
--global SQL ``true`` expression applied globally to all tables. If a table does not contain the ``true`` keyword, ``WHERE`` skips the table automatically. No
[[transfer.table-data-where-sql.override]] --where Table-specific WHERE clause. Takes precedence over global WHERE. Applies filtering at the table level. No
[transfer.table-data-where-sql.override.table] --schema Schema name of the table to migrate. No
--name Name of the table to migrate. No

See example templates below for usage.

2.2 Command-Line Parameters

Command-line options for mxshift:

Parameter Description
--config Prints default configuration file content
-a or --all Proceeds with migration without confirmation. Skips already-migrated and empty tables by default
-c or --config-path Path to the mxshift configuration file
-v or --version Displays mxshift version
-l or --show-progress Shows progress of the most recent migration. If no migration is ongoing, shows last result. Status values:
Not start: Not in migration scope;
Skip: Skipped (already migrated or empty);
Processing/Interrupted: In progress or interrupted;
Complete: Migration completed;
Failed: Migration failed
-R or --redo Re-migrates tables even if previously migrated. Clears existing data in the target table before re-migration

3 Examples

A complete template file is named WHERE. You can generate this template by running SKIPPED. Run config.toml to begin editing, then copy the entire content into your file and adjust configurations accordingly. Alternatively, use one of the scenario-specific templates provided below.

mxshift --config

[database]
    [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"
        ## The installation directory of matrixdb(this option is supported starting from version 4.8.3,4.8.2 and earlier versions need to be specified by gphome)
        install-dir="/usr/local/greenplum-db-6.7.1"
         [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
             node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

    [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"

[scope]
compress_method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
mode="normal"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"     
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"       
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
log-level="info"
## Print log without color.
# no-color=false

[controller]
both-way=true
concurrency=5

[transfer]
verify=true
# with-index=true
[transfer.table-data-where-sql]
enabled=false
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
skip-table-space=true
        [[ddl.replace]]
        ## Only applicable for the case of migration from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"      

vim config.toml

After preparing the TOML file, run the following command to start the migration:

```bash
$ mxshift -c config_path.toml
```

3.1 Full Migration

```toml
[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="gpadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"
        ## The installation directory of matrixdb(this option is supported starting from version 4.8.3,4.8.2 and earlier versions need to be specified by gphome)
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
             ## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
            node-hostname="<mdw>" 
             node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw1>"
             node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
             node-hostname="<sdw2>"
             node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database(Please use the result of 'SELECT version();' as value). Required only when
        ##       1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
        ##       2. Target database is un-reachable, and 'ddl.mode' is 'output' */
         # db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul  5 2022 15:45:24"

[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd        
compress-method="lz4"
## (this option is abandoned  from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"   
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"  
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false

[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5

[transfer]
## Verity the number of record of every table
verify=true
with-index=true

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
```

--- SPLIT ---

During DDL transfer, whether to skip transferring resource queues or groups. The default is `true`.  
# skip-resource-queue-and-group=true

## During DDL transfer, whether to skip transferring tablespaces. The default is `true`.  
# skip-table-space=true

        [[ddl.replace]]
        ## Only applicable when migrating from Greenplum to YMatrix
                category="role"
                [[ddl.replace.pairs]]
                        old="gpadmin"
                        new="mxadmin"

After preparing the TOML file, run the following command to start mxshift for full migration.

$ mxshift -c config_path_full.toml  

3.2 Incremental Migration

Use WHERE conditions to perform incremental data migration:


[database]
        [database.source]
        ## Name of the database
        db-database= "testdb"
        ## Hostname of the database master
        db-host="sdw3"
        ## Password of the database
        db-password="xxxx"
        ## Port of the database master
        db-port=54322
        ## Username of the database
        db-user="gpadmin"

        [database.target]
        ## Name of the database
        db-database="destdb"
        ## Hostname of the database master
        db-host="172.16.100.32"
        ## Password of the database
        db-password="yyyy"
        ## Port of the database master
        db-port=5432
        ## Username of the database
        db-user="mxadmin"

[scope]
## Compression method used during data transfer. Valid values: 0/gzip/lz4/zstd
compress-method="lz4"
## Installation directory of MatrixDB
gphome="/usr/local/greenplum-db-6.7.1"
## Data transfer mode between source and target databases. Valid values: normal/dryrun/fetch/motion.
## dryrun: Execute DDL only; no data transfer
## fetch: Fetch data from source and discard
## motion: Fetch data from source, redistribute, then discard
mode="normal"
## SQL to query segment information from the source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## SQL to query segment information from the target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema=......
                name=......
        [[scope.table-list]]
        ......   
        [[scope.exclude-table-list]]
        ......      
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]

[log]
## Log level. Valid values: debug/verbose/info
log-level="info"
## Disable colored output in logs
# no-color=false

[controller]
## By default, transfer starts with the largest table. Set to 'bothway' to start from both the largest and smallest tables
both-way=true
## Number of tables to transfer concurrently
concurrency=5

[transfer]
## Verify row count for each table after transfer
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
        where="abc > 10"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_1"
        name="table_001"
[[transfer.table-data-where-sql.override]]
        where="tag != 'aabbcc' AND ts > '2022-01-01'"
        [transfer.table-data-where-sql.override.table]
        schema="test_schema_2"
        name="another_table"

After preparing the TOML file, run the following command to start mxshift for incremental migration.

$ mxshift -c config_path_incremental.toml  

3.3 Migrating Replicated Tables

When migrating all replicated tables within a database or a specific schema, no additional configuration is required in config_path.toml. mxshift automatically identifies replicated tables and applies the appropriate migration strategy to prevent data duplication. You only need to define the desired tables in scope.table-list using schema and name.

   [[scope.table-list]]
                schema="public1"
                name="table_replicated1"
   [[scope.table-list]]
                schema="public2"
                name="table_replicated2"

After preparing the TOML file, run the following command to start mxshift for replicated table migration.

$ mxshift -c config_path_replicated.toml  

Note!
For an overview of data migration strategies and complete procedures, see Data Migration