mxshift

1 Description

The mxshift tool currently supports the following features:

  • Full migration: Migrate data in parallel from a Greenplum 5 / Greenplum 6 / MatrixDB cluster to a new MatrixDB cluster, at the database level.
  • Reverse migration: Migrate data from MatrixDB back to Greenplum 5 / Greenplum 6.
  • Incremental migration: Perform incremental data migration using WHERE conditions.
  • Replicated table migration: Support migration of replicated tables.

2 Usage

This section describes how to write the configuration file config.json required to start the mxshift tool, including its parameters and command-line options.

2.1 Writing the Configuration File

To use mxshift, you must first prepare a JSON configuration file named config.json. An example is shown below:

{
    "mode":            "normal",
    "verify":          true,
    "bothway":         true,
    "concurrency":     5,
    "compress_method": "lz4",
    "log_level":       "info",
    "gphome":          "/usr/local/greenplum-db-6.7.1",
    "src_host":        "sdw3",
    "src_db":          "testdb",
    "src_user":        "gpadmin",
    "src_password":    "xxxx",
    "src_port":        54322,
    "target_host":     "172.16.100.32",
    "target_db":       "destdb",
    "target_user":     "mxadmin",
    "target_password": "yyyy",
    "target_port":     5432,
    "where": {
        "enabled":  false,
        "global":   "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema": "test_schema_1",
                "name":   "table_001",
                "where":  "abc > 10"
            },
            {
                "schema": "test_schema_2",
                "name":   "another_table",
                "where":  "tag != 'aabbcc' AND ts > '2022-01-01'"
            }
        ]
    },
    "schema_list": [
        "test_schema_1",
        "test_schema_2"
    ],
    "exclude_schema_list": [
         "test_schema_5",
         "test_schema_8"
    ],
    "table_list": [
        {
            "schema": "test_schema_1",
            "name":   "table_001"
        },
        {
            "schema": "test_schema_2",
            "name":   "table_002"
        }
    ],
    "exclude_table_list":[
        {
            "schema": "test_schema_3",
            "name":   "table_003"
        }
    ]
}

Below is a detailed description of each parameter in the configuration file:

Parameter Description
mode Operation mode. Allowed values: normal, dryrun, fetch, motion.
normal: Default value; performs normal data migration.
dryrun: Checks connectivity to both databases and validates with simple DDL operations, but does not read or write data.
fetch: Based on dryrun, reads data from the source and sends it to target segments, but discards it without processing. Used to test source read performance and network transfer speed.
motion: Extends fetch by parsing data on the target and redistributing it across proper target segments. Used to test target-side data redistribution rate.
verify Default is false. If enabled, verifies that row counts match before and after migration.
bothway By default, mxshift migrates the largest tables first to complete full migration as quickly as possible. If set to true, mxshift processes both the largest and smallest tables simultaneously from both ends.
concurrency Maximum number of tables migrated in parallel. Default is 5; maximum allowed is 10.
compress_method Compression method: gzip, zstd, lz4, or 0 (no compression). Default is 0. When any compression is enabled, the mxadmin user on each target Segment must be able to SSH passwordlessly to each source Segment. It is recommended to test this setup using dryrun.
deduplicate (Experimental) If a previous mxshift migration was interrupted, enabling this option allows mxshift to resume and complete only the remaining tables, skipping those already successfully migrated. This feature is experimental.
log_level Logging level. Default is info, which outputs progress, error, and warning messages. Set to debug for more detailed logs, useful for development and troubleshooting.
gphome Absolute path to the GPHOME directory on the source database.
src_host Hostname or IP address of the Master node of the source GP cluster.
src_db Source GP database from which data will be migrated.
src_user Username for connecting to the source GP database.
src_password Password for the source GP database user.
src_port Port number of the source GP database.
target_host Hostname or IP address of the Master node of the target MatrixDB cluster.
target_db Target MatrixDB database to which data will be migrated.
target_user Username for connecting to the target MatrixDB database.
target_password Password for the target MatrixDB database user.
target_port Port number of the target MatrixDB database.
table_list List of source GP tables to migrate. For partitioned tables, specify the parent partitioned table. To migrate an entire database, leave this empty and configure schema_list instead. mxshift will automatically migrate all non-empty tables in descending order of size.
exclude_table_list List of schema and table names to exclude from migration.
schema_list List of schemas in the source GP database to include. Priority order: if table_list is specified, only those tables are migrated; otherwise, if schema_list is defined, all tables under the listed schemas are migrated; if neither is defined, all tables in the database are considered.
exclude_schema_list List of schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both schema_list and exclude_schema_list, it will be excluded.
no-color Disables colored output in logs.

If you want to perform incremental migration using WHERE conditions, use the following additional parameters:

Parameter Description
enabled Disabled by default. Set to true to enable WHERE-based filtering.
global A SQL WHERE expression applied to all tables in the migration. If a table does not contain the columns referenced in the WHERE clause, the filter is skipped for that table, and mxshift will mark it as SKIPPED.
override An array of objects, each containing schema, name, and a custom where condition. These override the global WHERE condition for specific tables.

2.2 Command-Line Options

The mxshift command-line options are as follows:

Option Description
-c or --config_path Path to the mxshift configuration file (e.g., config.json).
-n or --no-color Disable colored log output.
-v or --version Display the version of the mxshift tool.

3 Examples

3.1 Full Migration

Example configuration file config.json for full migration:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        {
           "schema":"test_schema_1",
           "name":"table_001"
        },
        {
           "schema":"test_schema_2",
           "name":"table_002"
        }
    ],
    "exclude_table_list":[
        {
            "name":"test_schema_3",
            "schema":"table_003"
        }
    ]
}

After preparing the JSON file, run the following command to start the full migration:

mxshift -c config_path.json

3.2 Incremental Migration

Use WHERE conditions to perform incremental migration:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "where": {
        "enabled": true,
        "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema":"test_schema_1",
                "name":"table_001",
                "where": "abc > 10"
            }
        ]
    },
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        ...
    ],
    "exclude_table_list":[
        ...
    ]
}

After preparing the JSON file, run the following command to start the incremental migration:

mxshift -c config_path.json

Note: Ensure "enabled" is set to true to activate the WHERE filter.

3.3 Migrating Replicated Tables

  • When migrating an entire database or all tables within a schema, no extra configuration is needed. mxshift automatically identifies replicated tables and applies appropriate migration logic to prevent data duplication.
  • If a specific table in table_list is a replicated table, you must explicitly declare it by adding the policyType field with value "r".

Example configuration snippet:

"table_list":[
    {
        "schema":"public",
        "name":"tbl_test_replica",
        "policyType":"r"
    }
],

After preparing the JSON file, run the following command to start the replicated table migration:

mxshift -c config_path.json

Note!
For a complete overview of data migration strategies and procedures, see Data Migration.