mxshift

1 Description

The mxshift tool currently supports the following features:

Full migration: Migrate data in parallel from a Greenplum 5 / Greenplum 6 / MatrixDB cluster to a new MatrixDB cluster, at the database level.
Reverse migration: Migrate data from MatrixDB back to Greenplum 5 / Greenplum 6.
Incremental migration: Perform incremental data migration using WHERE conditions.
Replicated table migration: Support migration of replicated tables.

2 Usage

This section describes how to write the configuration file config.json required to start the mxshift tool, including its parameters and command-line options.

2.1 Writing the Configuration File

To use mxshift, you must first prepare a JSON configuration file named config.json. An example is shown below:

{
    "mode":            "normal",
    "verify":          true,
    "bothway":         true,
    "concurrency":     5,
    "compress_method": "lz4",
    "log_level":       "info",
    "gphome":          "/usr/local/greenplum-db-6.7.1",
    "src_host":        "sdw3",
    "src_db":          "testdb",
    "src_user":        "gpadmin",
    "src_password":    "xxxx",
    "src_port":        54322,
    "target_host":     "172.16.100.32",
    "target_db":       "destdb",
    "target_user":     "mxadmin",
    "target_password": "yyyy",
    "target_port":     5432,
    "where": {
        "enabled":  false,
        "global":   "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema": "test_schema_1",
                "name":   "table_001",
                "where":  "abc > 10"
            },
            {
                "schema": "test_schema_2",
                "name":   "another_table",
                "where":  "tag != 'aabbcc' AND ts > '2022-01-01'"
            }
        ]
    },
    "schema_list": [
        "test_schema_1",
        "test_schema_2"
    ],
    "exclude_schema_list": [
         "test_schema_5",
         "test_schema_8"
    ],
    "table_list": [
        {
            "schema": "test_schema_1",
            "name":   "table_001"
        },
        {
            "schema": "test_schema_2",
            "name":   "table_002"
        }
    ],
    "exclude_table_list":[
        {
            "schema": "test_schema_3",
            "name":   "table_003"
        }
    ]
}

Below is a detailed description of each parameter in the configuration file:

Parameter	Description
`mode`	Operation mode. Allowed values: `normal`, `dryrun`, `fetch`, `motion`. `normal`: Default value; performs normal data migration. `dryrun`: Checks connectivity to both databases and validates with simple DDL operations, but does not read or write data. `fetch`: Based on `dryrun`, reads data from the source and sends it to target segments, but discards it without processing. Used to test source read performance and network transfer speed. `motion`: Extends `fetch` by parsing data on the target and redistributing it across proper target segments. Used to test target-side data redistribution rate.
`verify`	Default is `false`. If enabled, verifies that row counts match before and after migration.
`bothway`	By default, `mxshift` migrates the largest tables first to complete full migration as quickly as possible. If set to `true`, `mxshift` processes both the largest and smallest tables simultaneously from both ends.
`concurrency`	Maximum number of tables migrated in parallel. Default is 5; maximum allowed is 10.
`compress_method`	Compression method: `gzip`, `zstd`, `lz4`, or `0` (no compression). Default is `0`. When any compression is enabled, the `mxadmin` user on each target Segment must be able to SSH passwordlessly to each source Segment. It is recommended to test this setup using `dryrun`.
`deduplicate` (Experimental)	If a previous `mxshift` migration was interrupted, enabling this option allows `mxshift` to resume and complete only the remaining tables, skipping those already successfully migrated. This feature is experimental.
`log_level`	Logging level. Default is `info`, which outputs progress, error, and warning messages. Set to `debug` for more detailed logs, useful for development and troubleshooting.
`gphome`	Absolute path to the GPHOME directory on the source database.
`src_host`	Hostname or IP address of the Master node of the source GP cluster.
`src_db`	Source GP database from which data will be migrated.
`src_user`	Username for connecting to the source GP database.
`src_password`	Password for the source GP database user.
`src_port`	Port number of the source GP database.
`target_host`	Hostname or IP address of the Master node of the target MatrixDB cluster.
`target_db`	Target MatrixDB database to which data will be migrated.
`target_user`	Username for connecting to the target MatrixDB database.
`target_password`	Password for the target MatrixDB database user.
`target_port`	Port number of the target MatrixDB database.
`table_list`	List of source GP tables to migrate. For partitioned tables, specify the parent partitioned table. To migrate an entire database, leave this empty and configure `schema_list` instead. `mxshift` will automatically migrate all non-empty tables in descending order of size.
`exclude_table_list`	List of schema and table names to exclude from migration.
`schema_list`	List of schemas in the source GP database to include. Priority order: if `table_list` is specified, only those tables are migrated; otherwise, if `schema_list` is defined, all tables under the listed schemas are migrated; if neither is defined, all tables in the database are considered.
`exclude_schema_list`	List of schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both `schema_list` and `exclude_schema_list`, it will be excluded.
`no-color`	Disables colored output in logs.

If you want to perform incremental migration using WHERE conditions, use the following additional parameters:

Parameter	Description
`enabled`	Disabled by default. Set to `true` to enable WHERE-based filtering.
`global`	A SQL WHERE expression applied to all tables in the migration. If a table does not contain the columns referenced in the WHERE clause, the filter is skipped for that table, and `mxshift` will mark it as SKIPPED.
`override`	An array of objects, each containing `schema`, `name`, and a custom `where` condition. These override the global WHERE condition for specific tables.

2.2 Command-Line Options

The mxshift command-line options are as follows:

Option	Description
`-c` or `--config_path`	Path to the `mxshift` configuration file (e.g., `config.json`).
`-n` or `--no-color`	Disable colored log output.
`-v` or `--version`	Display the version of the `mxshift` tool.

3 Examples

3.1 Full Migration

Example configuration file config.json for full migration:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        {
           "schema":"test_schema_1",
           "name":"table_001"
        },
        {
           "schema":"test_schema_2",
           "name":"table_002"
        }
    ],
    "exclude_table_list":[
        {
            "name":"test_schema_3",
            "schema":"table_003"
        }
    ]
}

After preparing the JSON file, run the following command to start the full migration:

mxshift -c config_path.json

3.2 Incremental Migration

Use WHERE conditions to perform incremental migration:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "where": {
        "enabled": true,
        "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema":"test_schema_1",
                "name":"table_001",
                "where": "abc > 10"
            }
        ]
    },
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        ...
    ],
    "exclude_table_list":[
        ...
    ]
}

After preparing the JSON file, run the following command to start the incremental migration:

mxshift -c config_path.json

Note: Ensure "enabled" is set to true to activate the WHERE filter.

3.3 Migrating Replicated Tables

When migrating an entire database or all tables within a schema, no extra configuration is needed. mxshift automatically identifies replicated tables and applies appropriate migration logic to prevent data duplication.
If a specific table in table_list is a replicated table, you must explicitly declare it by adding the policyType field with value "r".

Example configuration snippet:

"table_list":[
    {
        "schema":"public",
        "name":"tbl_test_replica",
        "policyType":"r"
    }
],

After preparing the JSON file, run the following command to start the replicated table migration:

mxshift -c config_path.json

Note!
For a complete overview of data migration strategies and procedures, see Data Migration.

← Previous

mxrestore

gpactivatestandby

English Русский 简体中文