This document describes the data migration tool mxshift.
The mxshift tool currently supports the following features:
WHERE conditions.This section details the parameters of the mxshift tool.
Using mxshift requires preparing a TOML configuration file. Below is a detailed description of the parameters:
| Parameter | Description | Required |
|---|---|---|
| [database.source] category | ||
| --db-host | Hostname or IP address of the source cluster's Master node. The source cluster refers to the database cluster from which data is migrated out. | Yes |
| --db-port | Port number of the source database Master. | Yes |
| --db-database | Name of the source database. | Yes |
| --db-user | Username for the source database. | Yes |
| --db-password | Password for the source database user. | Yes |
| --install-dir | Database installation directory. Supported starting from v4.8.3. For v4.8.2 and earlier, use the gphome parameter instead. |
Yes |
| --hostname-to-ip | Maps hostnames to IP addresses. Use this when source and target clusters share hostnames to ensure accurate routing. Supported starting from v4.8.3. | No |
| [database.target] category | ||
| --db-host | Hostname or IP address of the target cluster's Master node. The target cluster receives the migrated data. | Yes |
| --db-port | Port number of the target database Master. | Yes |
| --db-database | Name of the target database. | Yes |
| --db-user | Username for the target database. | Yes |
| --db-password | Password for the target database user. | Yes |
| [scope] category | ||
| --mode | Operation mode. Allowed values: normal, dryrun, fetch, motion.normal: Default value; performs normal data migration.dryrun: Tests connectivity between databases using simple DDL operations without reading or writing data.fetch: Based on dryrun, reads data from the source and sends it to target Segments but discards it. Used to test source read performance and network transfer speed.motion: Extends fetch by parsing data on the target and redistributing it to correct Segments. Tests target-side data redistribution rate. |
No |
| --compress-method | Compression method: gzip / zstd / lz4 / 0. Default is 0 (no compression).When any compression is enabled, the mxadmin user on each target Segment must be able to SSH passwordlessly to each source Segment. Test with dryrun after setup. |
No |
| --gphome | Absolute path to GPHOME on the source database. This option was removed starting from v4.8.3. | Yes |
| --table-list | List of source table schemas and names. For partitioned tables, specify the parent table. Leave empty to migrate all non-empty tables, sorted from largest to smallest. | No |
| --exclude-table-list | List of schema and table names to exclude from migration. | No |
| --schema-list | Source database schemas. Priority order:If table-list specifies tables, only those are migrated.Otherwise, if table-list lists schemas, all tables under them are migrated.If neither is set, all tables in schema-list` `[database.source] are migrated. |
No |
| --exclude-schema-list | Schemas to exclude from migration. Tables under these schemas will not be migrated. If a schema appears in both db-database and schema-list, it is excluded. |
No |
| [log] category | ||
| --log-level | Default is exclude-schema-list, printing progress, errors, and warnings. Set to info for additional debug output (use only for development and troubleshooting). |
No |
| --no-color | Default is debug. Set to false to disable colored output. |
No |
| [controller] category | ||
| --both-way | Default is true. Starts migration from the largest table. If set to false, starts from both the largest and smallest tables simultaneously. |
No |
| --concurrency | Number of tables migrated in parallel. Default is 5, maximum 10. | No |
| [transfer] category | ||
| --verify | Verifies row count before and after migration. Default is both-way. |
No |
| --with-index | Whether to migrate indexes. Default is true (disabled). When disabled, SQL statements for index creation are saved in a file under the current execution path for manual execution later.Note: For tables created with the MARS2 storage engine, indexes must be created to take effect. Therefore, DDL migration for MARS2 tables automatically includes index migration, regardless of this setting. |
No |
| [ddl] category (supported since v4.8.2) | ||
| --enabled | Whether to migrate table DDL. Default is false. When enabled, mxshift migrates DDL at the database level before migrating data.Supported DDL migration versions:Greenplum6 → YMatrix4 / YMatrix5;YMatrix4 → YMatrix4 / YMatrix5;YMatrix5 → YMatrix5 |
No |
| --file-path | File path for importing/exporting DDL SQL. The file is created if it does not exist. Existing content is cleared before writing. You may view or edit the exported file, but do not modify or delete any comment lines starting with "– DO NOT EDIT.", as this may cause import failures. | No |
| --mode | DDL processing mode. Options: false, false, exec.output: Default. Executes DDL directly on the target database.input: Exports DDL to a file, then exits without migrating data.exec: Reads DDL from a file. Must point output to an mxshift-generated file; otherwise, parsing fails.SPEC: In input mode, if the target database is unreachable, add --file-path under [database.target] (see template below). In output mode, if the source is unreachable, set --db-version to input under [ddl] and add --only-ddl under [database.source] (see template). |
No |
| --only-ddl | Whether to migrate only DDL. Default is true.If --db-version, proceeds to migrate table data after DDL migration.If false, exits after DDL migration without migrating data. |
No |
| --skip-resource-queue-and-group | Whether to skip resource queues (groups) during DDL migration. Default is false. |
No |
| --skip-table-space | Whether to skip tablespaces during DDL migration. Default is true. |
No |
To perform incremental data migration using WHERE conditions, use the following parameters:
| Category | Sub-category | Sub-parameter | Sub-sub-category | Sub-sub-parameter | Description | Required |
| [transfer] | [transfer.table-data-where-sql] | --enabled | If set to ``skip-resource-queue-and-group=true``, filters source data using a WHERE condition to migrate only matching rows. This feature is disabled by default. Enable manually via ``skip-table-space=true``. | No | ||
| --global | SQL ``true`` expression applied globally to all tables. If a table does not contain the ``true`` keyword, ``WHERE`` skips the table automatically. | No | ||||
| [[transfer.table-data-where-sql.override]] | --where | Table-specific WHERE clause. Takes precedence over global WHERE. Applies filtering at the table level. | No | |||
| [transfer.table-data-where-sql.override.table] | --schema | Schema name of the table to migrate. | No | |||
| --name | Name of the table to migrate. | No |
See example templates below for usage.
Command-line options for mxshift:
| Parameter | Description |
|---|---|
| --config | Prints default configuration file content |
| -a or --all | Proceeds with migration without confirmation. Skips already-migrated and empty tables by default |
| -c or --config-path | Path to the mxshift configuration file |
| -v or --version | Displays mxshift version |
| -l or --show-progress | Shows progress of the most recent migration. If no migration is ongoing, shows last result. Status values:Not start: Not in migration scope;Skip: Skipped (already migrated or empty);Processing/Interrupted: In progress or interrupted;Complete: Migration completed;Failed: Migration failed |
| -R or --redo | Re-migrates tables even if previously migrated. Clears existing data in the target table before re-migration |
A complete template file is named WHERE. You can generate this template by running SKIPPED. Run config.toml to begin editing, then copy the entire content into your file and adjust configurations accordingly. Alternatively, use one of the scenario-specific templates provided below.
mxshift --config
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
## The installation directory of matrixdb(this option is supported starting from version 4.8.3,4.8.2 and earlier versions need to be specified by gphome)
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
[scope]
compress_method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
mode="normal"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
log-level="info"
## Print log without color.
# no-color=false
[controller]
both-way=true
concurrency=5
[transfer]
verify=true
# with-index=true
[transfer.table-data-where-sql]
enabled=false
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
where="abc > 10"
[transfer.table-data-where-sql.override.table]
schema="test_schema_1"
name="table_001"
[[transfer.table-data-where-sql.override]]
where="tag != 'aabbcc' AND ts > '2022-01-01'"
[transfer.table-data-where-sql.override.table]
schema="test_schema_2"
name="another_table"
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
vim config.toml
After preparing the TOML file, run the following command to start the migration:
```bash
$ mxshift -c config_path.toml
```
```toml
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
## The installation directory of matrixdb(this option is supported starting from version 4.8.3,4.8.2 and earlier versions need to be specified by gphome)
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed(this option is supported starting from version 4.8.3)
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 4.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jul 5 2022 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## (this option is abandoned from version 4.8.3)
# gphome="/usr/local/greenplum-db-6.7.1"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=5
[transfer]
## Verity the number of record of every table
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
```
--- SPLIT ---
During DDL transfer, whether to skip transferring resource queues or groups. The default is `true`.
# skip-resource-queue-and-group=true
## During DDL transfer, whether to skip transferring tablespaces. The default is `true`.
# skip-table-space=true
[[ddl.replace]]
## Only applicable when migrating from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
After preparing the TOML file, run the following command to start mxshift for full migration.
$ mxshift -c config_path_full.toml
Use WHERE conditions to perform incremental data migration:
[database]
[database.source]
## Name of the database
db-database= "testdb"
## Hostname of the database master
db-host="sdw3"
## Password of the database
db-password="xxxx"
## Port of the database master
db-port=54322
## Username of the database
db-user="gpadmin"
[database.target]
## Name of the database
db-database="destdb"
## Hostname of the database master
db-host="172.16.100.32"
## Password of the database
db-password="yyyy"
## Port of the database master
db-port=5432
## Username of the database
db-user="mxadmin"
[scope]
## Compression method used during data transfer. Valid values: 0/gzip/lz4/zstd
compress-method="lz4"
## Installation directory of MatrixDB
gphome="/usr/local/greenplum-db-6.7.1"
## Data transfer mode between source and target databases. Valid values: normal/dryrun/fetch/motion.
## dryrun: Execute DDL only; no data transfer
## fetch: Fetch data from source and discard
## motion: Fetch data from source, redistribute, then discard
mode="normal"
## SQL to query segment information from the source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## SQL to query segment information from the target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema=......
name=......
[[scope.table-list]]
......
[[scope.exclude-table-list]]
......
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
## Log level. Valid values: debug/verbose/info
log-level="info"
## Disable colored output in logs
# no-color=false
[controller]
## By default, transfer starts with the largest table. Set to 'bothway' to start from both the largest and smallest tables
both-way=true
## Number of tables to transfer concurrently
concurrency=5
[transfer]
## Verify row count for each table after transfer
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
where="abc > 10"
[transfer.table-data-where-sql.override.table]
schema="test_schema_1"
name="table_001"
[[transfer.table-data-where-sql.override]]
where="tag != 'aabbcc' AND ts > '2022-01-01'"
[transfer.table-data-where-sql.override.table]
schema="test_schema_2"
name="another_table"
After preparing the TOML file, run the following command to start mxshift for incremental migration.
$ mxshift -c config_path_incremental.toml
When migrating all replicated tables within a database or a specific schema, no additional configuration is required in config_path.toml. mxshift automatically identifies replicated tables and applies the appropriate migration strategy to prevent data duplication. You only need to define the desired tables in scope.table-list using schema and name.
[[scope.table-list]]
schema="public1"
name="table_replicated1"
[[scope.table-list]]
schema="public2"
name="table_replicated2"
After preparing the TOML file, run the following command to start mxshift for replicated table migration.
$ mxshift -c config_path_replicated.toml
Note!
For an overview of data migration strategies and complete procedures, see Data Migration