This document describes the data migration tool mxshift.
The mxshift tool currently supports the following features:
WHERE clause.REPLICATED), and master-only tables.This section details the parameters of the mxshift tool.
Using mxshift requires preparing a TOML configuration file. Below is a detailed description of the parameters:
To perform conditional migration using a WHERE clause, use the following parameters:
| Category | Sub-category | Sub-parameter | Sub-sub-category | Sub-sub-parameter | Description | Required |
| [transfer] category | [transfer.table-data-where-sql] category | --enabled | If set to `true`, filters source data using a WHERE condition, migrating only matching rows. This feature is off by default and must be manually enabled with `--enabled=true`. | No | ||
| --global | Global WHERE expression applied to all tables. If a table does not contain the referenced column, `mxshift` skips the table with a warning. | No | ||||
| [[transfer.table-data-where-sql.override]] category | --where | Table-specific WHERE clause, takes precedence over global. Applies filtering at the table level. | No | |||
| [transfer.table-data-where-sql.override.table] category | --schema | Schema name of the table to migrate | No | |||
| --name | Table name to migrate | No |
See configuration templates below for usage examples.
Command-line options for mxshift:
| Parameter | Description |
|---|---|
| --config | Prints default configuration file content |
| -a or --all | Proceeds with migration without confirmation. Skips already completed and empty tables by default |
| -c or --config-path | Path to the global configuration file for mxshift |
| -v or --version | Displays mxshift version |
| -l or --show-progress | Shows progress of the most recent migration. If no migration is ongoing, shows last result. If active, shows current progress. STATUS meanings: Not start — not in scope; Skip — skipped (already migrated or empty); Processing/Interrupted — in progress or interrupted; Complete — successfully migrated; Failed — migration failed |
| -R or --redo | Re-migrates tables even if previously migrated (clears existing data in target table before re-migration) |
A complete template file is named config_template.toml. You can generate this template by running mxshift --config.
Below are example templates for different use cases. Use them as references and customize using a text editor.
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
## The installation directory of database
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
## Whether to disable data incremental migration, by default, it is true.
# disable-data-increment=true
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3
[transfer]
## Verity the number of record of every table
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
# disable-ddl-increment=true
After preparing the TOML file, run the following command to start full migration:
$ mxshift -c config_path_full.toml
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="gpadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
## The installation directory of database
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database(Please use the result of 'SELECT version();' as value). Required only when
## 1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
## 2. Target database is un-reachable, and 'ddl.mode' is 'output' */
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
##
--- SPLIT ---
Whether to disable data incremental migration. By default, it is set to `true`.
disable-data-increment=false
[log]
## The log level. Valid values are: debug, verbose, info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work starts from the largest table. If set to 'bothway', it starts from both the largest and smallest tables.
both-way=true
## The number of tables to transfer concurrently.
concurrency=3
[transfer]
## Verify the number of records in each table.
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During DDL transfer, whether to skip resource queue or role group migration. Default is true.
# skip-resource-queue-and-group=true
## During DDL transfer, whether to skip tablespace migration. Default is true.
# skip-table-space=true
[[ddl.replace]]
## Applicable only when migrating from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="gpadmin"
new="mxadmin"
## Whether to disable DDL incremental migration. By default, it is true.
disable-ddl-increment=false
After preparing the TOML file, run the following command to start mxshift for incremental migration.
$ mxshift -c config_path_incremental.toml
Use WHERE to perform data migration based on specific conditions, such as excluding expired data:
[database]
[database.source]
## Name of the source database
db-database= "testdb"
## Hostname of the source database master
db-host="sdw3"
## Password for the source database
db-password="xxxx"
## Port of the source database master
db-port=54322
## Username for the source database
db-user="gpadmin"
## Installation directory of the source database
install-dir="/usr/local/greenplum-db-6.7.1"
[database.target]
## Name of the target database
db-database="destdb"
## Hostname of the target database master
db-host="172.16.100.32"
## Password for the target database
db-password="yyyy"
## Port of the target database master
db-port=5432
## Username for the target database
db-user="mxadmin"
[scope]
## Compression method used during data transfer. Valid methods: 0 (no compression), gzip, lz4, zstd
compress-method="lz4"
## Data transfer mode. Valid values: normal, dryrun, fetch, motion
## dryrun: Execute DDL only; do not transfer data
## fetch: Extract data from source and discard
## motion: Extract data from source, redistribute across segments, then discard
mode="normal"
## SQL to query segment information from the source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## SQL to query segment information from the target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema=......
name=......
[[scope.table-list]]
......
[[scope.exclude-table-list]]
......
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
[log]
## The log level. Valid values are: debug, verbose, info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer starts from the largest table. Set to 'bothway' to start from both largest and smallest.
both-way=true
## Number of tables to transfer in parallel.
concurrency=3
[transfer]
## Verify row count for every table after transfer.
verify=true
[transfer.table-data-where-sql]
enabled=true
global="txdate >= '2022-10-01' AND batchnum >= 100000000"
[[transfer.table-data-where-sql.override]]
where="abc > 10"
[transfer.table-data-where-sql.override.table]
schema="test_schema_1"
name="table_001"
[[transfer.table-data-where-sql.override]]
where="tag != 'aabbcc' AND ts > '2022-01-01'"
[transfer.table-data-where-sql.override.table]
schema="test_schema_2"
name="another_table"
After preparing the TOML file, run the following command to start mxshift for conditional migration.
$ mxshift -c config_path_conditional.toml
When migrating an entire database or all replicated tables under a specific schema, no additional configuration is required in config_path.toml. mxshift automatically identifies replicated tables and applies the appropriate migration strategy to prevent data duplication. You only need to configure scope.table-list, specifically defining schema and name.
[[scope.table-list]]
schema="public1"
name="table_replicated1"
[[scope.table-list]]
schema="public2"
name="table_replicated2"
After preparing the TOML file, run the following command to start mxshift for replicated table migration.
$ mxshift -c config_path_replicated.toml
Note!
For a complete overview of data migration strategies and procedures, refer to Data Migration