mxshift

1 Description

The mxshift tool currently supports the following features:

Full migration: implement parallel migration of data from Greenplum 5 / Greenplum 6 / MatrixDB cluster to a new MatrixDB cluster in database units
Reverse migration: Reverse migration from MatrixDB to Greenplum 5 / Greenplum 6
Incremental migration: Incremental data migration through WHERE conditions
Replicated Table

2 Use

This section mainly introduces how to use config.json, related parameters, and related parameters of the mxshift command line required to start the mxshift tool.

2.1 Writing configuration files

Using the mxshift tool, you need to prepare a JSON file config.json, with the following content:

{
    "mode":            "normal",
    "verify":          true,
    "bothway":         true,
    "concurrency":     5,
    "compress_method": "lz4",
    "log_level":       "info",
    "gphome":          "/usr/local/greenplum-db-6.7.1",
    "src_host":        "sdw3",
    "src_db":          "testdb",
    "src_user":        "gpadmin",
    "src_password":    "xxxx",
    "src_port":        54322,
    "target_host":     "172.16.100.32",
    "target_db":       "destdb",
    "target_user":     "mxadmin",
    "target_password": "yyyy",
    "target_port":     5432,
    "where": {
        "enabled":  false,
        "global":   "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema": "test_schema_1",
                "name":   "table_001",
                "where":  "abc > 10"
            },
            {
                "schema": "test_schema_2",
                "name":   "another_table",
                "where":  "tag != 'aabbcc' AND ts > '2022-01-01'"
            }
        ]
    },
    "schema_list": [
        "test_schema_1",
        "test_schema_2"
    ],
    "exclude_schema_list": [
         "test_schema_5",
         "test_schema_8"
    ],
    "table_list": [
        {
            "schema": "test_schema_1",
            "name":   "table_001"
        },
        {
            "schema": "test_schema_2",
            "name":   "table_002"
        }
    ],
    "exclude_table_list":[
        {
            "schema": "test_schema_3",
            "name":   "table_003"
        }
    ]
}

The following is a detailed parameter description in the configuration file:

Parameter name	Description
--mode	Running mode. The following values are allowed: normal, dryrun, fetch, motion. normal: default value, refers to the normal data migration process; dryrun: Verify the connectivity of both databases, use a simple database definition language (DDL) for verification operations, but do not read and write data; fetch: On the basis of dryrun, the data on the source side is read and sent to the target data node (Segment), but it is directly discarded and not processed. This option is used to verify the source side reading efficiency and the network transmission efficiency of the source side to the target side network; motion: On the basis of fetch, the data is parsed on the target side database and exchanged to the real target Segment, this option is used to further test the data exchange rate of the target side database
--verify	The default value is false. Verify whether the number of rows before and after data migration matches
--bothway	mxshift By default, the largest table will be migrated first to achieve the shortest time to migrate the complete data. If bothway is set to true, mxshift will consume at the same time from both the largest and smallest tables
--concurrency	How many tables can be moved in parallel, the default value is 5, no more than 10
--compress_method	Compression method: gzip / zstd / lz4 / 0, the default value is 0, that is, it is not compressed. At present, any compression algorithm is enabled, the mxadmin user of each segment on the target side needs to log in to each segment on the source side without password ssh. It is recommended to use dryrun for testing after configuration
--deduplicate (Experimental Features)	If the previous mxshift migration is not completed, it is turned on to enable mxshift to continue to complete the remaining tables when it is executed again, ignoring the tables that have been successfully migrated. This function is currently in the experimental stage
--log_level	The default value is info, which will print normal progress information, error and warning information. If changed to debug, more debug information will be output, only for development and fault location
--gphome	GPHOME absolute path to source database
--src_host	Source GP cluster Master node hostname (Host) or IP address that needs to be migrated out
--src_db	Source of data that needs to be migrated out GP database
--src_user	Source of data that needs to be migrated GP data username
--src_password	Source of data that needs to be migrated out GP data user password
--src_port	Port number of the source of data that needs to be migrated out
--target_host	Target MatrixDB cluster master node Host or IP address that needs to be migrated into data
--target_db	Target that needs to be migrated into the data MatrixDB database
--target_user	Target to migrate data MatrixDB database username
--target_password	Target to migrate data MatrixDB database user password
--target_port	Port number of the target MatrixDB database that needs to be migrated into the data
--table_list	A list of source GP data tables that need to be migrated. If it is a partition table, you need to fill in the partition table. If you need to migrate the entire GP database, you only need to configure the `src_db` parameter. The `table_list` parameter is configured as empty. mxshift will migrate all tables of size not 0 from large to small
--exclude_table_list	Schema and name list of tables to be excluded
--schema_list	The schema of the target GP database that needs to be migrated out of the data is the priority level if `table_list` has a specific table configured, only the tables in `table_list` are transmitted. If not, check whether schema is configured in `schema_list`. If schema is configured, all tables under the schema are transmitted. If there is no schema, all tables under `src_db` are transmitted
--exclude_schema_list	Exclude the schema of the target GP database that needs to be migrated out of data. Data migration is not performed on the tables under the schema. If schema with the same name are configured in `schema_list` and `exclude_schema_list`, then the schema will be excluded
--no-color	Close color tags

If you want to implement incremental data migration through WHERE conditions, you may use the following parameters:

Parameter name	Description
--enabled	This function is turned off by default, and manual identification is required when enabled. true
--global	SQL's WHERE expression will attempt to append this conditional expression to all tables migrated. If the table does not contain the field name in the WHERE clause, it will automatically skip the table if it does not contain the field name in the WHERE clause, resulting in an SQL statement error. To do this, mxshift will prompt for such tables that cannot be migrated SKIPPED
--override	is an array that can contain multiple schema names + table names + WHERE expressions, overwriting the specified WHERE expression of global

2.2 Command line parameters

The mxshift command line parameters are as follows:

Parameter name	Description
-c or --config_path	mxshift tool global configuration file path
-n or --no-color	Print non-color logs
-v or --version	mxshift tool version

3 Examples

3.1 Full migration

The configuration file config_path.json is as follows:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        {
           "schema":"test_schema_1",
           "name":"table_001"
        },
        {
           "schema":"test_schema_2",
           "name":"table_002"
        }
    ],
    "exclude_table_list":[
        {
            "name":"test_schema_3",
            "schema":"table_003"
        }
    ]
}

After preparing the JSON file, execute the following command to start mxshift for full migration.

mxshift -c config_path.json

3.2 Incremental Migration

Implement incremental data migration through WHERE conditions:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "where": {
        "enabled": false,
        "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema":"test_schema_1",
                "name":"table_001",
                "where": "abc > 10"
            }
        ]
    },
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        ...
    ],
    "exclude_table_list":[
        ...
    ]
}

After the JSON file is ready, execute the following command to start mxshift for incremental migration.

mxshift -c config_path.json

3.3 Replicated Table

If you are migrating a database or all replication tables in a certain mode, you do not need to make additional configurations in config_path.json. mxshfit will automatically determine which tables are replication tables and execute corresponding migration strategies to prevent repeated migration of data.
If a table manually configured in table_list is a replicate table, you need to declare a policyType field for the replicate table separately, and the value of "r" means that the table is a replicate table.
```
"table_list":[
      {
              "schema":"public",
              "name":"tbl_test_replica",
              "policyType":"r"
      }
  ],
```
After the JSON file is ready, execute the following command to start mxshift for copy table migration.
```
mxshift -c config_path.json
```
Notes!
For an introduction to the data migration ideas and complete steps, please see Data Migration

简体中文