mxshift

1 Description

The mxshift tool currently supports the following features:

  • Full migration: implement parallel migration of data from Greenplum 5 / Greenplum 6 / MatrixDB cluster to a new MatrixDB cluster in database units
  • Reverse migration: Reverse migration from MatrixDB to Greenplum 5 / Greenplum 6
  • Incremental migration: Incremental data migration through WHERE conditions
  • Replicated Table

2 Use

This section mainly introduces how to use config.json, related parameters, and related parameters of the mxshift command line required to start the mxshift tool.

2.1 Writing configuration files

Using the mxshift tool, you need to prepare a JSON file config.json, with the following content:

{
    "mode":            "normal",
    "verify":          true,
    "bothway":         true,
    "concurrency":     5,
    "compress_method": "lz4",
    "log_level":       "info",
    "gphome":          "/usr/local/greenplum-db-6.7.1",
    "src_host":        "sdw3",
    "src_db":          "testdb",
    "src_user":        "gpadmin",
    "src_password":    "xxxx",
    "src_port":        54322,
    "target_host":     "172.16.100.32",
    "target_db":       "destdb",
    "target_user":     "mxadmin",
    "target_password": "yyyy",
    "target_port":     5432,
    "where": {
        "enabled":  false,
        "global":   "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema": "test_schema_1",
                "name":   "table_001",
                "where":  "abc > 10"
            },
            {
                "schema": "test_schema_2",
                "name":   "another_table",
                "where":  "tag != 'aabbcc' AND ts > '2022-01-01'"
            }
        ]
    },
    "schema_list": [
        "test_schema_1",
        "test_schema_2"
    ],
    "exclude_schema_list": [
         "test_schema_5",
         "test_schema_8"
    ],
    "table_list": [
        {
            "schema": "test_schema_1",
            "name":   "table_001"
        },
        {
            "schema": "test_schema_2",
            "name":   "table_002"
        }
    ],
    "exclude_table_list":[
        {
            "schema": "test_schema_3",
            "name":   "table_003"
        }
    ]
}

The following is a detailed parameter description in the configuration file:

Parameter name Description
--mode Running mode. The following values ​​are allowed: normal, dryrun, fetch, motion. normal: default value, refers to the normal data migration process; dryrun: Verify the connectivity of both databases, use a simple database definition language (DDL) for verification operations, but do not read and write data; fetch: On the basis of dryrun, the data on the source side is read and sent to the target data node (Segment), but it is directly discarded and not processed. This option is used to verify the source side reading efficiency and the network transmission efficiency of the source side to the target side network; motion: On the basis of fetch, the data is parsed on the target side database and exchanged to the real target Segment, this option is used to further test the data exchange rate of the target side database
--verify The default value is false. Verify whether the number of rows before and after data migration matches
--bothway mxshift By default, the largest table will be migrated first to achieve the shortest time to migrate the complete data. If bothway is set to true, mxshift will consume at the same time from both the largest and smallest tables
--concurrency How many tables can be moved in parallel, the default value is 5, no more than 10
--compress_method Compression method: gzip / zstd / lz4 / 0, the default value is 0, that is, it is not compressed. At present, any compression algorithm is enabled, the mxadmin user of each segment on the target side needs to log in to each segment on the source side without password ssh. It is recommended to use dryrun for testing after configuration
--deduplicate (Experimental Features) If the previous mxshift migration is not completed, it is turned on to enable mxshift to continue to complete the remaining tables when it is executed again, ignoring the tables that have been successfully migrated. This function is currently in the experimental stage
--log_level The default value is info, which will print normal progress information, error and warning information. If changed to debug, more debug information will be output, only for development and fault location
--gphome GPHOME absolute path to source database
--src_host Source GP cluster Master node hostname (Host) or IP address that needs to be migrated out
--src_db Source of data that needs to be migrated out GP database
--src_user Source of data that needs to be migrated GP data username
--src_password Source of data that needs to be migrated out GP data user password
--src_port Port number of the source of data that needs to be migrated out
--target_host Target MatrixDB cluster master node Host or IP address that needs to be migrated into data
--target_db Target that needs to be migrated into the data MatrixDB database
--target_user Target to migrate data MatrixDB database username
--target_password Target to migrate data MatrixDB database user password
--target_port Port number of the target MatrixDB database that needs to be migrated into the data
--table_list A list of source GP data tables that need to be migrated. If it is a partition table, you need to fill in the partition table. If you need to migrate the entire GP database, you only need to configure the src_db parameter. The table_list parameter is configured as empty. mxshift will migrate all tables of size not 0 from large to small
--exclude_table_list Schema and name list of tables to be excluded
--schema_list The schema of the target GP database that needs to be migrated out of the data is the priority level if table_list has a specific table configured, only the tables in table_list are transmitted. If not, check whether schema is configured in schema_list. If schema is configured, all tables under the schema are transmitted. If there is no schema, all tables under src_db are transmitted
--exclude_schema_list Exclude the schema of the target GP database that needs to be migrated out of data. Data migration is not performed on the tables under the schema. If schema with the same name are configured in schema_list and exclude_schema_list, then the schema will be excluded
--no-color Close color tags

If you want to implement incremental data migration through WHERE conditions, you may use the following parameters:

Parameter name Description
--enabled This function is turned off by default, and manual identification is required when enabled. true
--global SQL's WHERE expression will attempt to append this conditional expression to all tables migrated. If the table does not contain the field name in the WHERE clause, it will automatically skip the table if it does not contain the field name in the WHERE clause, resulting in an SQL statement error. To do this, mxshift will prompt for such tables that cannot be migrated SKIPPED
--override is an array that can contain multiple schema names + table names + WHERE expressions, overwriting the specified WHERE expression of global

2.2 Command line parameters

The mxshift command line parameters are as follows:

Parameter name Description
-c or --config_path mxshift tool global configuration file path
-n or --no-color Print non-color logs
-v or --version mxshift tool version

3 Examples

3.1 Full migration

The configuration file config_path.json is as follows:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        {
           "schema":"test_schema_1",
           "name":"table_001"
        },
        {
           "schema":"test_schema_2",
           "name":"table_002"
        }
    ],
    "exclude_table_list":[
        {
            "name":"test_schema_3",
            "schema":"table_003"
        }
    ]
}

After preparing the JSON file, execute the following command to start mxshift for full migration.

mxshift -c config_path.json

3.2 Incremental Migration

Implement incremental data migration through WHERE conditions:

{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "where": {
        "enabled": false,
        "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema":"test_schema_1",
                "name":"table_001",
                "where": "abc > 10"
            }
        ]
    },
    "schema_list":["test_schema_1", "test_schema_2"],
    "exclude_schema_list":["test_schema_5", "test_schema_8"],
    "table_list":[
        ...
    ],
    "exclude_table_list":[
        ...
    ]
}

After the JSON file is ready, execute the following command to start mxshift for incremental migration.

mxshift -c config_path.json

3.3 Replicated Table

  • If you are migrating a database or all replication tables in a certain mode, you do not need to make additional configurations in config_path.json. mxshfit will automatically determine which tables are replication tables and execute corresponding migration strategies to prevent repeated migration of data.
  • If a table manually configured in table_list is a replicate table, you need to declare a policyType field for the replicate table separately, and the value of "r" means that the table is a replicate table.
    "table_list":[
          {
                  "schema":"public",
                  "name":"tbl_test_replica",
                  "policyType":"r"
          }
      ],

    After the JSON file is ready, execute the following command to start mxshift for copy table migration.

    mxshift -c config_path.json

    Notes!
    For an introduction to the data migration ideas and complete steps, please see [Data Migration] (/doc/4.7/maintain/migrate/overview)