This section introduces how to use the config.json configuration file, its parameters, and the command-line parameters required to start the mxshift tool.
To use the mxshift tool, you need to prepare a JSON file, config.json, with the following content:
```json
{
    "mode": "normal",
    "verify": true,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "lz4",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "xxxx",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "destdb",
    "target_user": "mxadmin",
    "target_password": "yyyy",
    "target_port": 5432,
    "where": {
        "enabled": false,
        "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema": "test_schema_1",
                "name": "table_001",
                "where": "abc > 10"
            },
            {
                "schema": "test_schema_2",
                "name": "another_table",
                "where": "tag != 'aabbcc' AND ts > '2022-01-01'"
            }
        ]
    },
    "schema_list": [
        "test_schema_1",
        "test_schema_2"
    ],
    "exclude_schema_list": [
        "test_schema_5",
        "test_schema_8"
    ],
    "table_list": [
        {
            "schema": "test_schema_1",
            "name": "table_001"
        },
        {
            "schema": "test_schema_2",
            "name": "table_002"
        }
    ],
    "exclude_table_list": [
        {
            "schema": "test_schema_3",
            "name": "table_003"
        }
    ]
}
```
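As a sanity check before launching a migration, a configuration file like the one above can be validated with a short script. This is a minimal sketch, not part of mxshift: the required-key list and value checks below are assumptions drawn from the example and the parameter table in this section.

```python
import json

# Keys the sample config.json always includes; this list is an assumption
# taken from the example above, not an official mxshift schema.
REQUIRED_KEYS = [
    "mode", "gphome",
    "src_host", "src_db", "src_user", "src_password", "src_port",
    "target_host", "target_db", "target_user", "target_password", "target_port",
]

def check_config(text):
    """Return a list of problems found in a config.json string."""
    problems = []
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    for key in REQUIRED_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")
    # Allowed modes and the concurrency bound come from the parameter table.
    if cfg.get("mode") not in ("normal", "dryrun", "fetch", "motion"):
        problems.append("mode must be one of normal/dryrun/fetch/motion")
    if not 1 <= cfg.get("concurrency", 5) <= 10:
        problems.append("concurrency must be between 1 and 10")
    return problems

sample = '{"mode": "normal", "concurrency": 5, "gphome": "/usr/local/greenplum-db-6.7.1"}'
print(check_config(sample))  # reports the missing src_*/target_* keys
```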
The parameters in the configuration file are described in detail below:
Parameter name | Description |
---|---|
--mode | Running mode. Allowed values: normal, dryrun, fetch, motion. normal: the default value; runs the normal data migration process. dryrun: verifies the connectivity of both databases by running simple data definition language (DDL) statements, without reading or writing any data. fetch: in addition to what dryrun does, reads data on the source side and sends it to the target data nodes (Segments), where it is discarded without further processing; this option is used to verify source-side read efficiency and source-to-target network transmission efficiency. motion: in addition to what fetch does, parses the data in the target database and exchanges it to the real target Segments; this option is used to further test the data exchange rate of the target database |
--verify | Whether to verify that the number of rows matches before and after data migration. The default value is false |
--bothway | By default, mxshift migrates the largest table first, so that the complete data set is migrated in the shortest time. If bothway is set to true, mxshift migrates from both ends of the size ordering at the same time, starting with both the largest and the smallest tables |
--concurrency | Number of tables that can be migrated in parallel. The default value is 5, and the maximum is 10 |
--compress_method | Compression method: gzip / zstd / lz4 / 0. The default value is 0, meaning no compression. Currently, when any compression algorithm is enabled, the mxadmin user on every target-side Segment must be able to log in to every source-side Segment via passwordless SSH. It is recommended to test with dryrun mode after configuration |
--deduplicate (experimental feature) | If a previous mxshift migration did not complete, enabling this option lets mxshift, when executed again, skip the tables that were already migrated successfully and continue with the remaining ones. This feature is currently experimental |
--log_level | The default value is info, which prints normal progress, error, and warning messages. If set to debug, additional debugging information is printed; use this only for development and fault diagnosis |
--gphome | Absolute path of GPHOME on the source database |
--src_host | Hostname or IP address of the Master node of the source GP cluster to migrate data from |
--src_db | Name of the source GP database to migrate data from |
--src_user | Username of the source GP database |
--src_password | Password of the source GP database user |
--src_port | Port number of the source GP database |
--target_host | Hostname or IP address of the master node of the target MatrixDB cluster to migrate data into |
--target_db | Name of the target MatrixDB database to migrate data into |
--target_user | Username of the target MatrixDB database |
--target_password | Password of the target MatrixDB database user |
--target_port | Port number of the target MatrixDB database |
--table_list | List of source GP tables to migrate. For a partitioned table, list the partition tables. To migrate the entire GP database, configure only the src_db parameter and leave table_list empty; mxshift will then migrate all tables whose size is not 0, from largest to smallest |
--exclude_table_list | List of schema and name pairs of tables to exclude |
--schema_list | List of schemas in the source GP database to migrate. The priority is as follows: if table_list contains specific tables, only those tables are transmitted; otherwise, if schema_list is configured, all tables under the listed schemas are transmitted; if neither is configured, all tables under src_db are transmitted |
--exclude_schema_list | List of schemas in the source GP database to exclude; tables under these schemas are not migrated. If a schema with the same name is configured in both schema_list and exclude_schema_list, the schema is excluded |
--no-color | Disable colored log output |
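The priority rules for table_list, schema_list, and the two exclusion lists can be sketched as plain selection logic. This is a simplified model for illustration only (the real tool also orders tables by size and handles partitions), and the function name and signature are invented for this sketch:

```python
def select_tables(all_tables, table_list=None, schema_list=None,
                  exclude_schema_list=None, exclude_table_list=None):
    """all_tables: iterable of (schema, name) pairs present in src_db.

    Mirrors the documented priority: an explicit table_list wins;
    otherwise schema_list filters by schema; otherwise every table in
    src_db is eligible. Exclusions are applied last, which is why a
    schema listed in both schema_list and exclude_schema_list ends up
    excluded.
    """
    all_tables = list(all_tables)
    excluded_schemas = set(exclude_schema_list or [])
    excluded_tables = set(exclude_table_list or [])

    if table_list:
        chosen = [t for t in table_list if t in set(all_tables)]
    elif schema_list:
        chosen = [t for t in all_tables if t[0] in set(schema_list)]
    else:
        chosen = all_tables

    return [t for t in chosen
            if t[0] not in excluded_schemas and t not in excluded_tables]

tables = [("s1", "a"), ("s1", "b"), ("s2", "c"), ("s5", "d")]
print(select_tables(tables, schema_list=["s1", "s5"],
                    exclude_schema_list=["s5"]))
# [('s1', 'a'), ('s1', 'b')]
```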
If you want to implement incremental data migration through WHERE conditions, you may use the following parameters:
Parameter name | Description |
---|---|
--enabled | Whether the WHERE feature is enabled. It is off by default and must be manually set to true to take effect |
--global | A SQL WHERE expression that mxshift attempts to append to every migrated table. If a table does not contain a field referenced in the WHERE clause, the generated SQL statement would fail, so mxshift automatically skips that table and reports it as SKIPPED |
--override | An array that can contain multiple entries of schema name + table name + WHERE expression; each entry overrides the global WHERE expression for the specified table |
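How the per-table WHERE clause is resolved from this configuration can be sketched as follows. This is an illustrative model, not mxshift's actual implementation; in particular, the column check here is a crude textual approximation of whatever catalog lookup the real tool performs, and the keyword list is an assumption for the sketch.

```python
import re

# Words that may appear in WHERE clauses but are not column names (assumed).
SQL_KEYWORDS = {"and", "or", "not", "is", "null", "in", "like", "between"}

def referenced_columns(clause):
    """Roughly extract column names from a WHERE expression: strip quoted
    literals, then treat the remaining identifiers as column names."""
    stripped = re.sub(r"'[^']*'", "''", clause)
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", stripped)
    return {i for i in idents if i.lower() not in SQL_KEYWORDS}

def effective_where(schema, name, table_columns, where_cfg):
    """Return the WHERE clause for one table, "SKIPPED", or None."""
    if not where_cfg.get("enabled"):
        return None                      # feature disabled: no filtering
    clause = where_cfg.get("global")
    for entry in where_cfg.get("override", []):
        if (entry["schema"], entry["name"]) == (schema, name):
            clause = entry["where"]      # per-table override wins
            break
    if clause is None:
        return None
    if not referenced_columns(clause) <= set(table_columns):
        return "SKIPPED"                 # table lacks a referenced field
    return clause

cfg = {"enabled": True,
       "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
       "override": [{"schema": "test_schema_1", "name": "table_001",
                     "where": "abc > 10"}]}
print(effective_where("test_schema_1", "table_001", ["abc"], cfg))  # abc > 10
print(effective_where("s", "t", ["other_col"], cfg))                # SKIPPED
```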
The mxshift command line parameters are as follows:
Parameter name | Description |
---|---|
-c or --config_path | mxshift tool global configuration file path |
-n or --no-color | Print non-color logs |
-v or --version | mxshift tool version |
The configuration file config_path.json is as follows:
```json
{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "schema_list": ["test_schema_1", "test_schema_2"],
    "exclude_schema_list": ["test_schema_5", "test_schema_8"],
    "table_list": [
        {
            "schema": "test_schema_1",
            "name": "table_001"
        },
        {
            "schema": "test_schema_2",
            "name": "table_002"
        }
    ],
    "exclude_table_list": [
        {
            "schema": "test_schema_3",
            "name": "table_003"
        }
    ]
}
```
After preparing the JSON file, execute the following command to start mxshift for a full migration:

```shell
mxshift -c config_path.json
```
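Independently of the built-in verify option, you can spot-check a finished migration by collecting row counts from both sides (for example with SELECT count(*) per table) and comparing them. Below is a minimal sketch of the comparison step only, using made-up counts; the function name is invented for this example.

```python
def compare_counts(src_counts, dst_counts):
    """Both arguments map (schema, table) -> row count.
    Returns a dict of tables whose counts differ (or are missing on
    the target), mapped to the (source, target) pair."""
    mismatches = {}
    for table, n_src in src_counts.items():
        n_dst = dst_counts.get(table)
        if n_dst != n_src:
            mismatches[table] = (n_src, n_dst)
    return mismatches

# Made-up counts as they might be collected from source and target.
src = {("test_schema_1", "table_001"): 1000, ("test_schema_2", "table_002"): 42}
dst = {("test_schema_1", "table_001"): 1000, ("test_schema_2", "table_002"): 41}
print(compare_counts(src, dst))  # {('test_schema_2', 'table_002'): (42, 41)}
```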
To implement incremental data migration through WHERE conditions, use a configuration like the following (note that where.enabled must be set to true for the WHERE conditions to take effect):

```json
{
    "mode": "normal",
    "verify": false,
    "bothway": true,
    "concurrency": 5,
    "compress_method": "",
    "log_level": "info",
    "gphome": "/usr/local/greenplum-db-6.7.1",
    "src_host": "sdw3",
    "src_db": "testdb",
    "src_user": "gpadmin",
    "src_password": "greenplum@2020",
    "src_port": 54322,
    "target_host": "172.16.100.32",
    "target_db": "ms",
    "target_user": "mxadmin",
    "target_password": "xxxx",
    "target_port": 5432,
    "where": {
        "enabled": true,
        "global": "txdate >= '2022-10-01' AND batchnum >= 100000000",
        "override": [
            {
                "schema": "test_schema_1",
                "name": "table_001",
                "where": "abc > 10"
            }
        ]
    },
    "schema_list": ["test_schema_1", "test_schema_2"],
    "exclude_schema_list": ["test_schema_5", "test_schema_8"],
    "table_list": [
        ...
    ],
    "exclude_table_list": [
        ...
    ]
}
```
After the JSON file is ready, execute the following command to start mxshift for an incremental migration:

```shell
mxshift -c config_path.json
```
In config_path.json, mxshift automatically determines which tables are replicated tables and applies the corresponding migration strategy to prevent migrating data repeatedly. If a table in table_list is a replicated table, you need to declare an additional policyType field for it; the value "r" indicates that the table is a replicated table:

```json
"table_list": [
    {
        "schema": "public",
        "name": "tbl_test_replica",
        "policyType": "r"
    }
],
```
After the JSON file is ready, execute the following command to start mxshift for replicated table migration:

```shell
mxshift -c config_path.json
```
Note!
For an introduction to the data migration approach and the complete steps, see [Data Migration](/doc/4.7/maintain/migrate/overview).