This section describes best practices for migrating data from YMatrix 4 to YMatrix 5.
An important operation deserves thorough preparation, both practical and mental, because problems can arise at any point. Mental preparation is personal. For the practical side, the table below lists a relatively complete plan, including optional steps:
| Serial number | Preparation step | Description | Optional |
| --- | --- | --- | --- |
| 1 | Back up source cluster data | Data migration only reads the source cluster data and never writes to it, so it carries no risk of corrupting the data | Yes |
| 2 | Install and deploy the target database software | | No, required |
| 3 | Deploy monitoring for the target cluster | Depends on requirements | Yes |
| 4 | Prohibit all DDL operations on the business side | This step is important; neglecting it puts the migration at risk. Please take it seriously | No, required |
| 5 | Interrupt all business connections | This step is important; neglecting it puts the migration at risk. Please take it seriously | No, required |
| 6 | Collect source cluster and target cluster information | Software and hardware configuration, source cluster topology, target cluster topology, etc. | No, required |
| 7 | Back up source cluster metadata | DDL, schema names, user information, etc. | No, required |
| 8 | Add all nodes of the source cluster and the target cluster to the whitelist | | No, required |
| 9 | Create users on the target cluster | | No, required |
| 10 | Create DDL on the target cluster | In YMatrix, it is more efficient to create indexes after the migration. When creating DDL on the target cluster before migration, it is therefore recommended to create the tables without indexes | Yes; mxshift currently supports automatic DDL migration, see mxshift |
| 11 | Restore table structure | | Yes; mxshift currently supports automatic index migration, see mxshift |
The following sections give a specific example for each step in the table above.
Data migration only reads the source cluster data and never writes to it, so it carries no risk of corrupting the data. If you still want a safeguard, or need a copy of the data for other business purposes, you can use the mxbackup tool to take a parallel backup of the cluster.
Notes!
We recommend deploying the target cluster without mirror nodes (Mirrors) and adding them after the migration is completed, which improves migration efficiency.
Notes!
Host names in the target cluster must not duplicate host names in the source cluster!
Please refer to the standard cluster deployment document:
Please refer to the monitoring alarm document:
Notes!
Until all business-side services are officially stopped, no DDL may be executed against the source YMatrix 4 cluster. This covers creating objects, modifying objects, adding fields, and deleting fields; CREATE, ALTER, TRUNCATE, and DROP statements are all prohibited.
Modify the pg_hba.conf file on the source YMatrix 4 cluster master.
$ vim pg_hba.conf
Add the business client address in the following format to disable remote access.
host all all <Client IP Address>/<Subnet Mask Bits> reject
Then reload the configuration to make the modified configuration file take effect
$ mxstop -u
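PostgreSQL-style `pg_hba.conf` files are evaluated from top to bottom and the first matching line wins, so a `reject` line only takes effect if it appears before any line that would allow the same client. The sketch below illustrates one way to place the rule at the top of the file and to avoid duplicates on repeated runs. `prepend_reject_rule` is a hypothetical helper written for this example, not a YMatrix tool.

```shell
#!/bin/bash
# Hypothetical helper: put "host all all <cidr> reject" at the top of a pg_hba.conf file.
prepend_reject_rule() {
    local hba_file="$1" cidr="$2"
    local rule="host all all ${cidr} reject"

    # Nothing to do if the exact rule is already present.
    if grep -qxF -- "$rule" "$hba_file"; then
        return 0
    fi

    local tmp
    tmp="$(mktemp)" || return 1
    # First match wins in pg_hba.conf, so the reject rule goes before existing entries.
    # Writing back with cat keeps the original file's owner and permissions.
    { printf '%s\n' "$rule"; cat "$hba_file"; } > "$tmp" && cat "$tmp" > "$hba_file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

A call such as `prepend_reject_rule /path/to/master/pg_hba.conf 10.0.0.0/24` (both arguments are placeholders) would then be followed by `mxstop -u` to reload the configuration.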
Collect information about the source cluster and the target cluster, including the number of physical machines, operating system, CPU, memory, disk type, disk usage, network card information, source cluster topology, target cluster topology, database license, and resource group configuration. Use whichever items apply to your scenario to prepare thoroughly for the migration. The following commands may be useful:
Serial number | Command | Purpose |
---|---|---|
1 | free -g | View operating system memory information |
2 | lscpu | View the number of CPUs |
3 | cat /etc/system-release | View operating system version information |
4 | uname -a | Print all system information in the following order: kernel name, network node host name, kernel release, kernel version, machine hardware name, processor type (non-portable), hardware platform (non-portable), operating system. The processor type and hardware platform are omitted if unknown |
5 | tail -11 /proc/cpuinfo | View CPU information |
6 | gpcheckperf | Network performance, bandwidth, disk I/O performance detection |
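To keep the collected information for later comparison, the commands in the table can be run in one pass and saved to a file per host. The following is a minimal sketch under that assumption; `collect_host_info` is a hypothetical helper, and `gpcheckperf` is left out because it needs cluster-specific arguments.

```shell
#!/bin/bash
# Hypothetical helper: run the inspection commands from the table and save their output.
collect_host_info() {
    local out="$1"
    : > "$out"    # start with an empty report file

    local cmd
    for cmd in "free -g" "lscpu" "cat /etc/system-release" "uname -a" "tail -11 /proc/cpuinfo"; do
        printf '### %s\n' "$cmd" >> "$out"
        # $cmd is intentionally unquoted so that it splits into command and arguments.
        # A missing command or file is recorded instead of aborting the collection.
        $cmd >> "$out" 2>&1 || printf '(command failed or is unavailable on this host)\n' >> "$out"
    done
}
```

For example, `collect_host_info "/tmp/$(hostname)_info.txt"` writes one report for the current host.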
As a superuser, use the pg_dumpall and pg_dump tools to back up the DDL, schema names, user information, etc. of the source YMatrix 4 cluster.
# Back up global user objects
$ pg_dumpall -g -f global_user.sql
# Back up table structure
$ pg_dump <Source Database Name> -s -f orig.sql
# Keep a copy of the backup
$ cp orig.sql copy.sql
Prepare the query that generates the SQL file for creating the indexes.
$ cat get_index.sql
WITH soi (oid, toid, SIZE, tschema, tname) AS
( SELECT soioid,
soitableoid,
soisize,
soitableschemaname,
soitablename
FROM gp_toolkit.gp_size_of_index
),
childrel (oid, coid)AS
( SELECT t.parentrelid::oid,
t.relid::oid
FROM pg_partitioned_table, pg_partition_tree(partrelid) t
where t.isleaf
),
irn (oid, toid, SIZE, tschema, tname, rn) AS
( SELECT *,
row_number() OVER (
ORDER BY dt.ssize DESC) rn
FROM
( SELECT soi.oid,
soi.toid ,
sum(coalesce(dt2.SIZE, soi.SIZE)) ssize ,
soi.tschema,
soi.tname
FROM soi
LEFT JOIN
( SELECT childrel.oid,
soi.SIZE
FROM soi
INNER JOIN childrel ON soi.toid = childrel.coid ) dt2 ON soi.toid = dt2.oid
GROUP BY 1,
2,
4,
5 ) dt )
SELECT SQL || ';'
FROM
( SELECT pg_get_indexdef(oid) AS SQL ,
(rn % 12 + (rn / 12)::int) % 12 AS orderkey
FROM irn
WHERE toid NOT IN
(SELECT coid
FROM childrel) ) dt
WHERE SQL NOT LIKE 'CREATE UNIQUE INDEX%'
ORDER BY dt.orderkey ;
Execute the above SQL via psql and save the generated CREATE INDEX statements to index.sql.
$ psql -d <Source Database Name> -U mxadmin -t -f get_index.sql > index.sql
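The `orderkey` expression in get_index.sql maps each index's size rank `rn` (1 is the largest) to one of 12 values, and the generated statements are sorted by that value. The sketch below only reproduces the arithmetic in shell to show which ranks share a key; the function name `orderkey` is chosen for this illustration.

```shell
#!/bin/bash
# Mirrors (rn % 12 + (rn / 12)::int) % 12 from get_index.sql.
# Shell integer division truncates, as integer division does in the SQL expression.
orderkey() {
    local rn="$1"
    echo $(( (rn % 12 + rn / 12) % 12 ))
}

# Print the key for the 24 largest indexes.
for rn in $(seq 1 24); do
    printf 'rn=%-2d orderkey=%d\n' "$rn" "$(orderkey "$rn")"
done
```

Ranks 1 and 12 both map to key 1, for example, so index.sql does not list the indexes strictly from largest to smallest.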
Notes!
If the source cluster and the target cluster are running on the same server, just skip this step.
On the master of the source cluster and on the master of the target cluster, run the following script to add the host IP addresses of all nodes of both clusters to the pg_hba.conf file. The addresses (with subnet mask) in the example are 172.16.100.2/32 and 172.16.100.3/32.
Notes!
If there are multiple hosts, you need to write all host IPs to the script.
$ cat config_hba.sh
#!/bin/bash
for line in `psql -Atc "select hostname||','|| datadir
from gp_segment_configuration order by datadir desc"`
do
hostname=`echo $line|awk -F "," '{print $1}'`
datadir=`echo $line|awk -F "," '{print $2}'`
gpssh -h $hostname -v -e "echo host all all 172.16.100.2/32 md5>> ${datadir}/pg_hba.conf"
gpssh -h $hostname -v -e "echo host all all 172.16.100.3/32 md5>> ${datadir}/pg_hba.conf"
done
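When many hosts are involved, repeating one `gpssh` line per IP address becomes error-prone. One possible variation keeps the addresses in a single list and generates the pg_hba.conf line for each. This is a sketch only: `CLUSTER_IPS` and `hba_rule` are hypothetical names, and the addresses are the two from the example.

```shell
#!/bin/bash
# Example addresses from this section; list every host of both clusters here.
CLUSTER_IPS=(172.16.100.2 172.16.100.3)

# Hypothetical helper: print the pg_hba.conf line for one host IP.
hba_rule() {
    printf 'host all all %s/32 md5\n' "$1"
}

# Preview the rules that would be appended.
for ip in "${CLUSTER_IPS[@]}"; do
    hba_rule "$ip"
done

# Inside the loop of config_hba.sh, the two hard-coded gpssh lines could then become:
#   for ip in "${CLUSTER_IPS[@]}"; do
#       gpssh -h "$hostname" -v -e "echo $(hba_rule "$ip") >> ${datadir}/pg_hba.conf"
#   done
```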
On the master of the source cluster and on the master of the target cluster, run the following script to add the host IP address and host name of all nodes of both clusters to the /etc/hosts file. In the example, the host IP address is 172.16.100.195 and the host name is sdw1.
$ cat add_hosts.sh
#!/bin/bash
# Loop over each distinct host; with SELECT DISTINCT, ORDER BY may only use selected columns
for hostname in `psql -Atc "select distinct hostname from gp_segment_configuration order by hostname"`
do
gpssh -h $hostname -v -e "echo 172.16.100.195 sdw1 >> /etc/hosts"
done
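Running add_hosts.sh more than once appends the same entry again each time. A possible safeguard is to append only when the mapping is missing. The sketch below assumes the usual `/etc/hosts` layout (IP address first, then one or more names); `add_host_entry` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: append "<ip> <name>" to a hosts file unless it is already mapped.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"

    # Look for a line whose first field is the IP and that lists the name.
    if awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit found ? 0 : 1 }' "$hosts_file"; then
        return 0
    fi

    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

For example, `add_host_entry /etc/hosts 172.16.100.195 sdw1` leaves the file unchanged if the entry is already there.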
Then reload the configuration to make the modified configuration file take effect
$ mxstop -u
Execute the following command in the YMatrix 5 cluster environment.
$ psql -h <YMatrix Server IP Address> -p <Target cluster port number> -d <Target database> -U <Target database superuser name> -f global_user.sql
mxshift now supports automatic migration of DDL. For details, please refer to "2 Migration Execution". If you need to create a DDL manually, refer to the "1.10" and "1.11" sections.
Execute the following command in the YMatrix 5 cluster environment.
$ psql -h <YMatrix Server IP Address> -p <Target cluster port number> -d <Target database> -U <Target database superuser name> -f orig.sql
Use the backup orig.sql file to restore the table structure in the target cluster YMatrix 5.
$ time psql -d <Target database name> -f orig.sql > restoreddl.log 2>&1 &
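Because the restore runs in the background and writes everything to restoreddl.log, failures are easy to miss. psql reports failed statements on lines containing `ERROR:` or `FATAL:`, so a simple count gives a first check once the job has finished. The sketch below is one way to do that; `count_ddl_errors` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: count lines in a psql log that report an error.
count_ddl_errors() {
    local log="$1"
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true" keeps
    # the function's exit status clean in that case.
    grep -cE '(ERROR|FATAL):' "$log" || true
}
```

For example, `count_ddl_errors restoreddl.log` prints `0` when the log contains no error lines; any other value means the log should be reviewed before migrating data.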
Notes!
For detailed parameters, please refer to mxshift
First write the configuration file config_path.toml.
[database]
[database.source]
## Name of database
db-database= "testdb"
## Hostname of database master
db-host="sdw3"
## password of database
db-password="xxxx"
## Port of database master
db-port=54322
## user name of database
db-user="mxadmin"
## Version of database (please use the result of 'SELECT version();' as value). Required only when
##   1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
##   2. Target database is un-reachable, and 'ddl.mode' is 'output'
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
## The installation directory of matrixdb
install-dir="/usr/local/greenplum-db-6.7.1"
[[database.source.hostname-to-ip]]
## The content within <> should be replaced with actual information and <> should be removed
node-hostname="<mdw>"
node-ip="<127.0.0.1>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw1>"
node-ip="<127.0.0.2>"
[[database.source.hostname-to-ip]]
node-hostname="<sdw2>"
node-ip="<127.0.0.3>"
[database.target]
## Name of database
db-database="destdb"
## Hostname of database master
db-host="172.16.100.32"
## password of database
db-password="yyyy"
## Port of database master
db-port=5432
## user name of database
db-user="mxadmin"
## Version of database (please use the result of 'SELECT version();' as value). Required only when
##   1. Source database is un-reachable, and 'ddl.only-ddl' is enabled and 'ddl.mode' is 'input'
##   2. Target database is un-reachable, and 'ddl.mode' is 'output'
# db-version="PostgreSQL 12 (MatrixDB 5.1.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun 5 2023 15:45:24"
[scope]
## The compress method for transferring data, methods restricted to 0/gzip/lz4/zstd
compress-method="lz4"
## mode for transferring data from source to target database, and value restricted to normal/dryrun/fetch/motion.
##dryrun for only executing ddl, not transferring data
##fetch for fetching data from source and abandon
##motion for fetching data from source, redistributing and finally abandon
mode="normal"
## Sql for select segment information from source database
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## Sql for select segment information from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
[[scope.table-list]]
schema="test_schema_1"
name="table_001"
[[scope.table-list]]
schema="test_schema_2"
name="table_002"
[[scope.exclude-table-list]]
schema="test_schema_3"
name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
## Whether to disable data incremental migration, by default, it is true.
# disable-data-increment=true
[log]
## The log level, value restricted to: debug/verbose/info.
log-level="info"
## Print log without color.
# no-color=false
[controller]
## By default, transfer work will start from the largest table. If set 'bothway' it will start from both the largest and the smallest table
both-way=true
## The number of table transferred at the same time
concurrency=3
[transfer]
## Verify the number of records of every table
verify=true
with-index=true
[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During the DDL transfer, whether to skip the transfer of resource queue or group, by default, it is true.
# skip-resource-queue-and-group=true
## During the DDL transfer, whether to skip the transfer of tablespace, by default, it is true.
# skip-table-space=true
[[ddl.replace]]
## Only applicable for the case of migration from Greenplum to YMatrix
category="role"
[[ddl.replace.pairs]]
old="mxadmin"
new="mxadmin"
## Whether to disable ddl incremental migration, by default, it is true.
# disable-ddl-increment=true
Then perform data migration on the target YMatrix 5 cluster.
$ mxshift -c config_path.toml
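Before starting a long-running migration, it can help to confirm that the configuration file defines both ends of the transfer. The following sketch checks only for the `[database.source]` and `[database.target]` section headers used in the example configuration. It is not a substitute for mxshift's own validation, and `check_mxshift_config` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: verify that a config file contains the expected section headers.
check_mxshift_config() {
    local config="$1" section missing=0

    for section in 'database.source' 'database.target'; do
        # Escape the dot so it matches literally, and allow indentation around the header.
        if ! grep -qE "^[[:space:]]*\[${section//./\\.}\][[:space:]]*$" "$config"; then
            echo "missing section: [${section}]" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

For example, `check_mxshift_config config_path.toml && mxshift -c config_path.toml` starts the migration only if both sections are present.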
Create the indexes on the target YMatrix 5 cluster.
$ psql -h localhost -p <Target cluster port number> -d <Target database name> -U <Target database superuser name> -f index.sql >>idx.out 2>&1 &
Update the database statistics on the target YMatrix 5 cluster.
$ export PGPORT=<Target cluster port number>
$ time analyzedb -d <Target database name> -p 10 -a
Add Mirror on the target cluster YMatrix 5. The steps are as follows:
# First, check the current cluster instance information
postgres=# SELECT * from gp_segment_configuration order by 1;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
-----+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | -1 | p | p | n | u | 5432 | mdw | mdw | /home/mxdata_20220925154450/master/mxseg-1
2 | 0 | p | p | n | u | 6000 | sdw2 | sdw2 | /home/mxdata_20220925154450/primary/mxseg0
3 | 1 | p | p | n | u | 6001 | sdw2 | sdw2 | /home/mxdata_20220925154450/primary/mxseg1
4 | 2 | p | p | n | u | 6000 | sdw3 | sdw3 | /home/mxdata_20220925154450/primary/mxseg2
5 | 3 | p | p | n | u | 6001 | sdw3 | sdw3 | /home/mxdata_20220925154450/primary/mxseg3
6 | -1 | m | m | s | u | 5432 | sdw1 | sdw1 | /home/mxdata_20220925154450/standby/mxseg-1
(6 rows)
# Create a file with all hostnames
$ cat /home/mxadmin/seg_hosts
sdw1
sdw2
sdw3
sdw4
# Create the Mirror directory on all hosts in one batch via the gpssh command
$ gpssh -f /home/mxadmin/seg_hosts -e 'mkdir -p /home/mxdata_20220925154450/mirror'
# Generate Mirror template file
$ mxaddmirrors -o ./addmirror
# View Mirror template files
$ cat addmirror
# Perform the Add Mirror Operation
$ mxaddmirrors -i addmirror
# Finally, check the cluster instance again
postgres=# SELECT * from gp_segment_configuration order by 1;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
-----+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | -1 | p | p | n | u | 5432 | mdw | mdw | /home/mxdata_20220925154450/master/mxseg-1
2 | 0 | p | p | n | u | 6000 | sdw2 | sdw2 | /home/mxdata_20220925154450/primary/mxseg0
3 | 1 | p | p | s | u | 6001 | sdw2 | sdw2 | /home/mxdata_20220925154450/primary/mxseg1
4 | 2 | p | p | s | u | 6000 | sdw3 | sdw3 | /home/mxdata_20220925154450/primary/mxseg2
5 | 3 | p | p | s | u | 6001 | sdw3 | sdw3 | /home/mxdata_20220925154450/primary/mxseg3
6 | -1 | m | m | s | u | 5432 | sdw1 | sdw1 | /home/mxdata_20220925154450/standby/mxseg-1
7 | 0 | m | m | n | d | 7000 | sdw3 | sdw3 | /home/mxdata_20220925154450/mirror/mxseg0
8 | 1 | m | m | s | u | 7001 | sdw3 | sdw3 | /home/mxdata_20220925154450/mirror/mxseg1
9 | 2 | m | m | s | u | 7000 | sdw2 | sdw2 | /home/mxdata_20220925154450/mirror/mxseg2
10 | 3 | m | m | s | u | 7001 | sdw2 | sdw2 | /home/mxdata_20220925154450/mirror/mxseg3
(10 rows)
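In the sample output above, the row with dbid 7 has status `d`, meaning that instance is down. With many segments, such rows are easy to overlook, so a small filter over the unaligned psql output can help. This is a sketch: `down_segments` is a hypothetical helper, and it assumes the column order shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper: read gp_segment_configuration rows in "psql -At" format
# (dbid|content|role|preferred_role|mode|status|port|hostname|address|datadir)
# and print dbid, hostname, and datadir for every instance whose status is "d".
down_segments() {
    awk -F'|' '$6 == "d" { print $1, $8, $10 }'
}

# Usage on the target master (requires a running cluster):
#   psql -d postgres -Atc "SELECT dbid, content, role, preferred_role, mode, status,
#       port, hostname, address, datadir FROM gp_segment_configuration ORDER BY 1" | down_segments
```

Empty output means that no instance is marked as down.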
After completing the above steps, restore business access and monitor how the business runs for a period of time (how long depends on the specific scenario). If it runs stably, congratulations on successfully completing the data migration!