Migrating Data from YMatrix 4/5 to YMatrix 6

This document describes best practices for migrating data from YMatrix 4/5 to YMatrix 6.

1 Environment Preparation

Before initiating a critical operation such as a migration, thorough preparation is essential, both technical readiness and mental preparedness, since issues may arise at any time. Mental preparation is personal; for technical preparation, we provide the comprehensive checklist below. Some steps are optional:

| No. | Preparation Step | Description | Optional |
|----|------------------|-------------|----------|
| 1 | Back up source cluster data | Data migration performs only read operations on the source cluster; no write operations are involved, so there is no risk of data corruption due to migration | Yes |
| 2 | Install and deploy target database software | | No, required |
| 3 | Set up monitoring for the target cluster | Based on requirements | Yes |
| 4 | Disable all DDL operations from applications | This is a critical step; failure to comply may introduce risks during migration. Please take it seriously | No, required |
| 5 | Terminate all application connections | This is a critical step; failure to comply may introduce risks during migration. Please take it seriously | No, required |
| 6 | Collect information about source and target clusters | Includes hardware/software configurations, source cluster topology, target cluster topology, etc. | No, required |
| 7 | Back up source cluster metadata | Includes DDLs, schema names, user information, etc. | No, required |
| 8 | Add whitelist entry on the source cluster's Master | | No, required |
| 9 | Create users on the target cluster | | No, required |
| 10 | Create DDLs on the target cluster | In YMatrix, recreating indexes after data migration is more efficient, so it is recommended to exclude index creation statements when creating DDLs before migration | Yes; mxshift now supports automatic DDL migration. See mxshift |
| 11 | Restore table structure | | Yes; mxshift now supports automatic index migration. See mxshift |

Below are detailed examples for each step.

1.1 Back Up Source Cluster Data (Optional)

Data migration involves only read operations on the source cluster, so there is no risk of data corruption caused by the migration process. However, if you want additional safety or have other business needs requiring access to the data, use the mxbackup tool to perform parallel cluster backup.
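
For example, a full parallel backup might look like the following (a sketch; the --dbname flag assumes mxshift-era mxbackup accepts gpbackup-style options, so confirm against mxbackup --help for your version):

# Full parallel backup of the source database
$ mxbackup --dbname <source_database_name>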

1.2 Install and Deploy Target Database Software

Note!
We recommend not deploying mirror segments during initial cluster setup. Add mirrors after migration completes to improve migration efficiency.

Note!
Hostnames of the target cluster must not conflict with those of the source cluster.

Refer to the standard cluster deployment documentation.

1.3 Set Up Monitoring for Target Cluster (Optional)

Refer to the monitoring and alerting documentation.

1.4 Disable All DDL Operations from Applications

Note!
Do not perform any DDL operations on the source YMatrix 4/5 cluster from this point until all application workloads are stopped and the migration completes. This includes creating or modifying objects and adding or dropping columns; statements such as CREATE, ALTER, TRUNCATE, and DROP are prohibited.
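
To confirm that no application is still issuing DDL, you can watch activity on the source Master. A sketch, assuming a PostgreSQL-12-based source where pg_stat_activity.query holds the running statement:

# List any in-flight DDL statements; the result should be empty
$ psql -d <source_database_name> -Atc "SELECT pid, usename, query FROM pg_stat_activity WHERE query ~* '^(CREATE|ALTER|TRUNCATE|DROP)'"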

1.5 Terminate All Application Connections

Modify the pg_hba.conf file, located in the Master data directory, on the Master node of the source YMatrix 4/5 cluster.

$ vim pg_hba.conf

Add client IP addresses using the following format to disable remote access:

host     all         all         <Client IP Address>/<CIDR Mask>       reject

Reload the configuration to apply changes:

$ mxstop -u
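
The reject entries block new connections, but sessions that were already established survive the reload. One way to terminate them is shown below (a sketch; adjust the usename filter so your own administrative account, assumed here to be mxadmin, is spared):

# Terminate every client session except the current one and the admin user's
$ psql -d postgres -Atc "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND usename <> 'mxadmin'"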

1.6 Collect Source and Target Cluster Information

Gather system and cluster details: number of physical machines, OS version, CPU, memory, disk type and usage, NIC information, source and target cluster topologies, database license, resource group settings, and so on. Adjust the list to your specific scenario to ensure full migration readiness. Useful commands include the following; a gpcheckperf example follows the table:

| No. | Command | Purpose |
|----|---------|---------|
| 1 | free -g | View OS memory information |
| 2 | lscpu | View number of CPUs |
| 3 | cat /etc/system-release | View OS version |
| 4 | uname -a | Print kernel information in order: kernel name, hostname, kernel release, kernel version, machine architecture, processor type, platform, OS name (omitted if unknown) |
| 5 | tail -11 /proc/cpuinfo | View CPU details |
| 6 | gpcheckperf | Test network performance, bandwidth, and disk I/O performance |
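
For example, gpcheckperf can exercise disk and memory on all hosts in one pass (a sketch; the host file path and scratch directory are examples, and the flags follow standard Greenplum gpcheckperf usage):

# -r ds runs the disk I/O and memory (stream) tests; -d sets the scratch directory on each host
$ gpcheckperf -f /home/mxadmin/seg_hosts -r ds -d /tmp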

1.7 Back Up Source Cluster Metadata

As a superuser, use pg_dumpall and pg_dump to back up global user objects, DDLs, schema names, and other metadata from the source YMatrix 4/5 cluster.

# Back up global user objects
$ pg_dumpall -g -f global_user.sql

# Back up table schemas
$ pg_dump <source_database_name> -s -f orig.sql          

# Make a copy for backup
$ cp orig.sql copy.sql                        
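
If you plan to restore DDLs manually (sections 1.10 and 1.11), the checklist above recommends excluding index creation statements. A rough way to produce an index-free copy (a sketch; orig_noindex.sql is a name chosen here, and any index definitions that span multiple lines would need manual review):

# Strip single-line CREATE INDEX / CREATE UNIQUE INDEX statements from the copy
$ grep -viE '^CREATE (UNIQUE )?INDEX' copy.sql > orig_noindex.sql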

Generate SQL script for index creation:

$ cat get_index.sql
-- Collect each index together with the total size of its table (leaf
-- partition sizes are rolled up into the parent), then emit CREATE INDEX
-- statements for root tables only. The orderkey spreads the statements,
-- ranked by table size, round-robin into 12 buckets so that large index
-- builds are distributed evenly if the output is run in parallel.
-- Unique indexes are excluded, as they are typically created together
-- with their constraints in the table DDL.
WITH soi (oid, toid, size, tschema, tname) AS (
  SELECT soioid,
         soitableoid,
         soisize,
         soitableschemaname,
         soitablename
  FROM gp_toolkit.gp_size_of_index
),
childrel (oid, coid) AS (
  SELECT t.parentrelid::oid,
         t.relid::oid
  FROM pg_partitioned_table, pg_partition_tree(partrelid) t
  WHERE t.isleaf
),
irn (oid, toid, size, tschema, tname, rn) AS (
  SELECT *,
         row_number() OVER (ORDER BY dt.ssize DESC) rn
  FROM (
    SELECT soi.oid,
           soi.toid,
           sum(coalesce(dt2.size, soi.size)) ssize,
           soi.tschema,
           soi.tname
    FROM soi
    LEFT JOIN (
      SELECT childrel.oid,
             soi.size
      FROM soi
      INNER JOIN childrel ON soi.toid = childrel.coid
    ) dt2 ON soi.toid = dt2.oid
    GROUP BY 1, 2, 4, 5
  ) dt
)
SELECT sql || ';'
FROM (
  SELECT pg_get_indexdef(oid) AS sql,
         (rn % 12 + (rn / 12)::int) % 12 AS orderkey
  FROM irn
  WHERE toid NOT IN (SELECT coid FROM childrel)
) dt
WHERE sql NOT LIKE 'CREATE UNIQUE INDEX%'
ORDER BY dt.orderkey;

Execute the above SQL via psql:

$ psql -d <source_database_name> -U mxadmin -t -f get_index.sql > index.sql

1.8 Add Whitelist Entry on Source Cluster Master

Note!
Skip this step if the source and target clusters run on the same server.

On the source cluster's Master, add the target cluster Master's IP address to the pg_hba.conf file. Example with IP 172.16.100.2:

host    all    all    172.16.100.2/32    md5

On the target cluster, add the source cluster Master's IP address and hostname to /etc/hosts on every host. Example with IP 172.16.100.195 and hostname sdw1, run from the target cluster's Master:

$ cat add_hosts.sh
#!/bin/bash
# Append the source Master's hosts entry on every host of the target cluster.
# Note: writing to /etc/hosts requires root privileges on each host.
for host in $(psql -Atc "SELECT DISTINCT hostname FROM gp_segment_configuration ORDER BY 1")
do
    gpssh -h "$host" -v -e "echo 172.16.100.195 sdw1 >> /etc/hosts"
done

Reload configuration to apply changes:

$ mxstop -u

1.9 Create Users on Target Cluster

Run the following command in the YMatrix 6 environment:

$ psql -h <YMatrix Server IP> -p <target_cluster_port> -d <target_database> -U <superuser_name> -f global_user.sql

1.10 Create DDLs on Target Cluster (Optional)

mxshift now supports automatic DDL migration; refer to section "2 Migration Execution" for details. If manual DDL creation is needed, follow this section and section 1.11.

In the YMatrix 6 environment:

$ psql -h <YMatrix Server IP> -p <target_cluster_port> -d <target_database> -U <superuser_name> -f orig.sql 

1.11 Restore Table Structure (Optional)

Use the backed-up orig.sql file to restore table structures on the target YMatrix 6 cluster:

$ time psql -d <target_database_name> -f orig.sql > restoreddl.log 2>&1 &

2 Migration Execution

Note!
For detailed parameter descriptions of mxshift, see mxshift.

First, create a configuration file config_path.toml:

[database]
        [database.source]
        ## Name of database
        db-database= "testdb"
        ## Hostname of database master
        db-host="sdw3"
        ## password of database
        db-password="xxxx"
        ## Port of database master
        db-port=54322
        ## user name of database
        db-user="mxadmin"
        ## Version of database (use the result of 'SELECT version();' as the value). Required only when:
        ##       1. the source database is unreachable, 'ddl.only-ddl' is enabled, and 'ddl.mode' is 'input'
        ##       2. the target database is unreachable and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"
        ## The installation directory of the source database software
        install-dir="/usr/local/greenplum-db-6.7.1"
        [[database.source.hostname-to-ip]]
            ## Replace content within <> with actual values and remove <>
            node-hostname="<mdw>"
            node-ip="<127.0.0.1>"
        [[database.source.hostname-to-ip]]
            node-hostname="<sdw1>"
            node-ip="<127.0.0.2>"
        [[database.source.hostname-to-ip]]
            node-hostname="<sdw2>"
            node-ip="<127.0.0.3>"

        [database.target]
        ## Name of database
        db-database="destdb"
        ## Hostname of database master
        db-host="172.16.100.32"
        ## password of database
        db-password="yyyy"
        ## Port of database master
        db-port=5432
        ## user name of database
        db-user="mxadmin"
        ## Version of database (use the result of 'SELECT version();' as the value). Required only when:
        ##       1. the source database is unreachable, 'ddl.only-ddl' is enabled, and 'ddl.mode' is 'input'
        ##       2. the target database is unreachable and 'ddl.mode' is 'output'
        # db-version="PostgreSQL 12 (MatrixDB 5.2.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Jun  5 2023 15:45:24"

[scope]
## Compression method for data transfer; allowed values: 0/gzip/lz4/zstd        
compress-method="lz4"
## Mode for transferring data from source to target; allowed values: normal/dryrun/fetch/motion.
## dryrun: execute DDL only, no data transfer
## fetch: fetch data from source and discard
## motion: fetch data, redistribute, then discard
mode="normal"
## SQL to select segment info from source database        
# select-source-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
## SQL to select segment info from target database
# select-target-segment-sql="SELECT dbid, content, port, hostname FROM gp_segment_configuration WHERE status = 'u' AND role = 'p' ORDER BY CONTENT;"
        [[scope.table-list]]
                schema="test_schema_1"
                name="table_001"
        [[scope.table-list]]
                schema="test_schema_2"
                name="table_002"
        [[scope.exclude-table-list]]
                schema="test_schema_3"
                name="table_003"
schema-list=["test_schema_1", "test_schema_2"]
exclude-schema-list=["test_schema_5", "test_schema_8"]
## Whether to disable incremental data migration; default is true.
# disable-data-increment=true

[log]
## Log level: debug/verbose/info
log-level="info"
## Print log without color
# no-color=false

[controller]
## By default, transfer starts from the largest table; set both-way=true to transfer
## from both the largest and the smallest table simultaneously
both-way=true
## Number of tables to transfer concurrently
concurrency=3

[transfer]
## Verify record count for each table after transfer
verify=true
## Also migrate indexes along with table data
with-index=true

[ddl]
enabled=true
# file-path="/tmp/mxshift.sql"
# mode="output"
only-ddl=false
## During DDL transfer, skip resource queue/group migration; default is true
# skip-resource-queue-and-group=true
## During DDL transfer, skip tablespace migration; default is true
# skip-table-space=true
        ## Role replacement; only applicable for Greenplum to YMatrix migration
        [[ddl.replace]]
                category="role"
                [[ddl.replace.pairs]]
                        old="mxadmin"
                        new="mxadmin"
## Whether to disable incremental DDL migration; default is true
# disable-ddl-increment=true   

Then execute the migration on the target YMatrix 6 cluster:

$ mxshift -c config_path.toml
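
With verify=true, mxshift compares record counts for each table itself; you can also spot-check a table manually. A sketch using the connection details and one of the tables from the example configuration above:

# Row count on the source cluster
$ psql -h sdw3 -p 54322 -d testdb -U mxadmin -Atc "SELECT count(*) FROM test_schema_1.table_001"
# Row count on the target cluster; the two numbers should match
$ psql -h 172.16.100.32 -p 5432 -d destdb -U mxadmin -Atc "SELECT count(*) FROM test_schema_1.table_001"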

3 Post-Migration Tasks

3.1 Create Indexes (Optional)

Create indexes on the target YMatrix 6 cluster:

$ psql -h localhost -p <target_cluster_port> -d <target_database_name> -U <superuser_name> -f index.sql >>idx.out 2>&1 &
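
Because get_index.sql emits the statements in 12 size-balanced round-robin buckets, the file also lends itself to parallel building. A sketch, assuming GNU split and single-line index statements:

# Split index.sql into 12 chunks without breaking lines, then build each chunk in parallel
$ split -n l/12 index.sql idx_part_
$ for f in idx_part_*; do psql -h localhost -p <target_cluster_port> -d <target_database_name> -U <superuser_name> -f "$f" >> "$f".out 2>&1 & done
$ wait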

3.2 Run analyzedb Command

Update statistics for the entire database on the target YMatrix 6 cluster:

$ export PGPORT=<target_cluster_port>
$ time analyzedb -d <target_database_name> -p 10 -a

3.3 Add Mirror Segments

Add mirror segments to the target YMatrix 6 cluster. Example steps:

# First, check current cluster instance information
postgres=# SELECT * from gp_segment_configuration order by 1;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                   datadir
------+---------+------+----------------+------+--------+------+----------+---------+---------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdw      | mdw     | /home/mxdata_20220925154450/master/mxseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | sdw2     | sdw2    | /home/mxdata_20220925154450/primary/mxseg0
    3 |       1 | p    | p              | n    | u      | 6001 | sdw2     | sdw2    | /home/mxdata_20220925154450/primary/mxseg1
    4 |       2 | p    | p              | n    | u      | 6000 | sdw3     | sdw3    | /home/mxdata_20220925154450/primary/mxseg2
    5 |       3 | p    | p              | n    | u      | 6001 | sdw3     | sdw3    | /home/mxdata_20220925154450/primary/mxseg3
    6 |      -1 | m    | m              | s    | u      | 5432 | sdw1     | sdw1    | /home/mxdata_20220925154450/standby/mxseg-1
(6 rows)

# Create a file containing all hostnames
$ cat /home/mxadmin/seg_hosts 
sdw1
sdw2
sdw3
sdw4

# Use gpssh to batch-create mirror directories
$ gpssh -f /home/mxadmin/seg_hosts -e 'mkdir -p /home/mxdata_20220925154450/mirror'

# Generate mirror template file
$ mxaddmirrors -o ./addmirror

# View mirror template
$ cat addmirror

# Execute mirror addition
$ mxaddmirrors -i addmirror

# Finally, verify updated cluster configuration
postgres=# SELECT * from gp_segment_configuration order by 1;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                   datadir
------+---------+------+----------------+------+--------+------+----------+---------+---------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdw      | mdw     | /home/mxdata_20220925154450/master/mxseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | sdw2     | sdw2    | /home/mxdata_20220925154450/primary/mxseg0
    3 |       1 | p    | p              | s    | u      | 6001 | sdw2     | sdw2    | /home/mxdata_20220925154450/primary/mxseg1
    4 |       2 | p    | p              | s    | u      | 6000 | sdw3     | sdw3    | /home/mxdata_20220925154450/primary/mxseg2
    5 |       3 | p    | p              | s    | u      | 6001 | sdw3     | sdw3    | /home/mxdata_20220925154450/primary/mxseg3
    6 |      -1 | m    | m              | s    | u      | 5432 | sdw1     | sdw1    | /home/mxdata_20220925154450/standby/mxseg-1
    7 |       0 | m    | m              | n    | d      | 7000 | sdw3     | sdw3    | /home/mxdata_20220925154450/mirror/mxseg0
    8 |       1 | m    | m              | s    | u      | 7001 | sdw3     | sdw3    | /home/mxdata_20220925154450/mirror/mxseg1
    9 |       2 | m    | m              | s    | u      | 7000 | sdw2     | sdw2    | /home/mxdata_20220925154450/mirror/mxseg2
   10 |       3 | m    | m              | s    | u      | 7001 | sdw2     | sdw2    | /home/mxdata_20220925154450/mirror/mxseg3
(10 rows)
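
Right after mxaddmirrors, some mirrors may still show status 'd' or mode 'n' while they catch up, as dbid 7 does in the sample output above. You can poll the catalog until every segment pair is up and synchronized:

# Counts segments that are down or not yet synchronized; 0 means all mirrors are ready
$ psql -d postgres -Atc "SELECT count(*) FROM gp_segment_configuration WHERE content >= 0 AND (status = 'd' OR mode <> 's')"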

After completing the above steps, restore application access and monitor workload behavior, observing stability over a period appropriate to your operational context. If the system runs stably, congratulations: your data migration is complete!