YMatrix

Preface

YMatrix is 100% compatible with Greenplum. Within a maintenance window, you can migrate Greenplum to YMatrix—full or incremental—while keeping applications unaware of the switch.

MatrixShift uses a segment-to-segment parallel architecture that saturates cluster bandwidth and CPU. Compared with pg_dump/pg_restore or gpcopy, it is faster, more stable, and easier to control.

01 What is MatrixShift?

MatrixShift is a full-database migration tool for moving Greenplum or YMatrix databases to YMatrix with minimal changes. By transferring data segment-to-segment in parallel, it makes full use of CPU and network resources on both source and target, delivering much higher throughput than traditional tools.

Supported Capabilities

Full migration: Migrate schema (DDL) and data in parallel, database by database, from Greenplum/YMatrix to a new YMatrix cluster.
Incremental migration: Migrate newly added tables (DDL + data) in parallel, table by table, from Greenplum 5/6/7 or YMatrix to a new YMatrix cluster.
Conditional migration: Filter rows with a WHERE clause during migration.
Distribution policies: Supports hash-distributed, randomly distributed, replicated, and master-only tables.

MatrixShift currently supports mainstream Greenplum and YMatrix releases; please refer to the official tool release notes for the detailed compatibility matrix.

02 Why Choose MatrixShift?

When Greenplum hits performance limits, or when you standardize OLAP on YMatrix, you need to complete a full migration during a short maintenance window with no application impact. Traditional pg_dump/pg_restore often suffers from single-node bottlenecks, poor bandwidth utilization, and the need for extra temporary storage, which makes migration windows long and unpredictable.

Advantages of MatrixShift

High parallelism: Segment-to-segment transfer leverages per-segment CPU and each host's NIC bandwidth.
High control: Split by database/table, support conditional and incremental migration, and rebuild indexes concurrently.
High reliability: Dual verification (row count + hash), resumable on failure, end-to-end consistency.
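
The dual verification idea (row count plus hash) can be sketched as follows. This is a minimal illustration under stated assumptions, not MatrixShift's implementation: the names `table_fingerprint` and `tables_match` are hypothetical, rows are plain Python tuples, and per-row SHA-256 digests are combined by modular addition so that row order, which usually differs between clusters, does not change the result.

```python
import hashlib
from typing import Iterable, Tuple

Row = Tuple[object, ...]
MOD = 2 ** 256  # keep the combined hash within 256 bits


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, int]:
    """Return (row_count, order-independent hash) for an iterable of rows."""
    count = 0
    combined = 0
    for row in rows:
        # Serialize each row deterministically; repr() is enough for a sketch.
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Modular addition is commutative, so row order does not matter,
        # and unlike XOR, duplicate rows do not cancel each other out.
        combined = (combined + int.from_bytes(digest, "big")) % MOD
        count += 1
    return count, combined


def tables_match(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> bool:
    """Both checks must pass: same row count and same combined hash."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


# Hypothetical sample data: same rows in a different order, then one altered row.
source = [(1, "a"), (2, "b"), (3, "c")]
target_ok = [(3, "c"), (1, "a"), (2, "b")]
target_bad = [(1, "a"), (2, "b"), (3, "x")]

print(tables_match(source, target_ok))   # True
print(tables_match(source, target_bad))  # False
```

A row count alone would accept `target_bad`, since it also has three rows; the hash is what catches the altered value.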

Feature Comparison

gpcopy is the Greenplum ecosystem’s full-database migration tool. MatrixShift offers finer control, richer options, and higher parallel efficiency. (See product docs for a detailed feature matrix.)

Performance Notes

In practice, gpcopy creates indexes before data loads. This slows data ingestion and inflates index size. Testing with a partitioned table (total ~30 GB, 1000 Mbps link, parallelism=8) shows MatrixShift’s approach—data first, then concurrent index build—yields better throughput and healthier index size.

Example table (abridged):

tpch_s50=# \d lineitem
      Append-Only Columnar Table "tpch.lineitem"
     Column      |       Type       | Modifiers
-----------------+------------------+-----------
 l_orderkey      | bigint           |
 l_partkey       | integer          |
 l_suppkey       | integer          |
 l_linenumber    | integer          |
 l_quantity      | smallint         |
 l_extendedprice | double precision |
 l_discount      | double precision |
 l_tax           | double precision |
 l_returnflag    | "char"           |
 l_linestatus    | "char"           |
 l_shipdate      | date             |
 l_commitdate    | date             |
 l_receiptdate   | date             |
 l_shipinstruct  | text             |
 l_shipmode      | text             |
 l_comment       | text             |
Checksum: t
Indexes:
    "lineitem_l_orderkey_idx" btree (l_orderkey)
    "lineitem_l_quantity_idx" btree (l_quantity)
    "lineitem_l_suppkey_idx" btree (l_suppkey)
Number of child tables: 8 (Use \d+ to list them.)
Distributed by: (l_orderkey)
Partition by: (l_shipdate)

Conclusion: MatrixShift outperforms gpcopy and provides a more migration-friendly strategy.
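
The difference between the two orderings discussed above (index before load versus load before index) can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in, not Greenplum or YMatrix, and the function names are hypothetical. It shows only that both orderings end in the same logical state; it makes no timing or index-size claim.

```python
import sqlite3

CREATE_TABLE = "CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity INTEGER)"
CREATE_INDEX = "CREATE INDEX lineitem_l_orderkey_idx ON lineitem (l_orderkey)"
INSERT_ROW = "INSERT INTO lineitem VALUES (?, ?)"


def load_then_index(rows):
    """Data first: bulk-load all rows, then build the index in one pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_TABLE)
    conn.executemany(INSERT_ROW, rows)
    conn.execute(CREATE_INDEX)
    conn.commit()
    return conn


def index_then_load(rows):
    """Index first: every inserted row also has to update the index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_TABLE)
    conn.execute(CREATE_INDEX)
    conn.executemany(INSERT_ROW, rows)
    conn.commit()
    return conn


def summarize(conn):
    """Return (row_count, index_names) so the two results can be compared."""
    count = conn.execute("SELECT COUNT(*) FROM lineitem").fetchone()[0]
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
    return count, names


# Hypothetical sample data, far smaller than the ~30 GB table in the test above.
rows = [(i % 1000, i % 50) for i in range(10000)]
data_first = summarize(load_then_index(rows))
index_first = summarize(index_then_load(rows))
print(data_first == index_first)
```

Because the end state is the same, the choice of ordering is a performance decision rather than a correctness one, which is why a migration tool is free to defer index creation until the data has landed.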

03 How Fast Can MatrixShift Run?

We continually improve MatrixShift through production projects. We have not measured a software-imposed upper limit, because real-world throughput is bounded by hardware: CPU, network I/O, and disk I/O. In projects, MatrixShift typically drives one of these to 100% utilization, so raising that ceiling makes the migration faster.
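
The reasoning above amounts to taking the minimum over the hardware ceilings. A minimal sketch, assuming each resource can be expressed as a throughput ceiling in MB/s; the function name `find_bottleneck` and all numbers are hypothetical, not measurements:

```python
def find_bottleneck(ceilings_mb_s):
    """Return (resource, rate) for the lowest throughput ceiling."""
    resource = min(ceilings_mb_s, key=ceilings_mb_s.get)
    return resource, ceilings_mb_s[resource]


# Hypothetical cluster-wide ceilings in MB/s.
ceilings = {"cpu": 900.0, "network": 600.0, "disk": 400.0}
print(find_bottleneck(ceilings))  # ('disk', 400.0)

# Raising the limiting ceiling moves the bottleneck to the next resource.
ceilings["disk"] = 800.0
print(find_bottleneck(ceilings))  # ('network', 600.0)
```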

Selected project results: (Actual rates depend on data scale, network links, object complexity, and parallel strategy.)

[Project A] Tens of TB migrated within the window; link saturated.
[Project B] Multi-TB, heavy indexing; data load first + concurrent index build cut window by >50%.
[Project C] Cross-version Greenplum→YMatrix; dual verification completed within SLA.

04 Case Study: Steel Manufacturer Migration

Source: Greenplum 6.16.2; ~2 TB; two segment hosts, each with two primaries; one data disk per host (max seq R/W ~200 MB/s); effective LAN ~2.5 Gbps. Queries were slow and the compression ratio was low.

Target: YMatrix 6.4.1; two segment hosts; two primaries/host; max seq R/W ~200 MiB/s; LAN ~2.5 Gbps.

Goal: Complete the full migration to YMatrix within 6 hours with a no-touch switchover, and improve query performance and storage efficiency. Disk throughput was expected to be the bottleneck.

Plan

Architecture & tool: MatrixShift segment-to-segment parallel migration to fully utilize per-node bandwidth and CPU.
Migration strategy: Load data + DDL first; rebuild indexes concurrently after data lands to avoid write amplification and index bloat during transfer.
Verification & cutover: Row count + hash checks; switch connections and validate key reports within the window.

Outcome

Duration: ~2 hours for data transfer + verification (~2 TB). Cutover finished cleanly within the window, minimizing downtime.
Performance & cost: Typical business queries improved 100–500% (varies by query/model). Compression improved dramatically; storage footprint ~1/10 of original.

Business impact: Switchover was effectively transparent; production reports and interfaces delivered on schedule.

Results reflect a single project. Your mileage depends on data size, network, object complexity, and parallelism.
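
The figures in this case study can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and assumes both segment hosts can read at their full sequential rate at the same time; neither assumption is stated in the case study itself.

```python
TB = 10 ** 12  # assumption: decimal terabytes
MB = 10 ** 6   # assumption: decimal megabytes


def average_rate_mb_s(total_bytes, hours):
    """Average throughput in MB/s needed to move total_bytes in the given hours."""
    return total_bytes / (hours * 3600) / MB


data_bytes = 2 * TB

required = average_rate_mb_s(data_bytes, 6)  # the 6-hour goal
observed = average_rate_mb_s(data_bytes, 2)  # the ~2 hours reported

# Two hosts, one data disk each at ~200 MB/s sequential.
disk_ceiling_mb_s = 2 * 200

print(round(required, 1))   # 92.6
print(round(observed, 1))   # 277.8
print(disk_ceiling_mb_s)    # 400
```

Under these assumptions the observed average of roughly 278 MB/s sits below the combined disk ceiling of about 400 MB/s and well above the roughly 93 MB/s the 6-hour goal required. Since the reported 2 hours also includes verification, the average rate during the transfer phase alone would have been somewhat higher.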

05 How to Use

Prepare an mxshift configuration file. Example:

## Database configuration
[database]
  ## Source database
  [database.source]
  db-database = "testdb"
  db-host     = "sdw3"
  db-password = "xxxx"
  db-port     = 54322
  db-user     = "gpadmin"
  install-dir = "/usr/local/greenplum-db-6.7.1"

  ## Target database
  [database.target]
  db-database = "destdb"
  db-host     = "172.16.100.32"
  db-password = "yyyy"
  db-port     = 5432
  db-user     = "mxadmin"
  ## Use the exact output of 'SELECT version();'
  db-version  = "PostgreSQL 12 (MatrixDB 6.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Sep  5 2025 15:45:24"

## Scope configuration
[scope]
disable-connector = false
## Data transfer compression: 0/gzip/lz4/zstd
compress-method   = "zstd"
## Modes: normal/dryrun/fetch/motion
## normal: full migration
## dryrun: DDL only, no data
## fetch: fetch and discard
## motion: redistribute then discard
mode = "normal"
## Included schemas (kept above the [[scope.*]] tables so TOML assigns these keys to [scope])
schema-list         = ["test_schema_1", "test_schema_2"]
## Excluded schemas
exclude-schema-list = ["test_schema_5", "test_schema_8"]

  ## Included tables
  [[scope.table-list]]
  schema = "test_schema_1"
  name   = "table_001"

  [[scope.table-list]]
  schema = "test_schema_2"
  name   = "table_002"

  ## Excluded tables
  [[scope.exclude-table-list]]
  schema = "test_schema_3"
  name   = "table_003"

## Logging
[log]
## debug/verbose/info
log-level = "info"

## Controller
[controller]
both-way    = true   ## start from largest and smallest tables
concurrency = 3      ## number of tables in parallel

## Transfer
[transfer]
verify     = true   ## verify row counts per table
with-index = true   ## transfer indexes

## DDL
[ddl]
enabled  = true     ## enable DDL migration
only-ddl = false    ## DDL-only mode

  ## DDL replacements
  [[ddl.replace]]
  category = "role"
    [[ddl.replace.pairs]]
    old = "gpadmin"
    new = "mxadmin"

## Verification
[verify]
enabled = true
mode    = "simple-count"

What this file controls

Include/exclude schemas and tables.
Whether to migrate indexes with data.
Parallelism (number of tables in flight).
DDL replace (e.g., table owner, target schema).
DDL-only or data-only modes.
Granular verification strategy.
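
The `both-way` option in the `[controller]` section is described in the config comments as starting from the largest and smallest tables. One way to picture such an ordering is sketched below. This is an illustration only: the function name `both_way_order` and the table sizes are hypothetical, and MatrixShift's actual scheduler is not described in this article.

```python
def both_way_order(table_sizes):
    """Alternate between the largest and smallest remaining tables."""
    # Sort table names from largest to smallest.
    ordered = sorted(table_sizes, key=table_sizes.get, reverse=True)
    lo, hi = 0, len(ordered) - 1
    result = []
    take_largest = True
    while lo <= hi:
        if take_largest:
            result.append(ordered[lo])
            lo += 1
        else:
            result.append(ordered[hi])
            hi -= 1
        take_largest = not take_largest
    return result


# Hypothetical table sizes in GB.
sizes_gb = {"orders": 120, "lineitem": 300, "nation": 1, "customer": 40, "region": 2}
print(both_way_order(sizes_gb))
# ['lineitem', 'nation', 'orders', 'region', 'customer']
```

With `concurrency = 3`, the first three entries of such a list would be in flight together, so long-running large tables start early while small tables complete quickly alongside them.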

Run migration

mxshift -c config.toml

Monitor progress in another session

mxshift -c config.toml -I

Topology mapping

If source and target topologies are identical, MatrixShift maps source segments 1:1 to target segments. If they differ, MatrixShift computes a proportional mapping and redistributes data accordingly.
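
A proportional mapping of this kind can be sketched as follows. The formula is an assumption chosen for illustration, not MatrixShift's documented algorithm: source segment `i` of `n_source` is assigned to target segment `i * n_target // n_source`, which reduces to a 1:1 mapping when the counts are equal. The function name `map_segments` is hypothetical.

```python
def map_segments(n_source, n_target):
    """Map each source segment index to a target segment index proportionally."""
    if n_source <= 0 or n_target <= 0:
        raise ValueError("segment counts must be positive")
    return {src: src * n_target // n_source for src in range(n_source)}


print(map_segments(4, 4))  # {0: 0, 1: 1, 2: 2, 3: 3}
print(map_segments(8, 4))  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
print(map_segments(4, 8))  # {0: 0, 1: 2, 2: 4, 3: 6}
```

In the 4-to-8 case, only the even-numbered targets receive a stream in this sketch, so a mapping alone does not place every row on its final segment. The text above says data is redistributed when topologies differ; how MatrixShift performs that step is not covered here.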

06 FAQ

What objects are supported? Tables, partitioned tables, indexes, views, sequences, functions, privileges, and more. See the tool’s release notes for exact coverage.
Can I change parallelism during migration? Yes. Update concurrency in the config and send a reload signal. MatrixShift will adopt the new setting.
How do I check migration progress? Use -I as shown above. Logs also contain detailed progress information.

MatrixShift for YMatrix: A Practical Guide to Migrating from Greenplum
