YMatrix is 100% compatible with Greenplum. Within a maintenance window, you can migrate Greenplum to YMatrix—full or incremental—while keeping applications unaware of the switch.
MatrixShift is a full-database migration tool for Greenplum / YMatrix → YMatrix that moves Greenplum databases to YMatrix with minimal change. It transfers data segment-to-segment in parallel, which saturates cluster bandwidth and CPU on both source and target and delivers much higher throughput than traditional tools. Compared with pg_dump/pg_restore or gpcopy, it is faster, more stable, and easier to control.
Supported Capabilities
Full migration: Migrate schema (DDL) and data in parallel, database by database, from Greenplum/YMatrix to a new YMatrix cluster.
Incremental migration: Migrate newly added tables (DDL + data) in parallel, table by table, from Greenplum 5/6/7 or YMatrix to a new YMatrix cluster.
Conditional migration: Filter rows with a WHERE clause during migration.
Distribution policies: Supports hash, random, replicated, and master-only tables.
MatrixShift currently supports mainstream Greenplum and YMatrix releases; please refer to the official tool release notes for the detailed compatibility matrix.
When Greenplum hits performance limits—or when you standardize OLAP on YMatrix—you need to complete full migration during a short maintenance window with no application impact. Traditional pg_dump/pg_restore often suffers from single-node bottlenecks, poor bandwidth utilization, and extra temporary storage, making windows long and unpredictable.
High parallelism: Segment-to-segment transfer leverages per-segment CPU and each host’s NIC bandwidth.
High control: Split by database/table, support conditional and incremental migration, and rebuild indexes concurrently.
High reliability: Dual verification (row count + hash), resumable on failure, end-to-end consistency.
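The "row count + hash" idea can be illustrated with a minimal Python sketch. This is not MatrixShift's implementation: the function names `table_fingerprint` and `verify` are hypothetical, and the rows are held in memory, whereas a real tool would compute the digest on each side and compare only the results.

```python
import hashlib

def table_fingerprint(rows):
    """Return (row_count, order-independent digest) for an iterable of row tuples."""
    count = 0
    combined = 0
    for row in rows:
        # Hash a canonical text form of the row; repr() is enough for a sketch.
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Adding modulo 2**256 makes the result independent of row order
        # while still accounting for duplicate rows (XOR would cancel them).
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, combined

def verify(source_rows, target_rows):
    """True only if both the row count and the content digest match."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)

source = [(1, "a"), (2, "b"), (3, "c")]
target_ok = [(3, "c"), (1, "a"), (2, "b")]    # same rows, different order
target_bad = [(1, "a"), (2, "b"), (3, "x")]   # same count, one value changed

print(verify(source, target_ok))   # True
print(verify(source, target_bad))  # False
```

A row count alone would pass `target_bad`; the digest catches the changed value, which is why the two checks are paired.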
gpcopy is the Greenplum ecosystem’s full-database migration tool. MatrixShift offers finer control, richer options, and higher parallel efficiency.
(See product docs for a detailed feature matrix.)
In practice, gpcopy creates indexes before loading data, so every inserted row must also update the indexes. This slows data ingestion and inflates index size. In a test with a partitioned table (~30 GB total, 1000 Mbps link, parallelism = 8), MatrixShift’s approach of loading data first and then building indexes concurrently gave higher throughput and smaller indexes.
Example table (abridged):
tpch_s50=# \d lineitem
 Append-Only Columnar Table "tpch.lineitem"
     Column      |       Type       | Modifiers
-----------------+------------------+-----------
 l_orderkey      | bigint           |
 l_partkey       | integer          |
 l_suppkey       | integer          |
 l_linenumber    | integer          |
 l_quantity      | smallint         |
 l_extendedprice | double precision |
 l_discount      | double precision |
 l_tax           | double precision |
 l_returnflag    | "char"           |
 l_linestatus    | "char"           |
 l_shipdate      | date             |
 l_commitdate    | date             |
 l_receiptdate   | date             |
 l_shipinstruct  | text             |
 l_shipmode      | text             |
 l_comment       | text             |
Checksum: t
Indexes:
    "lineitem_l_orderkey_idx" btree (l_orderkey)
    "lineitem_l_quantity_idx" btree (l_quantity)
    "lineitem_l_suppkey_idx" btree (l_suppkey)
Number of child tables: 8 (Use \d+ to list them.)
Distributed by: (l_orderkey)
Partition by: (l_shipdate)
Conclusion: MatrixShift outperforms gpcopy and provides a more migration-friendly strategy.
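The difference between the two orderings (index first versus data first) can be sketched in Python with SQLite as a stand-in. SQLite is used only because it ships with Python; the timings it prints say nothing about Greenplum or YMatrix, and the two-column table and synthetic rows are assumptions made for illustration.

```python
import sqlite3
import time

INDEX_DDL = "CREATE INDEX lineitem_l_orderkey_idx ON lineitem (l_orderkey)"

def load(rows, index_first):
    """Load rows into an in-memory table; build the index before or after."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity INTEGER)")
    if index_first:
        conn.execute(INDEX_DDL)  # every insert below must also update the index
    conn.executemany("INSERT INTO lineitem VALUES (?, ?)", rows)
    if not index_first:
        conn.execute(INDEX_DDL)  # single bulk build once the data has landed
    conn.commit()
    return conn

# Synthetic rows with keys in scattered order (hypothetical data).
rows = [((i * 7919) % 100_003, i % 50) for i in range(50_000)]

for index_first in (True, False):
    start = time.perf_counter()
    conn = load(rows, index_first)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT count(*) FROM lineitem").fetchone()[0]
    label = "index first" if index_first else "data first "
    print(f"{label}: {count} rows in {elapsed:.3f} s")
    conn.close()
```

Both orderings end with the same rows and the same index; only the amount of index maintenance done during the load differs.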
We continually improve MatrixShift through production projects. We have not found a software-imposed throughput limit: real-world throughput is bounded by hardware (CPU, network I/O, and disk I/O), and in projects MatrixShift typically drives one of these to 100% utilization. Raise that hardware ceiling and MatrixShift goes faster.
Selected project results: (Actual rates depend on data scale, network links, object complexity, and parallel strategy.)
[Project A] Tens of TB migrated within the window; link saturated.
[Project B] Multi-TB, heavy indexing; data load first + concurrent index build cut window by >50%.
[Project C] Cross-version Greenplum→YMatrix; dual verification completed within SLA.
Source: Greenplum 6.16.2; ~2 TB; two segment hosts; each host with two primaries; one data disk/host (max seq R/W ~200 MB/s); effective LAN ~2.5 Gbps. Query performance was slow and the compression ratio was low.
Target: YMatrix 6.4.1; two segment hosts; two primaries/host; max seq R/W ~200 MB/s; LAN ~2.5 Gbps.
Goal: Complete full migration to YMatrix within ≤6 hours, no-touch switchover, and improve query performance and storage efficiency. Disk throughput was expected to be the bottleneck.
Architecture & tool: MatrixShift segment-to-segment parallel migration to fully utilize per-node bandwidth and CPU.
Migration strategy: Load data + DDL first; rebuild indexes concurrently after data lands to avoid write amplification and index bloat during transfer.
Verification & cutover: Row count + hash checks; switch connections and validate key reports within the window.
Duration: ~2 hours for data transfer + verification (~2 TB). Cutover finished cleanly within the window, minimizing downtime.
Performance & cost:
Typical business queries improved 100–500% (varies by query/model).
Compression improved dramatically; storage footprint ~1/10 of original.
Results reflect a single project. Your mileage depends on data size, network, object complexity, and parallelism.
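As a rough cross-check of the case-study numbers, the average rate implied by ~2 TB in ~2 hours can be compared with the stated hardware ceilings. The sketch below assumes decimal units (1 TB = 10^12 bytes), that the ~200 MB/s disk and ~2.5 Gbps LAN figures apply per host, and that the 2 hours include verification, so the pure transfer rate was somewhat higher than the average shown.

```python
data_bytes = 2e12        # ~2 TB, decimal units assumed
window_s = 2 * 3600      # ~2 hours, transfer plus verification
hosts = 2                # segment hosts on each side
disk_mb_s = 200          # per-host sequential R/W ceiling from the case study
lan_gbps = 2.5           # effective LAN bandwidth, assumed to be per host

avg_mb_s = data_bytes / window_s / 1e6
disk_ceiling = hosts * disk_mb_s
lan_ceiling = hosts * lan_gbps * 1000 / 8   # Gbps -> MB/s

print(f"average rate: {avg_mb_s:.0f} MB/s")                            # 278 MB/s
print(f"aggregate disk ceiling: {disk_ceiling} MB/s")                  # 400 MB/s
print(f"aggregate LAN ceiling: {lan_ceiling:.0f} MB/s")                # 625 MB/s
print(f"share of disk ceiling used: {avg_mb_s / disk_ceiling:.0%}")    # 69%
```

Under these assumptions the disk ceiling sits well below the LAN ceiling, which is consistent with the expectation that disk throughput would be the bottleneck.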
Prepare an mxshift configuration file. Example:
## Database configuration
[database]
## Source database
[database.source]
db-database = "testdb"
db-host = "sdw3"
db-password = "xxxx"
db-port = 54322
db-user = "gpadmin"
install-dir = "/usr/local/greenplum-db-6.7.1"
## Target database
[database.target]
db-database = "destdb"
db-host = "172.16.100.32"
db-password = "yyyy"
db-port = 5432
db-user = "mxadmin"
## Use the exact output of 'SELECT version();'
db-version = "PostgreSQL 12 (MatrixDB 6.5.0-enterprise) (Greenplum Database 7.0.0+dev.17410.gedbdb5ef84 build dev) on arm-apple-darwin21.5.0, compiled by Apple clang version 13.0.0 (clang-1300.0.27.3), 64-bit compiled on Sep 5 2025 15:45:24"
## Scope configuration
[scope]
disable-connector = false
## Data transfer compression: 0/gzip/lz4/zstd
compress-method = "zstd"
## Modes: normal/dryrun/fetch/motion
## normal: full migration
## dryrun: DDL only, no data
## fetch: fetch and discard
## motion: redistribute then discard
mode = "normal"
## Included schemas
## (scalar keys must precede the [[scope.*]] headers; in TOML, a key placed
## after such a header belongs to that table entry, not to [scope])
schema-list = ["test_schema_1", "test_schema_2"]
## Excluded schemas
exclude-schema-list = ["test_schema_5", "test_schema_8"]
## Included tables
[[scope.table-list]]
schema = "test_schema_1"
name = "table_001"
[[scope.table-list]]
schema = "test_schema_2"
name = "table_002"
## Excluded tables
[[scope.exclude-table-list]]
schema = "test_schema_3"
name = "table_003"
## Logging
[log]
## debug/verbose/info
log-level = "info"
## Controller
[controller]
both-way = true ## start from largest and smallest tables
concurrency = 3 ## number of tables in parallel
## Transfer
[transfer]
verify = true ## verify row counts per table
with-index = true ## transfer indexes
## DDL
[ddl]
enabled = true ## enable DDL migration
only-ddl = false ## DDL-only mode
## DDL replacements
[[ddl.replace]]
category = "role"
[[ddl.replace.pairs]]
old = "gpadmin"
new = "mxadmin"
## Verification
[verify]
enabled = true
mode = "simple-count"
Include/exclude schemas and tables.
Whether to migrate indexes with data.
Parallelism (number of tables in flight).
DDL replace (e.g., table owner, target schema).
DDL-only or data-only modes.
Granular verification strategy.
Run the migration:
mxshift -c config.toml
Check migration progress:
mxshift -c config.toml -I
If source and target topologies are identical, MatrixShift maps source segments 1:1 to target segments. If they differ, MatrixShift computes a proportional mapping and redistributes data accordingly.
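One way to picture the mapping is the sketch below. It is an illustration under assumed rules, not MatrixShift's actual algorithm: the function `map_segments` is hypothetical, segments are numbered from 0, and each source segment is assigned a single target segment by scaling its index to the target count.

```python
def map_segments(n_source, n_target):
    """Assign each source segment one target segment to stream to."""
    if n_source <= 0 or n_target <= 0:
        raise ValueError("segment counts must be positive")
    if n_source == n_target:
        # Identical topology: source segment i feeds target segment i.
        return {s: s for s in range(n_source)}
    # Different topology: scale the source index proportionally.
    return {s: s * n_target // n_source for s in range(n_source)}

print(map_segments(4, 4))  # {0: 0, 1: 1, 2: 2, 3: 3}
print(map_segments(6, 4))  # {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 3}
print(map_segments(2, 4))  # {0: 0, 1: 2}
```

In the last case, target segments 1 and 3 have no direct sender, so rows would reach them only through redistribution on the target side; the proportional assignment decides where streams land, not where rows finally live.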
What objects are supported? Tables, partitioned tables, indexes, views, sequences, functions, privileges, and more. See the tool’s release notes for exact coverage.
Can I change parallelism during migration? Yes. Update concurrency in the config and send a reload signal. MatrixShift will adopt the new setting.
How do I check migration progress? Use -I as shown above. Logs also contain detailed progress information.