Online Cluster Expansion via Command Line

If you prefer to perform cluster expansion using command-line operations, YMatrix provides corresponding expansion tools and SQL statements to support flexible and seamless scaling. Cluster expansion generally consists of two steps: adding new nodes and redistributing data.

Adding new nodes is fast and can be completed without downtime in YMatrix. Data redistribution, however, is more complex. By default, any newly created table is distributed across all nodes, and a table or partition undergoing redistribution is locked for both reads and writes, so queries against it block until the redistribution completes.

In certain scenarios, users may not need to redistribute all tables; only specific existing or newly created tables require updated distribution strategies. To address this, YMatrix offers a smooth expansion feature: you define a custom node set by creating a SEGMENT_SET object and specify that a table's data should reside only within that set (i.e., the data is distributed exclusively among a specific subset of segments).

YMatrix also supports additional capabilities such as parallel execution of table data redistribution and joining tables that have not been redistributed with those that have.

1 System Planning

Before performing expansion, proper system planning is essential. Below is a checklist outlining the complete expansion workflow.

Step Task
Preparation
1 Install the same version of YMatrix software on new hosts
2 Check cluster status
3 Verify software version
4 Add hostnames to system files
Expansion
5 Execute expansion
Post-Expansion Verification
6 View the updated cluster configuration
7 Check data distribution

After expansion and before initiating data redistribution, plan your redistribution strategy carefully.

For example, time-series data partitioned by time often exhibits clear hot/cold characteristics ("hot" and "cold" are relative terms based on query frequency; frequently accessed data is considered hot). Over time, hot data cools and access demand decreases. Allocating extra resources to querying cold data is inefficient, so a full redistribution is unnecessary. In such cases it is better to predefine the distribution of incoming data at table creation time; YMatrix's smooth expansion capability enables exactly this approach.
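This planning can be sketched with the SEGMENT_SET syntax covered later in this guide. The example below is illustrative only: the set name ss_hot, the content IDs, and the table definition are assumptions, not output from a real cluster.

```sql
-- Illustrative sketch: leave cold data where it is, and point a new
-- hot-data table only at the newly added segments.
-- ss_hot and the content IDs (4-7) are assumptions for this example.
CREATE SEGMENT_SET ss_hot SEGMENTS (4, 5, 6, 7);
CREATE TABLE metrics_hot (ts timestamp, v int)
    USING MARS3 DISTRIBUTED BY (ts) SEGMENT_SET ss_hot ORDER BY (ts);
```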

Conversely, if most tables are non-partitioned or partitioned with uniform access patterns and ongoing inserts across partitions, full data redistribution is more appropriate.

To redistribute all data, follow the steps in Table 1.

Table 1

Step Task
Data Redistribution
1 Run full data redistribution command
Post-Redistribution Tasks
2 Check current redistribution status
3 Verify data distribution

To perform smooth expansion, refer to Table 2 below.
Four different methods are provided; choose one based on your needs.

Table 2

Method Smooth Expansion
(1) Create a desired segment set (CREATE SEGMENT_SET), then specify it when creating new tables or partitions
(2) Specify an existing segment set (created prior to expansion) during table creation
(3) Do not specify a segment set for new tables; use the node set defined by the mx_default_segment_set parameter (default: global distribution across all nodes)
(4) Redistribute only selected old data while handling new data separately. Use ALTER TABLE SET SEGMENT_SET for old data redistribution. For new data, apply any one of methods (1), (2), or (3) above

2 Preparation

Below are step-by-step instructions. Assume your current cluster includes three servers: mdw, sdw1, and sdw2. You plan to add two new machines: sdw3 and sdw4.
The test environment contains a sample table test with 100 rows of test data.

=# CREATE EXTENSION matrixts;
=# CREATE TABLE test (a int, b int) USING MARS3 DISTRIBUTED BY (a) ORDER BY (a);
=# INSERT INTO test SELECT i, i+10 FROM generate_series(1, 100) AS i;

2.1 Install Matching YMatrix Software on New Hosts

First, install the same version of YMatrix software on the two new hosts being added to the cluster. Refer to Online Cluster Deployment Parts 1–3 for detailed deployment procedures.

2.2 Check Cluster Status

Next, query the gp_segment_configuration catalog table to verify the cluster state and ensure all segment instances are healthy.

=# SELECT * FROM gp_segment_configuration ORDER BY 1 ASC;

dbid | content | role | preferred_role | mode | status | port | hostname | address |                datadir                
------+---------+------+----------------+------+--------+------+----------+---------+---------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdw      | mdw     | /data/mxdata/master/mxseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | sdw1     | sdw1    | /data/mxdata/primary/mxseg0
    3 |       1 | p    | p              | n    | u      | 6001 | sdw1     | sdw1    | /data/mxdata/primary/mxseg1
    4 |       2 | p    | p              | n    | u      | 6000 | sdw2     | sdw2    | /data/mxdata/primary/mxseg2
    5 |       3 | p    | p              | n    | u      | 6001 | sdw2     | sdw2    | /data/mxdata/primary/mxseg3
(5 rows)

2.3 Verify Software Version

Use the version() function to check the YMatrix server version, and confirm that the same version is installed on every host.

=# SELECT version();

2.4 Add Hostnames to System Files

After confirming cluster health and version consistency, configure the /etc/hosts file with root privileges on all nodes, adding the new hostnames. Ensure network connectivity throughout this process.

[<username>@mdw ~]$ sudo vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.247.128 mdw
192.168.247.129 sdw1
192.168.247.130 sdw2
192.168.247.131 sdw3
192.168.247.132 sdw4

3 Perform Expansion

Note!
All expansion operations must be performed on the Master node.

3.1 Gather Current Cluster Information

$ sudo /opt/ymatrix/matrixdb6/bin/mxctl expand init > /tmp/init_output

This writes structured cluster information to the file /tmp/init_output.

3.2 Add New Hosts

$ cat /tmp/init_output | sudo /opt/ymatrix/matrixdb6/bin/mxctl expand add --host newhost1 > /tmp/add.1
$ cat /tmp/add.1 | sudo /opt/ymatrix/matrixdb6/bin/mxctl expand add --host newhost2 > /tmp/add.2
...

Repeat this command once per new host, using the output of each run as input for the next.
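The per-host chaining can be wrapped in a small POSIX shell helper. This is an illustrative sketch, not a YMatrix tool: the MXCTL variable and the add_hosts function are assumptions introduced here, and the default mxctl path is the one used throughout this guide.

```shell
#!/bin/sh
# Illustrative helper (not part of YMatrix): run "expand add" once per
# new host, feeding each run's output into the next, as described above.
# MXCTL is an assumption; override it to dry-run the chaining logic.
MXCTL=${MXCTL:-"sudo /opt/ymatrix/matrixdb6/bin/mxctl"}

add_hosts() {
    # $1 = file holding the "expand init" output; remaining args = new hosts
    prev=$1
    shift
    i=1
    for host in "$@"; do
        $MXCTL expand add --host "$host" < "$prev" > "/tmp/add.$i" || return 1
        prev="/tmp/add.$i"
        i=$((i + 1))
    done
    echo "$prev"    # path of the final output, to feed into the next step
}
```

For the example cluster this would be `add_hosts /tmp/init_output sdw3 sdw4`, leaving the final state in /tmp/add.2.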

3.3 Test Network Connectivity

$ cat /tmp/add.3 | sudo /opt/ymatrix/matrixdb6/bin/mxctl expand netcheck > /tmp/ncheck

Pipe the output of the last add step (here /tmp/add.3) into netcheck. If the exit code (echo $?) is greater than 0, the connectivity test has failed.
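When scripting the workflow, that exit-code check can guard each step. The run_step function below is an illustrative helper, not a YMatrix tool: it runs any command and stops with the same non-zero code if the command fails.

```shell
#!/bin/sh
# Illustrative guard (not part of YMatrix): run a step and report failure
# whenever its exit code is greater than 0, as the check above requires.
run_step() {
    "$@"
    rc=$?
    if [ "$rc" -gt 0 ]; then
        echo "step '$1' failed with exit code $rc" >&2
        return "$rc"
    fi
}
```

For example, a wrapper script could call `run_step sh -c '...netcheck pipeline...'` and abort the expansion on failure.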

3.4 Generate Expansion Plan

$ cat /tmp/ncheck | sudo /opt/ymatrix/matrixdb6/bin/mxbox deployer expand > /tmp/exandplan

The generated /tmp/exandplan file contains all configuration details including directory paths, number of Segments, and Mirror placement policies. Advanced users may edit this file directly.

3.5 Execute Expansion Plan

$ cat /tmp/exandplan | sudo /opt/ymatrix/matrixdb6/bin/mxbox deployer exec

Congratulations! The expansion has been successfully executed.

4 Post-Expansion Verification

4.1 View Updated Cluster Configuration

=# SELECT * FROM gp_segment_configuration ORDER BY 1;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                datadir                
------+---------+------+----------------+------+--------+------+----------+---------+---------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdw      | mdw     | /mxdata_20220331173619/master/mxseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | sdw1     | sdw1    | /mxdata_20220331173619/primary/mxseg0
    3 |       1 | p    | p              | n    | u      | 6001 | sdw1     | sdw1    | /mxdata_20220331173619/primary/mxseg1
    4 |       2 | p    | p              | n    | u      | 6000 | sdw2     | sdw2    | /mxdata_20220331173619/primary/mxseg2
    5 |       3 | p    | p              | n    | u      | 6001 | sdw2     | sdw2    | /mxdata_20220331173619/primary/mxseg3
    6 |       4 | p    | p              | n    | u      | 6000 | sdw3     | sdw3    | /mxdata_20220331173619/primary/mxseg4
    7 |       5 | p    | p              | n    | u      | 6001 | sdw3     | sdw3    | /mxdata_20220331173619/primary/mxseg5
    8 |       6 | p    | p              | n    | u      | 6000 | sdw4     | sdw4    | /mxdata_20220331173619/primary/mxseg6
    9 |       7 | p    | p              | n    | u      | 6001 | sdw4     | sdw4    | /mxdata_20220331173619/primary/mxseg7
(9 rows)

4.2 Check Data Distribution

Use the following SQL statement to view how existing table data is distributed across the original segments. Now is the time to plan your new distribution strategy.

=# SELECT gp_segment_id, count(*) FROM public.test GROUP BY gp_segment_id;

5 Plan Data Distribution

Before redistributing data, be aware that redistribution is a critical and often time-consuming post-expansion task.

Prior to expansion, business data resides only on old segments. After adding new nodes, redistributing existing data evenly across all current nodes is known as "data redistribution." When aligned with workload patterns, redistribution significantly improves query performance. However, inappropriate redistribution wastes time and resources. Therefore, careful planning is essential.

Two approaches are available. Based on your time-series use case, select one.

5.1 Redistribute All Data

Newly added segments contain no data initially. Use the method below to redistribute all existing data across the expanded cluster. You can schedule the operation or execute it immediately.

Note!
It is recommended to perform redistribution during off-peak hours.

Immediate execution:

$ sudo /opt/ymatrix/matrixdb6/bin/mxctl expand redistribute

Scheduled execution:

$ sudo /opt/ymatrix/matrixdb6/bin/mxctl expand redistribute --schedule "2022-07-20 22:00:00"

Before the scheduled time arrives, you can run the command again with a new --schedule value to reschedule, or run it without the --schedule parameter to trigger redistribution immediately.

During and after redistribution, monitor progress using system tables in the Expansion System Catalogs. Recheck data distribution with:

=# SELECT gp_segment_id, count(*) FROM public.test GROUP BY gp_segment_id;

5.2 Redistribute Partial Data (Smooth Expansion)

This approach is referred to in YMatrix as smooth expansion. As planned earlier, choose one of the four available methods based on your actual requirements.

Method Smooth Expansion
(1) Create a new segment set (CREATE SEGMENT_SET), then assign it when creating new tables or partitions
(2) Specify an existing segment set (created before the expansion) at table creation
(3) Do not specify a segment set; rely on the node set defined by mx_default_segment_set (default: all nodes)
(4) Redistribute selected old data with ALTER TABLE SET SEGMENT_SET, and handle new data with any of methods (1), (2), or (3)

5.2.1 Create a New Segment Set

Use the SQL command CREATE SEGMENT_SET to create a new segment set in YMatrix.

=# CREATE SEGMENT_SET name SEGMENTS (content_id, ...);
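As a concrete sketch: after the expansion in this guide, the newly added segments carry content IDs 4 through 7 (visible in gp_segment_configuration), so a set covering only the new hosts could be created as follows. The name ss_new is illustrative.

```sql
-- ss_new is an illustrative name; 4-7 are the content IDs of the
-- segments added on sdw3 and sdw4 in this guide's example cluster.
CREATE SEGMENT_SET ss_new SEGMENTS (4, 5, 6, 7);
```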

5.2.2 Specify an Existing Segment Set

Use SQL to assign an already-created segment set during table definition. Each example below uses a different distribution policy (hash, replicated, and random), so the tables are given distinct names:

=# CREATE TABLE t1(a int, b int) USING MARS3 DISTRIBUTED BY(a) SEGMENT_SET ss1 ORDER BY (a);
=# CREATE TABLE t2(a int, b int) USING MARS3 DISTRIBUTED REPLICATED SEGMENT_SET ss1 ORDER BY (a);
=# CREATE TABLE t3(a int, b int) USING MARS3 DISTRIBUTED RANDOMLY SEGMENT_SET ss1 ORDER BY (a);

For more information, see CREATE SEGMENT_SET.

5.2.3 Use Default Segment Set

When creating a new table without specifying a segment set, data is distributed according to the node set defined by the mx_default_segment_set parameter. By default, this is all nodes in the cluster. Change the default using the SET command:

=# SET mx_default_segment_set TO 'ss1';
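As an illustrative follow-up (the table name t_default is an assumption), any table created after this SET without an explicit SEGMENT_SET clause is then distributed only across the segments in ss1:

```sql
-- Assumes ss1 exists and mx_default_segment_set is 'ss1', as set above.
-- No SEGMENT_SET clause is given; the default set applies.
CREATE TABLE t_default (a int, b int) USING MARS3 DISTRIBUTED BY (a) ORDER BY (a);
```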

5.2.4 Redistribute Selected Old Data

When some existing data also needs redistribution, handle old and new data separately. Use the ALTER TABLE SET SEGMENT_SET command to redistribute old data. For new data, apply any of the methods described in sections 5.2.1, 5.2.2, or 5.2.3.

Example: Move existing table t to segment set ss1.

=# ALTER TABLE t SET SEGMENT_SET ss1;

Alternatively, redistribute table t across all nodes post-expansion with either of the following:

=# ALTER TABLE t SET SEGMENT_SET all_segments;

or

=# ALTER TABLE t EXPAND TABLE;

Your command-line-driven cluster expansion is now complete!

Note!
After command-line expansion, restart mxui to reload the cluster topology and redeploy all mxui_collector instances.
Use the following command: /opt/ymatrix/matrixdb6/bin/supervisorctl restart mxui