The previous section introduced how MatrixDB models time-series data. As a distributed database, MatrixDB differs from standalone databases in terms of data storage. This section explains the distributed architecture of MatrixDB and how to design table distribution strategies based on this architecture.
For the accompanying video tutorial, see "MatrixDB Data Modeling and Spatio-Temporal Distribution Model".
MatrixDB is a distributed database built on a master-segment architecture, with two types of nodes: master and segment.

The master is the central control node responsible for receiving user requests and generating query plans. It does not store data. Data is stored on segment nodes.
There is only one master node, while the number of segment nodes is at least one and can scale up to dozens or even hundreds. More segment nodes enable higher storage capacity and stronger computational power.
MatrixDB supports two data distribution methods across segments: sharding and replication. Sharding further divides into hash and random distribution:
In sharded distribution, data is horizontally partitioned, with each row stored on only one segment. In hash sharding, one or more distribution keys are specified. The database computes a hash value over these keys to determine the target segment. Random sharding assigns rows to segments randomly.
In replicated distribution, the entire table is duplicated across all segment nodes. Every segment stores all rows. This method consumes significant storage space and is typically used only for small tables that frequently participate in join operations.
In summary, there are three distribution strategies. You specify the strategy when creating a table with a DISTRIBUTED clause:
DISTRIBUTED BY (column)
DISTRIBUTED RANDOMLY
DISTRIBUTED REPLICATED
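As a rough illustration, the statements below sketch how each clause might look in practice; the table and column names are hypothetical and not taken from this guide.

```sql
-- Hash distribution: rows are assigned to segments by hashing the key column(s).
CREATE TABLE t_hash (id int, payload text) DISTRIBUTED BY (id);

-- Random distribution: rows are spread across segments without a predictable key.
CREATE TABLE t_random (id int, payload text) DISTRIBUTED RANDOMLY;

-- Replicated distribution: every segment keeps a full copy of the table.
CREATE TABLE t_replicated (id int, payload text) DISTRIBUTED REPLICATED;
```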
In MatrixDB’s distributed architecture, most tables use sharded storage, yet full relational queries are supported. From the user's perspective, it behaves like a PostgreSQL instance with virtually unlimited capacity.
How are joins across segments handled? MatrixDB uses a Motion operation. When data involved in a join resides on different segments, Motion moves the required data to the same segment for processing.

However, Motion incurs overhead. Therefore, when designing tables, carefully consider both the data distribution pattern and the expected query workload.
Motion can be avoided only when all data involved in a join already resides on the same segment. Hence, for large tables frequently joined, it is best practice to set the join key as the distribution key.
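As a sketch of this practice, the hypothetical pair of tables below is hash-distributed on the shared join key, so matching rows already reside on the same segment and the join itself needs no Motion (a final Gather Motion to the master is still expected); EXPLAIN can be used to inspect the plan.

```sql
-- Both tables distributed on the join key: matching order_id values hash to the
-- same segment, so the join can run locally on each segment.
CREATE TABLE orders (
    order_id    bigint,
    customer_id int,
    created_at  timestamptz
) DISTRIBUTED BY (order_id);

CREATE TABLE order_items (
    order_id bigint,
    sku      text,
    qty      int
) DISTRIBUTED BY (order_id);

-- Inspect the plan: with matching distribution keys there should be no
-- Redistribute Motion feeding this join.
EXPLAIN
SELECT o.order_id, count(*)
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
GROUP BY o.order_id;
```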
Now that we understand MatrixDB’s distribution methods and their trade-offs, let’s discuss optimal distribution strategies for time-series tables.
Consider the metrics table first. Metric data volumes are typically very large, making replicated distribution impractical. Random distribution lacks predictability and hinders efficient analysis. Therefore, hash distribution is preferred.
Which column should be chosen as the distribution key?
Time-series data has three main dimensions: the device that produces the data, the time of collection, and the metric values themselves.
Most analytical queries group by device and join with attributes from the device table. To align data placement with query patterns, use Device ID as the distribution key.
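A minimal sketch of such a metrics table, hash-distributed by device ID so that all readings from one device land on the same segment (the column names are illustrative):

```sql
CREATE TABLE metrics (
    device_id   int,
    ts          timestamptz,
    temperature float8,
    humidity    float8
) DISTRIBUTED BY (device_id);
```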
The device table has a relatively fixed size and does not grow indefinitely like metric data. Therefore, replicated distribution is recommended. This ensures that joins with the metrics table do not require inter-segment data movement, improving performance.
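Continuing the sketch above, the device table can be declared replicated so that a join with the metrics table finds the device attributes already present on every segment; the schema and query are illustrative only.

```sql
CREATE TABLE device (
    device_id   int,
    device_name text,
    region      text
) DISTRIBUTED REPLICATED;

-- Per-region average temperature: each segment joins its local metrics rows
-- against its full local copy of device, so the join itself requires no
-- inter-segment Motion.
SELECT d.region, avg(m.temperature) AS avg_temperature
FROM metrics m
JOIN device d USING (device_id)
GROUP BY d.region;
```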