The MARS3 storage engine builds upon MARS2 with multiple optimizations. It adopts a hybrid row-column storage format, enabling both high-speed data ingestion and aggressive compression.
MARS3 supports data updates and deletes via the UPDATE and DELETE statements (except in Unique Mode).
MARS3 supports adding and dropping columns, as well as COPY and pg_dump operations.
Each MARS3 table internally uses an LSM Tree structure for storage. The LSM Tree (Log-Structured Merge Tree) is a layered, ordered, disk-oriented data structure. Its core idea is to leverage disk performance through batched sequential writes, which significantly outperform random writes.
The internal architecture of MARS3 is illustrated below:
(Figure: MARS3 internal architecture)
We will analyze the above diagram layer by layer.
- Sort key: defined by the ORDER BY clause. If the sort key contains a text column, adding COLLATE C on this column can accelerate sorting.
- Load mode: controlled by the prefer_load_mode parameter. See Configuration Parameters below.
- Run: the size of L0 Runs (the rowstore_size parameter) can be tuned to optimize performance for different workloads.
- BRIN index: for example, CREATE INDEX brin_idx ON t1 USING mars3_brin(time,tag_id);
- Merge: MARS3 automatically cleans up Dead data during merges. You can also periodically run VACUUM to manually clean up Dead data.
- Insert: new data is written with INSERT, then flushed into a Run in L0.
- Delete: use DELETE. Deletions are recorded in the Delta file of the corresponding Run and physically removed during merging.
- Update: use UPDATE. An update first deletes the old row and then inserts a new one.
- Unique Mode: does not support the UPDATE and DELETE statements; simply using INSERT automatically performs the operation. To update a row with a specific Unique Key (i.e., the value of the sort key defined at table creation), insert a new row with the same Unique Key. For example: CREATE TABLE mars3_t(c1 int NOT NULL, c2 int) USING MARS3 WITH (uniquemode=true) ORDER BY (c1, c2); defines (c1, c2) as the Unique Key.

Note!
When Unique Mode is enabled, the first field in the ORDER BY clause must have the NOT NULL constraint.
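For illustration, a minimal sketch of the upsert behavior under Unique Mode (the table mars3_uniq and its values are hypothetical, not from the original text):

=# CREATE TABLE mars3_uniq (dev_id int NOT NULL, val int)
USING MARS3
WITH (uniquemode=true)
ORDER BY (dev_id);
=# INSERT INTO mars3_uniq VALUES (1, 10);  -- inserts a new row with Unique Key (1)
=# INSERT INTO mars3_uniq VALUES (1, 20);  -- same Unique Key: replaces the old row
=# SELECT * FROM mars3_uniq;               -- returns a single row: (1, 20)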
After creating the matrixts extension, the simplest way to create a MARS3 table is to use the USING and ORDER BY clauses in CREATE TABLE. For extended examples, see Table Design Best Practices.
=# CREATE TABLE metrics (
ts timestamp,
dev_id bigint,
power float,
speed float,
message text
) USING MARS3
ORDER BY (dev_id,ts);
Note!
MARS3 tables support BRIN indexes, but they are not required. However, the ORDER BY clause must be used when creating a MARS3 table to define the sort key.
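For example, a BRIN index on the metrics table above could be created as follows (the index name is illustrative; the mars3_brin access method is taken from the earlier example):

=# CREATE INDEX metrics_brin ON metrics
USING mars3_brin (dev_id, ts);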
Note!
These are table-level parameters. They can only be set at table creation time using the WITH(mars3options='a=1,b=2,...') clause (except for compress_threshold, which is also supported by MARS2 and can be set directly via WITH). Once set, they cannot be changed. These parameters apply to a single table. For more information, see Database Table Parameters.
The following parameter controls the size of Runs in L0, indirectly affecting Run sizes in higher levels.
| Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| rowstore_size | MB | 64 | 8 ~ 1024 | Controls when an L0 Run switches. A new Run is created when data size exceeds this value. |
The following parameter sets the compression threshold, balancing compression efficiency and read performance. Setting it too low reduces compression efficiency; setting it too high increases memory usage.
| Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| compress_threshold | Tuples | 1200 | 1 ~ 100000 | Compression threshold. Maximum number of tuples per column to compress in one batch. |
The following parameter defines the data loading mode in MARS3.
| Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| prefer_load_mode | - | normal | normal / bulk | Data loading mode. normal means normal mode: new data is first written to rowstore Runs in L0, then moved to columnstore Runs in L1 after reaching rowstore_size. This involves one extra I/O compared to bulk, and column conversion is asynchronous. Suitable for high-frequency, small-batch writes where I/O capacity is sufficient and latency is critical. bulk means bulk load mode: data is written directly to columnstore Runs in L1. This reduces I/O compared to normal, and column conversion is synchronous. Suitable for low-frequency, large-batch writes where I/O capacity is limited and latency is less critical. |
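A minimal sketch of enabling bulk load mode, assuming prefer_load_mode is passed through mars3options like the other table-level parameters described above:

=# CREATE TABLE metrics_bulk (
ts timestamp,
dev_id bigint,
power float
) USING MARS3
WITH (mars3options='prefer_load_mode=bulk')
ORDER BY (dev_id, ts);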
The following parameter sets the level size amplifier.
| Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| level_size_amplifier | - | 8 | 1 ~ 1000 | Level size amplifier. Threshold for triggering level merges: rowstore_size * (level_size_amplifier ^ level). A higher value slows reads but speeds up writes. Choose based on workload (write-heavy vs. read-heavy, compression ratio, etc.). Ensure the number of Runs per level does not grow too large, as this can degrade query performance or block new inserts. |
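A minimal sketch for a read-leaning table, assuming level_size_amplifier is also set through mars3options (a lower value triggers merges sooner, keeping per-level Run counts small at the cost of more write amplification; the table name and value are illustrative):

=# CREATE TABLE metrics_readheavy (
ts timestamp,
dev_id bigint,
power float
) USING MARS3
WITH (mars3options='level_size_amplifier=4')
ORDER BY (dev_id, ts);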
Configuration example:
=# CREATE TABLE metrics (
ts timestamp,
dev_id bigint,
power float,
speed float
) USING MARS3
WITH (compress_threshold=1200, mars3options='rowstore_size=64', compresstype=zstd, compresslevel=1)
DISTRIBUTED BY (dev_id)
ORDER BY (dev_id,ts)
PARTITION BY RANGE (ts)
( START ('2023-07-01 00:00:00') INCLUSIVE
END ('2023-08-01 00:00:00') EXCLUSIVE
EVERY (INTERVAL '1 day')
,DEFAULT PARTITION OTHERS);
MARS3 provides the following system views for monitoring:

- matrixts_internal.mars3_level_stats: View the status of each Level in a MARS3 table. Useful for assessing table health, such as whether Runs are merging as expected and whether Run counts are within the normal range.
- matrixts_internal.mars3_files: View the file status of a MARS3 table, including auxiliary and delta files (Data, Delta, Index files, etc.), to verify expected file states.
- matrixts_internal.mars3_info_brin: View the status of a specific BRIN index on a MARS3 table.

The MARS2 storage engine is designed for high-speed data loading and querying, using ordered storage to eliminate search overhead.
Like MARS3, MARS2 uses an LSM Tree structure for storage.
The internal architecture of MARS2 is shown below:
(Figure: MARS2 internal architecture)
We will analyze this diagram layer by layer.
- Sort key: if the sort key contains a text column, adding COLLATE C on this column can speed up sorting.
- When the number of L0 Runs reaches the merge threshold (mars2_automerge_threshold), an L0 merge is triggered, combining all L0 Runs into one.
- If the merged Run exceeds the upgrade size (level0_upgrade_size), it is promoted to L1.
- When L1 Runs reach the threshold (level1_upgrade_size), an L1 merge is triggered, combining all L1 Runs into one.
- If the merged Run exceeds the upgrade size (level1_upgrade_size), it is promoted to L2.

First, create the matrixts extension:
=# CREATE EXTENSION matrixts;
Then explicitly enable MINMAX:
=# CREATE TABLE metrics (
ts timestamp ENCODING(minmax),
dev_id bigint ENCODING(minmax),
power float,
speed float,
message text
) USING MARS2;
Create the mars2_btree index:
=# CREATE INDEX ON metrics
USING mars2_btree (ts, dev_id);
Same as MARS3.
- Insert: new data is written with INSERT, then processed into a Run.
- Delete: via TRUNCATE on partitioned tables.

After creating the matrixts extension, the simplest way to create a MARS2 table is to use the USING clause in CREATE TABLE and create an index. For extended examples, see Table Design Best Practices.
=# CREATE TABLE metrics (
ts timestamp,
dev_id bigint,
power float,
speed float,
message text
) USING MARS2;
=# CREATE INDEX ON metrics
USING mars2_btree (ts, dev_id);
Note!
Table-level parameters can only be set at table creation using the WITH clause, apply to a single table, and cannot be changed afterward. Global parameters can be set at session or system level. System-level changes require mxstop -u to take effect. For more information, see Database Table Parameters.
The following parameters control merging behavior (see Level section above).
| Merge Control Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| mars2_automerge_threshold | run | 32 | 10 - 2048 | Controls all MARS2 tables: number of L0 Runs to trigger merge. Global parameter. Per-table override: level0_merge_threshold |
| level0_merge_threshold | run | 32 | 1 - 2048 | Controls per-table L0 merge threshold. Table-level parameter |
| level0_upgrade_size | MB | 25 | 1 - 10000 | Controls per-table L0 → L1 upgrade size. If merged Run exceeds this, promote to L1. Table-level parameter |
| level1_upgrade_size | MB | 1000 | 1 - 10000 | Controls per-table L1 → L2 upgrade size. If merged Run exceeds this, promote to L2. Table-level parameter |
To tune merging, increase mars2_automerge_threshold / level0_merge_threshold to reduce merge frequency, or decrease level0_upgrade_size / level1_upgrade_size to promote Runs earlier.
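A minimal sketch of a write-leaning MARS2 table using the table-level parameter from the table above (the table name and value are illustrative):

=# CREATE TABLE metrics_writeheavy (
ts timestamp,
dev_id bigint,
power float
)
USING MARS2
WITH (level0_merge_threshold=128);
=# CREATE INDEX ON metrics_writeheavy
USING mars2_btree (ts, dev_id);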
The following parameter controls compression behavior.
| Compression Control Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| compress_threshold | Tuples | 1200 | 1 - 100000 | Compression threshold: maximum number of tuples per compression unit (one batch). Table-level parameter |
The following parameters control sort memory. When inserting into multiple partitions, each partition receives at least mars2_sort_mem_core, and total usage does not exceed mars2_sort_mem.
| Sort Memory Parameter | Unit | Default | Range | Description |
|---|---|---|---|---|
| mars2_sort_mem | KB | 2097152KB (2GB) | 128KB - 2147483647KB (~2048GB) | Sort memory per insert. For partitioned tables, shared among partitions. Global parameter |
| mars2_sort_mem_core | KB | 16384KB (16MB) | 128KB - 2147483647KB (~2048GB) | Minimum sort memory per partitioned table. Global parameter |
Table-level configuration example:
=# CREATE TABLE metrics (
ts timestamp,
dev_id bigint,
power float,
speed float,
message text
)
USING MARS2
WITH (compress_threshold=1200, level0_merge_threshold=32);
=# CREATE INDEX ON metrics
USING mars2_btree (ts, dev_id);
Session-level global parameter example:
=# SET mars2_sort_mem TO 2097152;
System-level global parameter example:
$ gpconfig -c mars2_sort_mem -v 2097152
$ mxstop -u
HEAP is the default storage engine in YMatrix, inherited from PostgreSQL. It supports only rowstore format and does not support columnstore or compression. It uses MVCC and is suitable for workloads with frequent updates and deletes.
Under MVCC, HEAP tables do not physically delete data during updates or deletes. Instead, old row versions are retained and hidden by visibility rules. As a result, frequent updates and deletes inflate physical storage. You should periodically run VACUUM to reclaim the space.
Use the following SQL to create a HEAP table in YMatrix:
=# CREATE TABLE disk_heap(
time timestamp with time zone,
tag_id int,
read float,
write float
)
DISTRIBUTED BY (tag_id);
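As noted above, frequent updates and deletes leave dead row versions behind; periodically reclaim the space with standard PostgreSQL VACUUM:

=# VACUUM disk_heap;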
Tables using AOCO or AORO storage engines are collectively called AO (Append-Optimized) tables. They support insert, update, and delete operations and support compression.
AORO supports rowstore; AOCO supports columnstore.
AO tables differ significantly from HEAP tables in both logical and physical structure. While HEAP uses MVCC to manage data visibility after updates and deletes, AO tables use an auxiliary bitmap table to track visible rows.
For AO tables with frequent updates and deletes, regular cleanup is required. VACUUM on an AO table must reset the bitmap and compact the physical files, which is typically slower than vacuuming a HEAP table.
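A minimal sketch of creating an AOCO (columnstore) table, assuming the Greenplum-style appendonly/orientation storage options; verify the option names against your YMatrix version:

=# CREATE TABLE disk_aoco (
time timestamp with time zone,
tag_id int,
read float,
write float
)
WITH (appendonly=true, orientation=column, compresstype=zstd, compresslevel=1)
DISTRIBUTED BY (tag_id);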
Note!
For detailed information on storage engines, usage, and best practices, see Table Design Best Practices.