Based on the characteristics of different storage engines, you can flexibly create tables suited to various scenarios. Below are recommended practices.
MARS3 tables depend on the matrixts extension. Before creating a table, you must first install this extension in the database where the MARS3 storage engine will be used.
Note!
Thematrixtsextension is database-level. Create it once per database; do not recreate it repeatedly.
=# CREATE EXTENSION matrixts;
Use USING MARS3 to specify the storage engine when creating a table, and use ORDER BY to define the sort key. A basic example is shown below:
=# CREATE TABLE mars3(
time timestamp with time zone,
tag_id int,
i4 int4,
i8 int8
)
USING MARS3 ORDER BY (tag_id, time);
We extend this to the following vehicle telematics time-series example for further explanation.
=# CREATE TABLE vehicle_basic_data_mars3(
daq_time timestamp ,
vin varchar(32) COLLATE "C" ,
lng float ,
lat float ,
speed float ,
license_template varchar(16) ,
flag integer
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3,compress_threshold=1200,
mars3options='rowstore_size=64')
DISTRIBUTED BY (vin)
ORDER BY (vin, daq_time)
PARTITION BY RANGE (daq_time)
( START ('2022-07-01 00:00:00') INCLUSIVE
END ('2022-08-01 00:00:00') EXCLUSIVE
EVERY (INTERVAL '1 day')
,DEFAULT PARTITION OTHERS);
After successfully creating a MARS3 table, you may choose to create a mars3_brin index on the sort key. The mars3_brin index is a sparse index with minimal impact on disk space and insert performance. Combined with the ordered nature of the sort key, it enables optimal query efficiency on the sort key.
You can also create a mars3_brin index on non-sort-key columns. However, due to lack of ordering, query performance will be somewhat reduced. Add such indexes only as needed.
=# CREATE INDEX idx_mars3 ON vehicle_basic_data_mars3 USING mars3_brin(vin, daq_time);
Based on the parameters and statements in the example above, here are best practice guidelines:
Note!
We encourage you to follow our recommendations when designing tables, but do not blindly copy them. Time-series use cases vary widely; always analyze your specific scenario.
Choose the Sort Key
When designing a table, carefully select the sort key. The purpose of the sort key is to place data with similar time dimensions or attributes physically close together, reducing I/O seeks and improving query performance. Therefore, the sort key should align with your primary query patterns. For point queries on a single device, use the device identifier (e.g., vin). For detailed, aggregate, or multi-device queries over time ranges, use both the device ID and timestamp (e.g., (vin, daq_time)).
We recommend using the main entity identifier as the sort key. If a timestamp exists, include it after the identifier. In the example, the sort key is (vin, daq_time).
COLLATE "C"
Apply this option only to device ID fields; do not apply it to other fields. This improves sorting and query performance for text-type columns.
USING MARS3
This is a fixed syntax for MARS3 tables and must not be altered.
WITH (compresstype=zstd, compresslevel=3, mars3options='compress_threshold=1200, rowstore_size=64')
We recommend using zstd compression at level 3. This example uses general-purpose compression. For custom compression settings, see Using Compression.
In this table, the L0 Run size is 64MB and the compression threshold is 1200.
DISTRIBUTED BY (vin)
Use this clause to specify the distribution key.
The distribution key determines how data is distributed across segments and significantly impacts performance. Key considerations:
In the example, vin is used as the distribution key so that all data for a single device resides on the same segment, enabling efficient local computation and avoiding inter-node data redistribution.
PARTITION BY RANGE (daq_time)
Specify the partitioning key for partitioned tables.
We recommend using time as the partition key. This allows automatic data expiration via Automatic Partition Management and leverages partition pruning to filter out irrelevant time partitions.
Aim for partitions containing between one million and ten million rows. Too many small partitions consume excessive memory; too few reduce pruning effectiveness and complicate data lifecycle management.
In the example, daq_time is the partition key. Most queries filter by data collection time. For example, adding a condition like WHERE daq_time >= CURRENT_DATE - INTERVAL '1 day' allows the database to quickly identify the relevant partition and retrieve data efficiently.
( START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE EVERY (INTERVAL '1 days') ,DEFAULT PARTITION OTHERS);
This SQL statement indicates that subpartitions will be created from 2022-07-01 00:00:00 to just before 2022-08-01 00:00:00, using the START...END clause and INCLUSIVE、EXCLUSIVE keyword.
EVERY (INTERVAL '1 day')
Subpartition intervals are set to 1天. You may use hour, month, or year instead of day, depending on data volume. For instance:
1 day.1 month is appropriate.1 year is sufficient.DEFAULT PARTITION OTHERS
Defines the default partition. Data with timestamps not matching any defined range is stored here.
Use VACUUM/VACUUM FULL for Optimal Query Performance
Running VACUUM/VACUUM FULL converts unordered row-store runs into ordered column-store runs.
VACUUM FULL performs the same conversion and additionally merges multiple runs into one, achieving even better query performance. However, VACUUM FULL requires an exclusive lock on the table, so it cannot run concurrently with other operations on the same table.
When to run VACUUM/VACUUM FULL:
Enable Unique Mode for Batch Data Ingestion
When data from the same device arrives in batches at the same timestamp, MARS3 can merge duplicate entries (same daq_time and vin).
This feature must be explicitly enabled via uniquemode=true at table creation, as the default is false. For example:
uniquemode=true, three records from device 'A01' at '2022-01-01 00:00:00' will be merged, keeping only the last one.uniquemode=false (default), all three records are retained.Note!
To enable Unique Mode, the first column in theORDER BYclause must have aNOT NULLconstraint. MARS3 Unique Mode does not support deletion.
Example DDL with Unique Mode enabled:
=# CREATE TABLE vehicle_basic_data_mars3_um(
daq_time timestamp ,
vin varchar(32) COLLATE "C" NOT NULL,
lng float ,
lat float ,
speed float ,
license_template varchar(16) ,
flag integer
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3,compress_threshold=1200,uniquemode=true,
mars3options='rowstore_size=64')
DISTRIBUTED BY (vin)
ORDER BY (vin, daq_time)
PARTITION BY RANGE (daq_time)
( START ('2022-07-01 00:00:00') INCLUSIVE
END ('2022-08-01 00:00:00') EXCLUSIVE
EVERY (INTERVAL '1 day')
,DEFAULT PARTITION OTHERS);
MARS2 tables depend on the matrixts extension. Install the extension before creating tables.
Note!
Thematrixtsextension is database-level. Install it once per database.
=# CREATE EXTENSION matrixts;
Use USING MARS2 to specify the MARS2 storage engine. Example:
=# CREATE TABLE mars2(
time timestamp with time zone,
tag_id int,
i4 int4,
i8 int8
)
USING MARS2;
Extended example for vehicle telematics:
=# CREATE TABLE vehicle_basic_data_mars2(
daq_time timestamp encoding (minmax),
vin varchar(32) COLLATE "C" encoding (minmax),
lng float encoding (minmax),
lat float encoding (minmax),
speed float ,
license_template varchar(16) ,
flag integer
)
USING MARS2
WITH (compresstype=zstd, compresslevel=3)
DISTRIBUTED BY (vin)
PARTITION BY RANGE (daq_time)
( START ('2022-07-01 00:00:00') INCLUSIVE
END ('2022-08-01 00:00:00') EXCLUSIVE
EVERY (INTERVAL '1 day')
,DEFAULT PARTITION OTHERS);
After creating a MARS2 table, you must create a mars2_btree index to enable normal read and write operations.
The index enforces physical data ordering, placing similar or related data close together to reduce I/O and improve query speed. Therefore, the sort key should reflect common query patterns. For single-device queries, use the device ID (vin). For time-range or multi-device queries, use (vin, daq_time).
=# CREATE INDEX idx_mars2 ON vehicle_basic_data_mars2
USING mars2_btree(vin, daq_time);
Best practices based on the example:
Note!
Follow our guidance, but avoid blind replication. Analyze your specific use case.
Choose the Sort Key
Select the sort key during table design. We recommend using the primary entity identifier. If a timestamp is available, include it after the identifier. In the example, the sort key is (vin, daq_time).
Since the sort key is part of the index, queries on it can be efficiently located without full table scans. This optimizes queries starting from the entity, ensuring related data is stored contiguously, reducing I/O and speeding up access.
encoding (minmax)
This option enhances computational efficiency.
minmax encoding to all sort key columns. Since these are ordered, minmax filters are effective during full scans.minmax for columns involved in min, max, avg, sum, count, or WHERE condition filtering.Apply this judiciously. Overuse increases CPU and I/O overhead during data ingestion.
COLLATE "C"
Same as in the MARS3 example.
USING MARS2
Fixed syntax for MARS2 tables; do not modify.
WITH (compresstype=zstd, compresslevel=3)
Use zstd compression at level 3. See Using Compression for custom configurations.
DISTRIBUTED BY (vin)
Same as in the MARS3 example.
PARTITION BY RANGE (daq_time)
Same as in the MARS3 example.
( START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE EVERY (INTERVAL '1 days'),default partition others);
Same as in the MARS3 example.
EVERY (INTERVAL '1 day')
Same as in the MARS3 example.
Enable Unique Mode for Batch Data
MARS2 supports merging duplicate records (same daq_time and vin) when data arrives in batches.
Enable this via uniquemode=true when creating the index, as the default is false. For example:
uniquemode=true, three records from device 'A01' at '2022-01-01 00:00:00' are merged into one (last record wins).uniquemode=false (default), all three are kept.Example index DDL with Unique Mode:
=# CREATE INDEX idx_mars2_um ON vehicle_basic_data_mars2_um
USING mars2_btree(vin, daq_time) WITH(uniquemode = true);
Note!
A MARS2 table supports only one index, which cannot be dropped after creation.
HEAP is the default storage engine in YMatrix. If no storage engine is specified, the table is created as a HEAP table.
=# CREATE TABLE disk_heap(
time timestamp with time zone,
tag_id int,
read float,
write float
)
DISTRIBUTED BY (tag_id);
AORO tables are row-oriented AO tables. Row orientation is the default for AO tables.
AORO supports table-level compression only, not column-level compression.
=# CREATE TABLE disk_aoro(
time timestamp with time zone,
tag_id int,
read float,
write float
)
WITH(appendonly=true,orientation=row,compresstype=zlib,compresslevel=4)
DISTRIBUTED BY (tag_id);
AOCO tables are column-oriented AO tables.
AOCO supports both table-level and column-level compression.
=# CREATE TABLE disk_aoco(
time timestamp with time zone,
tag_id int ENCODING(compresstype=zlib,compresslevel=5),
read float,
write float
)
WITH(appendonly=true,orientation=column,compresstype=zlib,compresslevel=6)
DISTRIBUTED BY (tag_id);
Best practices for selecting between HEAP and AO: