Storage engine and storage solution

MatrixDB supports a variety of storage engines and storage solutions for users to choose from.

1. Storage Engine

1.1 Introduction to Storage Engine

MatrixDB supports the following storage engines:

Type Description
Heap Heap table is a storage type provided by traditional PostgreSQL databases, also known as heap tables.
This type of table supports a large number of concurrent read and write, transaction, index and other features.
Mars1 Mars1 table is a self-developed storage type of MatrixDB.
Support compression, column storage, and pre-aggregation, and has extremely high performance in analysis scenarios.
But the data is required to be inserted in order, and update and deletion are not supported yet.
The Mars2 Mars2 table is a storage engine developed based on Mars1.
Support compression, column storage, and automatic archiving, without manual heat to cold.
The timing writing performance is better, and the upsert scenario optimization is the most obvious.

1.2 Create a table

The above introduces the characteristics of several storage engines. Now let’s introduce how to create tables for various storage engines:

1.2.1 Create a Heap table

The Heap table is the default storage engine of MatrixDB, so when there is no special storage engine specified when creating the table, the Heap tables are created:

CREATE TABLE disk(
    time timestamp with time zone,
    tag_id int,
    read float,
    write float
)
Distributed by (tag_id);

1.2.2 Create Mars1 table

The Mars1 table depends on matrixts and mars extensions, so before creating a table, you must first create an extension in a library using the storage engine:

CREATE EXTENSION matrixts;
CREATE EXTENSION mars;

When creating tables, use USING Mars to specify the storage engine:

CREATE TABLE disk_mars(
    time timestamp with time zone,
    tag_id int,
    read float,
    write float
)
USING Mars
WITH (tagkey="tag_id", timekey="time", timebucket="8 hours")
Distributed by (tag_id);

The meta information of the Mars table is specified in the WITH keyword, including:

  1. tag_id: device ID field
  2. timekey: timestamp field
  3. timebucket: the time span of each row group (Row Group)

Mars table supports columnar compression. Compression can not only reduce storage, but also reduce query costs.

The Mars table supports 4 compression types:

  1. rle_type
  2. zstd
  3. zlib
  4. lz4

When creating a Mars table, specify the compression type of the column through compressiontype. At the same time, you must also specify the compression level (0-9) through compresslevel. The higher the level, the higher the compression rate, the smaller the space, but the consumption of computing performance is also greater.

The Mars table created by SQL as follows, specifies different compression types for different columns:

CREATE TABLE disk_mars (
    time timestamp with time zone ENCODING (compresstype=rle_type, compresslevel=1),
    tag_id int ENCODING (compresstype=zstd, compresslevel=1),
    read float ENCODING (compresstype=zlib, compresslevel=1),
    write float ENCODING (compresstype=lz4, compresslevel=1)
)
USING Mars
WITH (tagkey="tag_id", timekey="time", timebucket="8 hours")
Distributed by (tag_id);

1.2.3 Create Mars2 table

The Mars2 table depends on the matrixts extension, so before creating a table, you must first create an extension in the library using the storage engine:

CREATE EXTENSION matrixts;

When creating tables, use USING mars2 to specify the storage engine:

CREATE TABLE metrics(
    time timestamp,
    tag_id int,
    sensor float4
)
USING mars2
Distributed by(tag_id);

The Mars2 table contains the following optional parameters:

Parameter name Default value Min value Maximum value Description
compress_threshold 1200 1 8000 Up limit of tuple compression for the same group
compressiontype lz4 -- -- Compression algorithm, supported:
1. zstd
2. zlib
3. lz4
compresslevel 1 1 -- Compression level. The smaller the value, the faster the compression, but the worse the compression effect; the larger the value, the slower the compression, but the better the compression effect. Different algorithms have different valid values ​​ranges:
zstd: 1-19
zlib: 1-9
lz4: 1-20

As above parameters, you can add WITH keyword when creating tables, such as:

CREATE TABLE metrics(
    time timestamp,
    tag_id int,
    sensor float4
) using mars2
WITH(compress_threshold=1200, compresstype='lz4', compresslevel=1)
Distributed by(tag_id);

After the Mars2 table is created, an additional index of type mars2_btree must be created so that normal data reading and writing can be performed. Indexing has several main functions:

  1. The function of the index itself is used for quick search.
  2. Specify the sort key for dynamic sorting of Mars2 internally.
    CREATE INDEX idx_metrics ON metrics USING mars2_btree(tag_id, time);

    When data from the same time point in the device is reported in batches, Mars2 can merge data from the same device at the same time. The merge feature requires adding the uniquemode=true option when creating an index:

    CREATE INDEX idx_metrics ON metrics USING mars2_btree(tag_id, time) WITH (uniquemode=true);

    2. Storage Solution

    The above introduces several storage engines supported by MatrixDB in timing scenarios. The following introduces how to use different storage engines under different storage solutions.

Currently, MatrixDB supports two storage solutions:

  1. Heap+Mars1
  2. Mars2

2.1 Heap+Mars1

In this storage solution, hot and cold data are stored using different storage engines.

  1. Heap tables to store hot data, because hot data may need to be updated, which can give full play to the advantages of Heap's high update performance.
  2. The Mars1 table stores historical cold data. The cold data will not change. It takes up less space after using Mars1 compression, and has higher aggregate query performance.
  3. It is necessary to explicitly perform hot to cold operation, and the overhead of the conversion process cannot be ignored.

Applicable scenarios: There are many gathering queries, and the business has a low peak period for heating to cooling

For the specific usage of this solution, please refer to Heap+Mars1 storage solution practice

2.2 Mars2

The Mars2 solution uses only one storage engine. Compared with Heap+Mars1, Mars2 will automatically complete cold data compression, eliminating the process of heat to cold.

Applicable scenarios: The business is not used for the low peak period when it is heat-to-cooling, and there are many upsert operations.