Storage engine selection and conversion

1. Timing Storage Engine

In the timing scenario, MatrixDB has two common types of tables:

Type	Description
Heap	Heap table is a storage type provided by traditional PostgreSQL databases, also known as heap tables. This type of table supports a large number of concurrent read and write, transaction, index and other features.
The Mars	Mars table is a self-developed storage type of MatrixDB. Support compression, column storage, and pre-aggregation, and has extremely high performance in analysis scenarios. But the data is required to be inserted in order, and update and deletion are not supported yet.

Therefore, in order to maximize the performance of time series data writing and query analysis, two types of tables are required to be used in a production environment. When writing data, write to the Heap table first because:

Heap concurrent write performance
When accessing time series data may be reported in an out-of-order, delayed reporting, and batch reporting of the same equipment in a smaller time window. It cannot be guaranteed to increase by time and update operations are required.

2. Partition management and conversion of storage engines

In MatrixDB, partitioning technology is used to match two table types and partition data according to timestamps. The hot data is first written to the Heap type hot partition. After confirming that the accessed data will not be modified and earlier data will not be reported, the Heap hot partition will be converted to a Mars cold partition.

The partition time span varies according to the amount of data. For example: if the data volume is large, the partition size should be small, and it can be partitioned by day or hour; if the data volume is small, the partition size can be increased and partitioned by week or month.

MatrixDB provides UDFs for partition management, and it is necessary to first establish related extension components:

CREATE EXTENSION matrixts;
CREATE EXTENSION mars;

Taking the recording of disk read and write volume as an example, the usage of hot and cold storage is as follows:

2.1 Create a timing table

CREATE TABLE disk(
    time timestamp with time zone,
    tag_id int,
    read float,
    write float
)
Distributed by (tag_id)
Partition by range(time);

Through the above SQL statement, a disk read and write timing table partitioned by timestamps is established, and the distribution key is tag_id.

2.2 Create a partition template

After creating the timing table, the partition template is created by calling the build_timeseries_table function provided by the mars extension:

SELECT mars.build_timeseries_table('disk','tagkey="tag_id", timekey="time", timebucket="1 day"');

In the process of creating a partition template, you need to provide the column names of the tag keys and timestamps, as well as time units.

The partition template table will not be automatically deleted when the original DROP table is used and needs to be cleaned manually. If you want to delete the original table, the correct operation is as follows:
SELECT mars.destroy_timeseries_table('disk');
DROP TABLE disk;

2.3 Create a partition

Create a partition template, and call the add_partition function provided by the mars extension to create a partition:

SELECT mars.add_partition('disk', '2021-04-01', '2021-05-01','1 day');

As mentioned above, SQL has established a partition from 2021-04-01 to 2021-05-01, with the time unit being 1 day.

Create a partition and call the list_partition function to view partition information:

ymatrix=# SELECT * from mars.list_partition('disk');
                    relname                    | storage
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 heap_disk_20210401000000000_20210402000000000 | heap
 heap_disk_20210402000000000_20210403000000000 | heap
 heap_disk_20210403000000000_20210404000000000 | heap
 heap_disk_20210404000000000_20210405000000000 | heap
 heap_disk_20210405000000000_20210406000000000 | heap
 heap_disk_20210406000000000_20210407000000000 | heap
 heap_disk_20210407000000000_20210408000000000 | heap
 heap_disk_20210408000000000_20210409000000000 | heap
 heap_disk_20210409000000000_20210410000000000 | heap
 heap_disk_20210410000000000_20210411000000000 | heap
 heap_disk_20210411000000000_20210412000000000 | heap
 heap_disk_20210412000000000_20210413000000000 | heap
 heap_disk_20210413000000000_20210414000000000 | heap
 heap_disk_20210414000000000_20210415000000000 | heap
 heap_disk_20210415000000000_20210416000000000 | heap
 heap_disk_20210416000000000_20210417000000000 | heap
 heap_disk_20210417000000000_20210418000000000 | heap
 heap_disk_20210418000000000_20210419000000000 | heap
 heap_disk_20210419000000000_20210420000000000 | heap
 heap_disk_20210420000000000_20210421000000000 | heap
 heap_disk_20210421000000000_20210422000000000 | heap
 heap_disk_20210422000000000_20210423000000000 | heap
 heap_disk_20210423000000000_20210424000000000 | heap
 heap_disk_20210424000000000_20210425000000000 | heap
 heap_disk_20210425000000000_20210426000000000 | heap
 heap_disk_20210426000000000_20210427000000000 | heap
 heap_disk_20210427000000000_20210428000000000 | heap
 heap_disk_20210428000000000_20210429000000000 | heap
 heap_disk_20210429000000000_20210430000000000 | heap
 heap_disk_20210430000000000_20210501000000000 | heap
(30 rows)

As shown in the above SQL, all partition subtables contained in the disk table, including subtable names and storage types.

2.4 Time series data writing table

insert into disk values(now(), ...., ..., ...);

Because the table is partitioned by time, just insert data directly into the main table, and the data will automatically fall on the corresponding partition.

2.5 Partition Replacement

After the time series data collection on April 1 is collected, the time series data on April 1 will not be connected. The data on April 1 must be converted from a hot partition to a cold partition.

The Mars extension provides a compress_partition function, which is used to perform partition conversion:

ymatrix=# select mars.compress_partition('heap_disk_20210401000000000_20210402000000000');
              compress_partition
----------------------------------------------------------------------------------------------------------------------------------
 mars_disk_20210401000000000_20210402000000000
(1 row)

As mentioned above, SQL converts the corresponding partition on April 1st. After the conversion, call the list_partition function to see the converted partition table name and storage type:

ymatrix=# SELECT * from mars.list_partition('disk');
                    relname                    | storage
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 heap_disk_20210402000000000_20210403000000000 | heap
 heap_disk_20210403000000000_20210404000000000 | heap
 heap_disk_20210404000000000_20210405000000000 | heap
 heap_disk_20210405000000000_20210406000000000 | heap
 heap_disk_20210406000000000_20210407000000000 | heap
 heap_disk_20210407000000000_20210408000000000 | heap
 heap_disk_20210408000000000_20210409000000000 | heap
 heap_disk_20210409000000000_20210410000000000 | heap
 heap_disk_20210410000000000_20210411000000000 | heap
 heap_disk_20210411000000000_20210412000000000 | heap
 heap_disk_20210412000000000_20210413000000000 | heap
 heap_disk_20210413000000000_20210414000000000 | heap
 heap_disk_20210414000000000_20210415000000000 | heap
 heap_disk_20210415000000000_20210416000000000 | heap
 heap_disk_20210416000000000_20210417000000000 | heap
 heap_disk_20210417000000000_20210418000000000 | heap
 heap_disk_20210418000000000_20210419000000000 | heap
 heap_disk_20210419000000000_20210420000000000 | heap
 heap_disk_20210420000000000_20210421000000000 | heap
 heap_disk_20210421000000000_20210422000000000 | heap
 heap_disk_20210422000000000_20210423000000000 | heap
 heap_disk_20210423000000000_20210424000000000 | heap
 heap_disk_20210424000000000_20210425000000000 | heap
 heap_disk_20210425000000000_20210426000000000 | heap
 heap_disk_20210426000000000_20210427000000000 | heap
 heap_disk_20210427000000000_20210428000000000 | heap
 heap_disk_20210428000000000_20210429000000000 | heap
 heap_disk_20210429000000000_20210430000000000 | heap
 heap_disk_20210430000000000_20210501000000000 | heap
 mars_disk_20210401000000000_20210402000000000 | mars
(30 rows)

As you can see, the converted partition name prefix has changed from heap to mars, and the storage type has also changed accordingly. Old partitions will be deleted automatically.

3. Use the Mars engine directly

The Mars engine can not only be used in partition tables, but can also be created and used separately.

When users have a batch of static data that needs to use the high compression and high performance advantages of the Mars engine to optimize queries, they can manually create Mars tables and sort the data and import them.

Create an extension first:

CREATE EXTENSION matrixts;
CREATE EXTENSION mars;

Create a mars table again. When creating a table, use USING Mars to specify the storage engine:

CREATE TABLE disk_mars(
    time timestamp with time zone,
    tag_id int,
    read float,
    write float
)
USING Mars
WITH (tagkey="tag_id", timekey="time", timebucket="8 hours")
Distributed by (tag_id);

The meta information of the Mars table is specified in the WITH keyword, including:

tag_id: device ID field
timekey: timestamp field
timebucket: the time span of each row group (Row Group)

3.1 Compressed Storage

Mars table supports columnar compression. Compression can not only reduce storage, but also reduce query costs.

The Mars table supports 4 compression types:

rle_type
zstd
zlib
lz4

When creating a Mars table, specify the compression type of the column through compressiontype. At the same time, you must also specify the compression level (0-9) through compresslevel. The higher the level, the higher the compression rate, the smaller the space, but the consumption of computing performance is also greater.

The Mars table created by SQL as follows, specifies different compression types for different columns:

CREATE TABLE disk_mars (
    time timestamp with time zone ENCODING (compresstype=rle_type, compresslevel=1),
    tag_id int ENCODING (compresstype=zstd, compresslevel=1),
    read float ENCODING (compresstype=zlib, compresslevel=1),
    write float ENCODING (compresstype=lz4, compresslevel=1)
)
USING Mars
WITH (tagkey="tag_id", timekey="time", timebucket="8 hours")
Distributed by (tag_id);

3.2 Importing data to Mars table

The Mars table requires the data to be sorted by tag_id and timestamp before pouring, otherwise it will cause the query result error.

Import the heap table data into the Mars table:

INSERT INTO disk_mars SELECT * FROM disk_heap ORDER BY tag_id, time;

You can also import external table data into the Mars table:

INSERT INTO disk_mars SELECT * FROM ext ORDER BY tag_id, time;

简体中文