This document introduces the storage engine and storage diagnostic tools supported by MatrixDB, mainly including the following contents:
The storage engine is the storage base of the database system. The database uses the storage engine to create, query, update and delete data. Depending on your needs, different storage engines will provide different storage mechanisms and design in different dimensions such as physical layout, index type, and lock granularity.
Currently, the MatrixDB database system supports the use of the following storage engines: HEAP based on PostgreSQL, MARS2 developed by MatrixDB, and column storage engine AOCO and row storage engine AORO.
HEAP tables are storage engines provided by traditional PostgreSQL databases, also known as heap tables. This type of table supports a large number of concurrent read and write, transaction, index and other features.
The MARS2 table reduces the number of I/O addressing times with its physically ordered merging method, thereby improving the query performance of data in the table. At the same time, MARS2 supports compression, column storage, automatic archiving, pre-aggregation and other functions, which performs superiorly in timing scenarios.
MARS2 does not currently support updates and deletion.
Depending on the different characteristics of the two storage engines, you can flexibly create different tables under different scenario requirements. We give the following example.
The HEAP table is the default storage engine of MatrixDB, so if you do not specify the storage engine when creating the table, the created HEAP tables are all HEAP tables.
=# CREATE TABLE disk(
time timestamp with time zone,
tag_id int,
read float,
write float
)
DISTRIBUTED BY (tag_id);
MARS2 tables depend on matrixts
timing extension. Before building a table, you first need to create an extension in the database using the storage engine.
Notes!
matrixts
is extended to the database level, and it can be created once in a database without repeated creation.=# CREATE EXTENSION matrixts;
When creating tables, use
USING MARS2
to specify the storage engine:=# CREATE TABLE vehicle_basic_data_mars2( daq_time timestamp encoding (minmax), vin varchar(32) COLLATE "C" encoding (minmax), lng float encoding (minmax), lat float encoding (minmax), speed float , license_template varchar(16) , flag integer ) USING MARS2 WITH (compresstype=zstd, compresslevel=3) DISTRIBUTED BY (vin) PARTITION BY RANGE (daq_time) ( START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE EVERY (INTERVAL '1 day') ,DEFAULT PARTITION others );
We will explain the specific parameters and statements in the
vehicle_basic_data_mars2
table:
- encoding (minmax) This option is used to improve computing power. If you want to perform aggregated queries such as
min
,max
,avg
,sum
,count
based on this field, or use theWHERE
conditional filtering option. The addition of this option needs to be determined based on the business scenario. Do not add it to all fields without thinking, otherwise this operation will bring a lot of unreasonable data writing to the CPU and disk I/O consumption.- COLLATE "C" This option is only added in the Device Encoding field, and no other fields need to be added. With this operation, you can improve the sorting and querying speed of text types.
- USING MARS2 It is a fixed writing method using the MARS2 table and cannot change the content.
- WITH (compresstype=zstd, compresslevel=3) Specify the compression algorithm as
zstd
, as well asrle_type
,zlib
,lz4
compression algorithms.zstd
is recommended in this table, and the compression level is3
level. For detailed compression performance parameters, please refer to the table below.- DISTRIBUTED BY (vin): Use this statement to select the distribution key. We recommend using the
Device Encoding Field
as the distribution key, so that the data of the same device can be queried and related calculations can be performed. This avoids performance losses caused by data redistribution between nodes.- PARTITION BY RANGE (daq_time) Specifies the partition key for the partition table. We recommend using the
Time of data acquisition by the device
as the partition key. Usually a large number of queries are the time to filter data collection. For example, when you want to query the data within a day and perform related calculations, you need to add the filtering condition `WHERE daq_time >= CURRENT_DATE - INTERVAL '1 day', so that the database will quickly determine which partition subtable the data is on, so as to quickly locate the data and query the table.- ( START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE EVERY (INTERVAL '1 days') ,DEFAULT PARTITION others); This SQL statement indicates that you will use the
START...END
statement and theINCLUSIVE
andEXCLUSIVE
keywords to create a subpartition table that starts from2022-07-01 00:00:00
to2022-08-01 00:00:00
.- EVERY (INTERVAL '1 day') The time span of the subpartition table is
1
day. In addition to usingday
, you can also usehour
,month
,year
, etc., and determine it based on the size of the data volume. For example, in a day, your server receives up to 1 million to 10 million pieces of data, and the1 day
time interval is the best choice. If the data volume of a day is only tens of thousands or hundreds of thousands, then choosing1 month
is very suitable; if the data volume per day is only tens of thousands, then the1 year
sub-partition table can be generated once.- DEFAULT PARTITION others Indicates the default partition. If the corresponding subtable storage of data cannot be found in other partitions, it will be stored in this partition.
Notes!
We hope you will design ideas based on our suggestions, but we don’t want you to blindly apply them. The specific timing scenarios are ever-changing, and specific analysis of specific situations is still very necessary.
The MARS2 table compression performance related parameters mentioned above:
Parameter name | Default value | Min value | Maximum value | Description |
---|---|---|---|---|
compress_threshold | 1200 | 1 | 8000 | Compression threshold. Used to control how many tuples (tuples) are compressed in a single table, and is the upper limit of the number of Tuples compressed in the same unit. |
compressiontype | lz4 | -- | -- | Compression algorithm, supported: 1. zstd 2. zlib 3. lz4 |
compresslevel | 1 | 1 | -- | Compression level. Usually, the smaller the value, the smaller the compression rate, but the faster the compression; the larger the value, the higher the compression rate, but the slower the compression. Different algorithms have different valid values ranges: zstd: 1-19 zlib: 1-9 lz4: 1-20 |
Notes!
Generally speaking, the higher the compression level, the higher the compression rate and the lower the speed. But that's not absolute.
After successfully creating the MARS2 table, you must create an additional index of type mars2_btree
to perform normal data reading and writing. The purpose of using index sorting is to make data of the same dimension or similar characteristics physically close as possible to reduce the number of I/O addressing and improve query efficiency. Therefore, the selection of sorting keys needs to comply with the main business query characteristics.
For example, the requirement is a single-device point query, then the sort key is the device number (vin)
in the timing scenario. If the requirement is a detailed query, aggregation query or multi-device query within a certain time period, then the sort key is the device number and the timestamp (vin,daq_time)
.
=# CREATE INDEX idx_mars2 ON vehicle_basic_data_mars2
USING mars2_btree(vin, daq_time);
Notes!
When data from the same time point in the device is reported in batches, MARS2 can merge data from the same device (the value ofdaq_time
in this table) at the same time (the value ofvin
in this table). The merge feature requires thatuniquemode=true
be manually specified when creating an index, because the default value of this option isfalse
. For example, when you specifyuniquemode=true
, the deviceA01
passes back3
data at2022-01-01 00:00:00
, and finally the original two data will be overwritten according to the last data, only one data will be retained; but if you defaultuniquemode=false
, then the3
data sent back by the device2022-01-01 00:00:00
will be retained in the end without any processing.
datainspect is a storage diagnostic tool built into MARS2, which provides underlying data insights to accurately optimize the storage and query performance of your data.
Storage optimization requires paying attention to the actual content of the data in the physical file. Similar to PostgreSQL's Pageinspect, you can use datainspect to easily extract data segments in MARS2 physical storage for further analysis. In addition, datainspect integrates relevant index and meta information in MARS2 to provide basic information such as NULL distribution and minmax, thereby aiding in the optimization of the I/O scanning process.
Notes!
This tool is only available for MARS2 tables.
datainspect is a system function built in MARS2 and can be used after MARS2 is installed correctly. MARS2 tables depend on matrixts
timing extension. Before building a table, you first need to create an extension in the database using the storage engine.
Notes!
matrixts
is extended to the database level, and it can be created once in a database without repeated creation.=# CREATE EXTENSION matrixts;
2.2 Introduction to function functions
First, create a test table named
tb1
in the example. We will use this test table as a use case to introduce three related functions.=# CREATE TABLE tb1( f1 int8 encoding(minmax), f2 int8 encoding(minmax), f3 float8 encoding(minmax), f4 text ) USING MARS2;
Create a MARS2 index.
=# CREATE INDEX ON tb1 USING mars2_btree(f1);
Construct 24,000 test data.
=# INSERT INTO tb1 SELECT generate_series(1, 24000), mod((random()*1000000*(generate_series(1, 1200)))::int8, (random()::int8/100 + 100)), (random() * generate_series(1, 24000))::float8, (random() * generate_series(1, 24000))::text;
2.2.1 desc_ranges
The desc_ranges function integrates the underlying meta information and index of MARS2 to provide such as minmax index information and null value information, and also supports accurate monitoring of the underlying storage space occupancy of data.
grammar
SELECT <* / column1,column2,...> FROM matrixts_internal.desc_ranges(<tablename TEXT>)
Parameters
tablename: table name, the partition table name when partitioning the table. (Required)
return
Fields | Description |
---|---|
segno | Segment number, starting with 0 |
attno | Attribute Number, column number, starting with 0 |
forkno | Number of physical file shards. It is the underlying number of the database, which can be considered to correspond to a specific file. By default, it starts from the first fork |
offno | RANGE Offset in physical file, unit bytes. The default starts from 0 . In MARS2, data is stored in batches and a storage RANGE is formed in units of set compress_threshold (compression threshold parameter), forming its own position identification with offsets in the physical file. |
nbytes | RANGE The actual space occupied in physical files, unit bytes |
nrows | compress_threshold set when creating tables, default is 1200 . compress_threshold is the compression threshold. Used to control how many tuples (tuples) are compressed in a single table, and is the upper limit of the number of Tuples compressed in the same unit. |
nrowsnotnull | RANGE Non-null number of bars |
mmin | If this column supports minmax index, it displays the minimum value in the RANGE, and if it does not exist, it is NULL |
mmax | If this column supports minmax index, it is the maximum value in the RANGE. If it does not exist, it is NULL. |
=# SELECT attno, sum(nbytes)/1024 as "Size in KB"
FROM matrixts_internal.desc_ranges('tb1') GROUP BY attno ORDER BY attno;
attno | Size in KB
-------+--------------------------------------------------------------------------------------------------------------------------
0 | 94.9062500000000000
1 | 7.8203125000000000
2 | 187.3437500000000000
3 | 386.3515625000000000
(4 rows)
=# SELECT * FROM matrixts_internal.desc_ranges('tb1') WHERE segno = 1;
segno | attno | forkname | forkno | offno | nbytes | nrows | nrowsnotnull | mmin | mmax
-------+---------+-------+-------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | 0 | data1 | 304 | 0 | 4848 | 1200 | 1200 | 15 | 7240
1 | 1 | data1 | 304 | 16376 | 856 | 1200 | 199 | 0 | 99
1 | 2 | data1 | 304 | 17712 | 9072 | 1200 | 1200 | 1.4602231817218758 | 704.8010557110921
1 | 3 | data1 | 304 | 50024 | 20272 | 1200 | 1200 | NULL | NULL
1 | 0 | data1 | 304 | 4848 | 4856 | 1200 | 1200 | 7243 | 14103
1 | 1 | data1 | 304 | 17232 | 160 | 1200 | 0 | NULL | NULL
1 | 2 | data1 | 304 | 26784 | 9760 | 1200 | 1200 | 705.0931003474365 | 1372.9018354549075
1 | 3 | data1 | 304 | 70296 | 19680 | 1200 | 1200 | NULL | NULL
1 | 0 | data1 | 304 | 9704 | 4856 | 1200 | 1200 | 14125 | 21417
1 | 1 | data1 | 304 | 17392 | 160 | 1200 | 0 | NULL | NULL
1 | 2 | data1 | 304 | 36544 | 9760 | 1200 | 1200 | 1375.043496121433 | 2084.906658862494
1 | 3 | data1 | 304 | 89976 | 19792 | 1200 | 1200 | NULL | NULL
1 | 0 | data1 | 304 | 14560 | 1816 | 445 | 445 | 21429 | 23997
1 | 1 | data1 | 304 | 17552 | 160 | 445 | 0 | NULL | NULL
1 | 2 | data1 | 304 | 46304 | 3720 | 445 | 445 | 2086.0748374078717 | 2336.065046118657
1 | 3 | data1 | 304 | 109768 | 7576 | 445 | 445 | NULL | NULL
(16 rows)
The show_range
function selects a piece of physical data in MARS2 and displays this piece of data into readable data. The currently supported data types are: int2, int4, int8, float4, float8, timestamp, date, text.
SELECT <* / column1,column2,...> FROM matrixts_internal.show_range(
tablename text,
attno int4,
forkno int4,
offno int4,
nbytes int4
)
0
in order. (Required)Notes!
For details of the above parameters, please refer to the return introduction of thedesc_ranges
function above.
Fields | Description |
---|---|
rowno | row number. This number is determined by the relative offset in RANGE, rather than the absolute offset of the entire table |
val | Actual content |
Notes!
There may be errors in theval
displayed on floating point data.
=# SELECT * FROM matrixts_internal.show_range('tb1', 1, 304, 16176, 808) LIMIT 20;
rowno | val
-------+--------
1 | 4
2 | 36
3 | 81
4 | 58
5 | 17
6 | 75
7 | 11
8 | 84
9 | 60
10 | 78
11 | 69
12 | 0
13 | 87
14 | 40
15 | 72
16 | 58
17 | 17
18 | 48
19 | 70
20 | 6
(20 rows)
The dump_range
function decompresses the physical data selected in MARS2 into a binary file for secondary analysis.
=# SELECT <* / column1,column2,...> FROM matrixts_internal.dump_ranges(
tablename text,
attno int4,
outfile text,
forkno int4,
offno int4,
limits int4
);
tb1-f2.bin
, and after export, it appears as tb1-f2.bin-seg1
on Segment1. (Required)forkno
and offno
as the starting position. (Optional)Note! Here
limits
limits only the number of exported RANGEs on a single segment.
Notes!
For details of the above parameters, please refer to the return introduction of thedesc_ranges
function above.
return
Example
=# SELECT * FROM matrixts_internal.dump_ranges('tb1', 1, '/data/demo/tb1-f2.bin', 304, 16176, 1) LIMIT 20;
nbytes
--------
0
0
1480
1632
1704
1592
(6 rows)
In the example, after executing dump_ranges
, it was found that 6 results were returned. This is because the export is performed independently by the Segment. Each of them represents the export results on a downstream Segment, which will appear in the corresponding directory. Among them, nbytes
of 2 results is 0
because there is no data on the Segment that meets the filtering conditions.
After execution, each Segment host will generate a binary file with the original suffix and the independent suffix identifier .-seg<no>
.
