This document introduces the storage engine and storage diagnostic tools supported by MatrixDB, mainly including the following contents:
The storage engine is the storage base of the database system. The database uses the storage engine to create, query, update and delete data. Depending on your needs, different storage engines will provide different storage mechanisms and design in different dimensions such as physical layout, index type, and lock granularity.
Currently, the MatrixDB database system supports the use of the following storage engines: HEAP based on PostgreSQL, MARS2 developed by MatrixDB, and column storage engine AOCO and row storage engine AORO.
HEAP tables are storage engines provided by traditional PostgreSQL databases, also known as heap tables. This type of table supports a large number of concurrent read and write, transaction, index and other features.
The MARS2 table reduces the number of I/O addressing times with its physically ordered merging method, thereby improving the query performance of data in the table. At the same time, MARS2 supports compression, column storage, automatic archiving, pre-aggregation and other functions, which performs superiorly in timing scenarios.
MARS2 does not currently support updates and deletion.
Depending on the different characteristics of the two storage engines, you can flexibly create different tables under different scenario requirements. We give the following example.
The HEAP table is the default storage engine of MatrixDB, so if you do not specify the storage engine when creating the table, the created HEAP tables are all HEAP tables.
=# CREATE TABLE disk(
time timestamp with time zone,
tag_id int,
read float,
write float
)
DISTRIBUTED BY (tag_id);
MARS2 tables depend on matrixts timing extension. Before building a table, you first need to create an extension in the database using the storage engine.
Notes!
matrixtsis extended to the database level, and it can be created once in a database without repeated creation.=# CREATE EXTENSION matrixts;When creating tables, use
USING MARS2to specify the storage engine:=# CREATE TABLE vehicle_basic_data_mars2( daq_time timestamp encoding (minmax), vin varchar(32) COLLATE "C" encoding (minmax), lng float encoding (minmax), lat float encoding (minmax), speed float , license_template varchar(16) , flag integer ) USING MARS2 WITH (compresstype=zstd, compresslevel=3) DISTRIBUTED BY (vin) PARTITION BY RANGE (daq_time) ( START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE EVERY (INTERVAL '1 day') ,DEFAULT PARTITION others );We will explain the specific parameters and statements in the
vehicle_basic_data_mars2table:
- encoding (minmax) This option is used to improve computing power. If you want to perform aggregated queries such as
min,max,avg,sum,countbased on this field, or use theWHEREconditional filtering option. The addition of this option needs to be determined based on the business scenario. Do not add it to all fields without thinking, otherwise this operation will bring a lot of unreasonable data writing to the CPU and disk I/O consumption.- COLLATE "C" This option is only added in the Device Encoding field, and no other fields need to be added. With this operation, you can improve the sorting and querying speed of text types.
- USING MARS2 It is a fixed writing method using the MARS2 table and cannot change the content.
- WITH (compresstype=zstd, compresslevel=3) Specify the compression algorithm as
zstd, as well asrle_type,zlib,lz4compression algorithms.zstdis recommended in this table, and the compression level is3level. For detailed compression performance parameters, please refer to the table below.- DISTRIBUTED BY (vin): Use this statement to select the distribution key. We recommend using the
Device Encoding Fieldas the distribution key, so that the data of the same device can be queried and related calculations can be performed. This avoids performance losses caused by data redistribution between nodes.- PARTITION BY RANGE (daq_time) Specifies the partition key for the partition table. We recommend using the
Time of data acquisition by the deviceas the partition key. Usually a large number of queries are the time to filter data collection. For example, when you want to query the data within a day and perform related calculations, you need to add the filtering condition `WHERE daq_time >= CURRENT_DATE - INTERVAL '1 day', so that the database will quickly determine which partition subtable the data is on, so as to quickly locate the data and query the table.- ( START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE EVERY (INTERVAL '1 days') ,DEFAULT PARTITION others); This SQL statement indicates that you will use the
START...ENDstatement and theINCLUSIVEandEXCLUSIVEkeywords to create a subpartition table that starts from2022-07-01 00:00:00to2022-08-01 00:00:00.- EVERY (INTERVAL '1 day') The time span of the subpartition table is
1day. In addition to usingday, you can also usehour,month,year, etc., and determine it based on the size of the data volume. For example, in a day, your server receives up to 1 million to 10 million pieces of data, and the1 daytime interval is the best choice. If the data volume of a day is only tens of thousands or hundreds of thousands, then choosing1 monthis very suitable; if the data volume per day is only tens of thousands, then the1 yearsub-partition table can be generated once.- DEFAULT PARTITION others Indicates the default partition. If the corresponding subtable storage of data cannot be found in other partitions, it will be stored in this partition.
Notes!
We hope you will design ideas based on our suggestions, but we don’t want you to blindly apply them. The specific timing scenarios are ever-changing, and specific analysis of specific situations is still very necessary.
The MARS2 table compression performance related parameters mentioned above:
| Parameter name | Default value | Min value | Maximum value | Description |
|---|---|---|---|---|
| compress_threshold | 1200 | 1 | 8000 | Compression threshold. Used to control how many tuples (tuples) are compressed in a single table, and is the upper limit of the number of Tuples compressed in the same unit. |
| compressiontype | lz4 | -- | -- | Compression algorithm, supported: 1. zstd 2. zlib 3. lz4 |
| compresslevel | 1 | 1 | -- | Compression level. Usually, the smaller the value, the smaller the compression rate, but the faster the compression; the larger the value, the higher the compression rate, but the slower the compression. Different algorithms have different valid values ranges: zstd: 1-19 zlib: 1-9 lz4: 1-20 |
Notes!
Generally speaking, the higher the compression level, the higher the compression rate and the lower the speed. But that's not absolute.
After successfully creating the MARS2 table, you must create an additional index of type mars2_btree to perform normal data reading and writing. The purpose of using index sorting is to make data of the same dimension or similar characteristics physically close as possible to reduce the number of I/O addressing and improve query efficiency. Therefore, the selection of sorting keys needs to comply with the main business query characteristics.
For example, the requirement is a single-device point query, then the sort key is the device number (vin) in the timing scenario. If the requirement is a detailed query, aggregation query or multi-device query within a certain time period, then the sort key is the device number and the timestamp (vin,daq_time).
=# CREATE INDEX idx_mars2 ON vehicle_basic_data_mars2
USING mars2_btree(vin, daq_time);
Notes!
When data from the same time point in the device is reported in batches, MARS2 can merge data from the same device (the value ofdaq_timein this table) at the same time (the value ofvinin this table). The merge feature requires thatuniquemode=truebe manually specified when creating an index, because the default value of this option isfalse. For example, when you specifyuniquemode=true, the deviceA01passes back3data at2022-01-01 00:00:00, and finally the original two data will be overwritten according to the last data, only one data will be retained; but if you defaultuniquemode=false, then the3data sent back by the device2022-01-01 00:00:00will be retained in the end without any processing.
datainspect is a storage diagnostic tool built into MARS2, which provides underlying data insights to accurately optimize the storage and query performance of your data.
Storage optimization requires paying attention to the actual content of the data in the physical file. Similar to PostgreSQL's Pageinspect, you can use datainspect to easily extract data segments in MARS2 physical storage for further analysis. In addition, datainspect integrates relevant index and meta information in MARS2 to provide basic information such as NULL distribution and minmax, thereby aiding in the optimization of the I/O scanning process.
Notes!
This tool is only available for MARS2 tables.
datainspect is a system function built in MARS2 and can be used after MARS2 is installed correctly. MARS2 tables depend on matrixts timing extension. Before building a table, you first need to create an extension in the database using the storage engine.
Notes!
matrixtsis extended to the database level, and it can be created once in a database without repeated creation.=# CREATE EXTENSION matrixts;2.2 Introduction to function functions
First, create a test table named
tb1in the example. We will use this test table as a use case to introduce three related functions.=# CREATE TABLE tb1( f1 int8 encoding(minmax), f2 int8 encoding(minmax), f3 float8 encoding(minmax), f4 text ) USING MARS2;Create a MARS2 index.
=# CREATE INDEX ON tb1 USING mars2_btree(f1);Construct 24,000 test data.
=# INSERT INTO tb1 SELECT generate_series(1, 24000), mod((random()*1000000*(generate_series(1, 1200)))::int8, (random()::int8/100 + 100)), (random() * generate_series(1, 24000))::float8, (random() * generate_series(1, 24000))::text;2.2.1 desc_ranges
The desc_ranges function integrates the underlying meta information and index of MARS2 to provide such as minmax index information and null value information, and also supports accurate monitoring of the underlying storage space occupancy of data.
grammar
SELECT <* / column1,column2,...> FROM matrixts_internal.desc_ranges(<tablename TEXT>)
Parameters
tablename: table name, the partition table name when partitioning the table. (Required)
return
| Fields | Description |
|---|---|
| segno | Segment number, starting with 0 |
| attno | Attribute Number, column number, starting with 0 |
| forkno | Number of physical file shards. It is the underlying number of the database, which can be considered to correspond to a specific file. By default, it starts from the first fork |
| offno | RANGE Offset in physical file, unit bytes. The default starts from 0. In MARS2, data is stored in batches and a storage RANGE is formed in units of set compress_threshold (compression threshold parameter), forming its own position identification with offsets in the physical file. |
| nbytes | RANGE The actual space occupied in physical files, unit bytes |
| nrows | compress_threshold set when creating tables, default is 1200. compress_threshold is the compression threshold. Used to control how many tuples (tuples) are compressed in a single table, and is the upper limit of the number of Tuples compressed in the same unit. |
| nrowsnotnull | RANGE Non-null number of bars |
| mmin | If this column supports minmax index, it displays the minimum value in the RANGE, and if it does not exist, it is NULL |
| mmax | If this column supports minmax index, it is the maximum value in the RANGE. If it does not exist, it is NULL. |
=# SELECT attno, sum(nbytes)/1024 as "Size in KB"
FROM matrixts_internal.desc_ranges('tb1') GROUP BY attno ORDER BY attno;
attno | Size in KB
-------+--------------------------------------------------------------------------------------------------------------------------
0 | 94.9062500000000000
1 | 7.8203125000000000
2 | 187.3437500000000000
3 | 386.3515625000000000
(4 rows)
=# SELECT * FROM matrixts_internal.desc_ranges('tb1') WHERE segno = 1;
segno | attno | forkname | forkno | offno | nbytes | nrows | nrowsnotnull | mmin | mmax
-------+---------+-------+-------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | 0 | data1 | 304 | 0 | 4848 | 1200 | 1200 | 15 | 7240
1 | 1 | data1 | 304 | 16376 | 856 | 1200 | 199 | 0 | 99
1 | 2 | data1 | 304 | 17712 | 9072 | 1200 | 1200 | 1.4602231817218758 | 704.8010557110921
1 | 3 | data1 | 304 | 50024 | 20272 | 1200 | 1200 | NULL | NULL
1 | 0 | data1 | 304 | 4848 | 4856 | 1200 | 1200 | 7243 | 14103
1 | 1 | data1 | 304 | 17232 | 160 | 1200 | 0 | NULL | NULL
1 | 2 | data1 | 304 | 26784 | 9760 | 1200 | 1200 | 705.0931003474365 | 1372.9018354549075
1 | 3 | data1 | 304 | 70296 | 19680 | 1200 | 1200 | NULL | NULL
1 | 0 | data1 | 304 | 9704 | 4856 | 1200 | 1200 | 14125 | 21417
1 | 1 | data1 | 304 | 17392 | 160 | 1200 | 0 | NULL | NULL
1 | 2 | data1 | 304 | 36544 | 9760 | 1200 | 1200 | 1375.043496121433 | 2084.906658862494
1 | 3 | data1 | 304 | 89976 | 19792 | 1200 | 1200 | NULL | NULL
1 | 0 | data1 | 304 | 14560 | 1816 | 445 | 445 | 21429 | 23997
1 | 1 | data1 | 304 | 17552 | 160 | 445 | 0 | NULL | NULL
1 | 2 | data1 | 304 | 46304 | 3720 | 445 | 445 | 2086.0748374078717 | 2336.065046118657
1 | 3 | data1 | 304 | 109768 | 7576 | 445 | 445 | NULL | NULL
(16 rows)
The show_range function selects a piece of physical data in MARS2 and displays this piece of data into readable data. The currently supported data types are: int2, int4, int8, float4, float8, timestamp, date, text.
SELECT <* / column1,column2,...> FROM matrixts_internal.show_range(
tablename text,
attno int4,
forkno int4,
offno int4,
nbytes int4
)
0 in order. (Required)Notes!
For details of the above parameters, please refer to the return introduction of thedesc_rangesfunction above.
| Fields | Description |
|---|---|
| rowno | row number. This number is determined by the relative offset in RANGE, rather than the absolute offset of the entire table |
| val | Actual content |
Notes!
There may be errors in thevaldisplayed on floating point data.
=# SELECT * FROM matrixts_internal.show_range('tb1', 1, 304, 16176, 808) LIMIT 20;
rowno | val
-------+--------
1 | 4
2 | 36
3 | 81
4 | 58
5 | 17
6 | 75
7 | 11
8 | 84
9 | 60
10 | 78
11 | 69
12 | 0
13 | 87
14 | 40
15 | 72
16 | 58
17 | 17
18 | 48
19 | 70
20 | 6
(20 rows)
The dump_range function decompresses the physical data selected in MARS2 into a binary file for secondary analysis.
=# SELECT <* / column1,column2,...> FROM matrixts_internal.dump_ranges(
tablename text,
attno int4,
outfile text,
forkno int4,
offno int4,
limits int4
);
tb1-f2.bin, and after export, it appears as tb1-f2.bin-seg1 on Segment1. (Required)forkno and offno as the starting position. (Optional)Note! Here
limitslimits only the number of exported RANGEs on a single segment.
Notes!
For details of the above parameters, please refer to the return introduction of thedesc_rangesfunction above.
return
Example
=# SELECT * FROM matrixts_internal.dump_ranges('tb1', 1, '/data/demo/tb1-f2.bin', 304, 16176, 1) LIMIT 20;
nbytes
--------
0
0
1480
1632
1704
1592
(6 rows)
In the example, after executing dump_ranges, it was found that 6 results were returned. This is because the export is performed independently by the Segment. Each of them represents the export results on a downstream Segment, which will appear in the corresponding directory. Among them, nbytes of 2 results is 0 because there is no data on the Segment that meets the filtering conditions.
After execution, each Segment host will generate a binary file with the original suffix and the independent suffix identifier .-seg<no>.
