The MARS2 storage engine is mainly designed for high-speed data loading and querying, and it uses ordered storage to save search.
The MARS2 table is the same as MARS3 and also uses the **LSM Tree structure and is stored.
The internal schematic diagram of MARS2 is as follows:
_1688719834.png)
We will interpret the above picture in the form of conceptual analysis layer by layer.
COLLATE C
in this column can speed up sorting.mars2_automerge_threshold
), triggers L0 merge and merges all runs of L0 into one Run.level0_upgrade_size
), upgrade it to L1.level1_upgrade_size
) after upgrading to L1, a merge is triggered and all runs of L1 are merged into one Run.level1_upgrade_size
), upgrade it to L2.First create the matrixts
extension:
=# CREATE EXTENSION matrixts ;
An explicit declaration of record minimum maximum value.
=# CREATE TABLE metrics (
ts timestamp ENCODING(minmax),
dev_id bigint ENCODING(minmax),
power float,
speed float,
message text
) USING MARS2;
Creates the mars2_btree
index.
=# CREATE INDEX ON metrics
USING mars2_btree (ts, dev_id);
Same as MARS3.
INSERT
, and then goes through these processes to generate a Run.TRUNCATE
partition table.On the premise that the matrixts extension has been created, the simplest way to create tables is to add the USING
clause to the CREATE TABLE statement and create an index. For extended examples, see [Table Design Best Practices] (/doc/5.0/reference/storage/table_design).
=# CREATE TABLE metrics (
ts timestamp,
dev_id bigint,
power float,
speed float,
message text
) USING MARS2;
=# CREATE INDEX ON metrics
USING mars2_btree (ts, dev_id);
Notes!
The following contents table-level configuration item refers to configuration items that can only be configured using theWITH
clause when creating data tables. They are suitable for single tables and cannot be modified once configured. Global configuration item refers to configuration items that can be configured at the session or system level, and system-level modifications require execution ofmxstop -u
to take effect. For more information, see [Data Table Configuration Parameters] (doc/latest/reference/configuration_parameters/database_table_parameters#mars2).
The following parameters are used to control merges (Merge), and the way they affect merges is shown in the Level section above.
Merge Control Parameters | Units | Default Value | Value Range | Description |
---|---|---|---|---|
mars2_automerge_threshold | run | 32 | 10 - 2048 | Used to control all *** MARS2 tables, how many Runs can trigger merges in L0, which is a global configuration item. If you want to specify individually for a form you can use the table option level0_merge_threshold |
level0_merge_threshold | run | 32 | 1 - 2048 | Used to control how many Runs to trigger merges to configure items for table-level configuration |
level0_upgrade_size | MB | 25 | 1 - 10000 | Control Single table L0 -> L1 upgraded size. When L0 merges the result Run exceeds this size, it will be upgraded to L1, and the table-level configuration item is configured. |
level1_upgrade_size | MB | 1000 | 1 - 10000 | ControlSingle tableL1 ->L2 upgrade size. When L1 merges the result Run exceeds this size, it will be upgraded to L2, and the table-level configuration item is configured. |
The following parameters are used for compression control and adjust the compression effect. If the compression effect is not obvious if the configuration is too low, the configuration is too high and consumes more memory.
Compression Control Parameters | Units | Default Value | Value Range | Description |
---|---|---|---|---|
compress_threshold | Tuple | 1200 | 1 - 100000 | Compression threshold. Used to control how many tuples (tuples) are compressed in a single table. It is the upper limit of the number of tuples compressed in the same unit and is a table-level configuration item. |
The following parameters are used for memory control. This part of the parameters controls the size of the sorting memory used when inserting data. When multiple partition tables are inserted, each partition table will allocate the memory configured in mars2_sort_mem_core
; if it is not enough, it will be expanded, but the total amount will not exceed mars2_sort_mem
.
Sorting memory parameters | Units | Default value | Value range | Description |
---|---|---|---|---|
mars2_sort_mem | KB | 2097152KB (2GB) | 128KB - 2147483647KB (~ 2048GB) | Controls the sorted memory size of each ** individual insert. If the insert target table is a partition table, they will share this size and configure the global item |
mars2_sort_mem_core | KB | 16384KB (16MB) | 128KB - 2147483647KB (~ 2048GB) | Control at least how much sorted memory is allocated for each single partition table, and configure the global configuration item |
Example table configuration items:
=# CREATE TABLE metrics (
ts timestamp,
dev_id bigint,
power float,
speed float,
message text
)
USING MARS2
WITH (compress_threshold=1200,level0_merge_threshold=32);
=# CREATE INDEX ON metrics
USING mars2_btree (ts, dev_id);
Example of global configuration items modifying values at the session level:
=# SET mars2_sort_mem TO 2097152;
Example of global configuration items modifying values at the system level:
=# gpconfig -c mars2_sort_mem -v 2097152
=# \q
$ mxstop -u
HEAP is YMatrix's default storage engine, also known as heap storage, inherited from PostgreSQL, only supports row storage, not column storage and compression, and supports partition tables. It is implemented based on the MVCC mechanism and is suitable for scenarios with a large number of updates and deletion requirements.
The MVCC (Multiversion Concurrency Control) mechanism is usually called multiversion management. Its core is the update, modification and deletion of data.
In multi-version management, data updates and deletion do not necessarily modify them on the original data, but rather create a new version, mark the original data as invalid data, and then add new data to the new version, and the data has multiple versions. Each data has a version information and the historical version will be saved.
Under the influence of the MVCC mechanism, when the HEAP table handles update and delete operations, it does not really delete the data, but only relies on the data version information to block the old data (controls the visibility of the data). Therefore, if the HEAP table is updated or deleted in large quantities, the physical space it consumes will continue to increase, and you need to clean up old data regularly in a planned and regular manner.
You can use the following SQL statement to create a HEAP table in YMatrix.
=# CREATE TABLE disk_heap(
time timestamp with time zone,
tag_id int,
read float,
write float
)
DISTRIBUTED BY (tag_id);
Tables with AOCO and AORO as storage engines are collectively called AO (Append-Optimized) tables, also known as append optimization tables, which can support insertion, update and delete operations and support compression.
AORO supports row storage, and AOCO supports column storage.
AO tables are very different from HEAP tables in terms of the logical structure and physical structure of the table. As mentioned in the previous section, the HEAP table uses the MVCC mechanism to control the visibility of data after update and deletion operations, while the AO table uses an additional bitmap table to implement it. The content of this table is to indicate which data in the AO table is visible.
For AO tables with a large number of updates and deletion operations, old data also needs to be cleaned regularly in a planned manner. However, in AO tables, the data cleaning tool vacuumcum needs to reset the bitmap and compress physical files, so it is usually slower than HEAP.
Notes!
For details of MARS2, HEAP, AO storage engines, please refer to [Table Design Best Practices] (/doc/5.0/reference/storage/table_design)