This document mainly introduces the extended data types supported by YMatrix.
In relational models, common data types are used, each column represents the value of an attribute, such as: name, height, and weight. However, for timing scenarios, the following situations often occur:
At this time, you need to use an extended data type: it supports storing multiple attributes in a single column.
There are very diverse ways to implement extended data types, such as customizing serialized storage formats and encoding methods, and then storing them into string types, etc. YMatrix provides two extended data types:
The JSON / jsonb type inherits from the PostgreSQL database. Unlike directly saving JSON strings with string-type columns, the JSON / jsonb type provides commonly used JSON operation functions for reading key values, as well as performing operations such as merging and deleting.
JSON and jsonb are basically the same, and can be directly converted to each other. The difference is:
The following are the commonly used JSON/jsonb type operation methods:
SELECT '{"a":1, "b":2, "c":3}'::json->'a';
?column?
----------
1
(1 row)
If the JSON/jsonb hierarchy is nested, use -> multiple times:
SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
?column?
----------
2
(1 row)
Use the - sign to delete key values. When writing to a JSON object, it must be converted to the jsonb type:
SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
?column?
------------------
{"b": 2, "c": 3}
(1 row)
Use the || operator to merge two jsonb keys and values together:
SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
?column?
------------------------------
{"a": 1, "b": 2, "c": 3}
(1 row)
When the key values are the same, use the || operator to update the value:
SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
?column?
------------------
{"a": 1, "b": 3}
(1 row)
For more JSON operation methods, please refer to PostgreSQL Document.
The teaching video of this course has been included in [YMatrix Data Modeling and Spatial-temporal Distribution Model] (https://www.bilibili.com/video/BV133411r7WH?share_source=copy_web)
When using the YMatrix timing database to collect data, you may face the following scenarios:
The traditional solutions to the above problems are as follows:
Although the above solution can solve the operation complexity, the performance is not ideal.
MXKV came into being. MXKV is an efficient and scalable storage type developed by YMatrix. Its usage method is similar to JSON. The advantage is that MXKV uses binary storage and combines the compression algorithm in YMatrix, which has small storage space and high query efficiency.
Notes!
This feature is only available in the Enterprise Edition.
MXKV custom data types are included in the MatrixTS extension, and you first need to create the extension:
CREATE EXTENSION matrixts;
Verify that MXKV is enabled:
mxadmin=# SELECT '{}'::mxkv_text;
mxkv_text
--------------
{}
(1 row)
As shown above, MXKV is turned on.
Create a new table and define the integer value data type mxkv_int4 of MXKV in the new table to store KV data with integer values:
CREATE TABLE data(
time timestamp with time zone,
tag_id int,
kv mxkv_int4
)
DISTRIBUTED BY (tag_id);
Before using the MXKV type, in order to optimize compression and query performance, you must first make key-value definitions to determine the key-value collection contained in the data in advance.
MXKV provides UDF mxkv_import_keys
to complete the key value definition.
There are two ways to define it:
Manually provide KV samples in JSON form, MXKV will automatically extract key values and complete definitions.
The following SQL will extract the key values 'a' and 'b' in JSON and define:
mxadmin=# SELECT mxkv_import_keys('{"a": 1, "b": 2}');
mxkv_import_keys
------------------
a
b
(2 rows)
Existing t_json tables, including field c2 of type JSON:
mxadmin=# SELECT c2 FROM t_json ;
c2
----------
{"k1":1}
{"k2":2}
(2 rows)
From the results, we can see that c2 includes two key values, k1 and k2.
Use mxkv_import_keys
to define through table data:
mxadmin=# SELECT mxkv_import_keys('t_json'::regclass, 'c2');
mxkv_import_keys
------------------
k1
k2
(2 rows)
As shown above, provide table names and column names in the parameters.
Compared to manual definition, this method is more convenient and faster, but the sample data has been stored in other tables as a prerequisite.
After the key value is determined, the following starts to insert KV data:
mxadmin=# INSERT INTO data VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1
As shown above, the inserted KV data contains the key values 'a' and 'b' that just defined.
Notes!
The key value in KV must be defined before it can be inserted correctly, otherwise the following error will occur:DETAIL: The key is not imported yet HINT: Import the keys with the mxkv_import_keys() function
The MXKV key-value content is read the same way as JSON, using the -> symbol:
mxadmin=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data;
time | tag_id | a | b
---------------------------+------------------------------------------------------------------------------------------------------
2021-07-02 13:23:30.405473+00 | 1 | 1 | 2
(1 row)
The key values taken out are the same as ordinary columns, and can also be compared and joined with other tables. For example, create a table with the same structure as the data
table and insert the same data, and use the following SQL to do the connection:
mxadmin=# SELECT * FROM data, data1 WHERE data.kv->'a' = data1.kv->'a';
time | tag_id | kv | time | tag_id | kv
----------------------------------------------------------------------------------------------------------------------------------
2021-07-02 13:23:30.405473+00 | 1 | {"a": 1, "b": 2} | 2021-07-02 13:26:31.828685+00 | 1 | {"a": 1, "b": 2}
(1 row)
Notes!
In MXKV -> and ->> the effect is equivalent.
The table created above uses the mxkv_int4 type. This part introduces the several data types supported by MXKV:
Where mxkv_float4 and mxkv_float8 can specify the number of decimal places, such as:
CREATE TABLE data(
time timestamp with time zone,
tag_id int,
kv mxkv_float4(2)
)
DISTRIBUTED BY (tag_id);
After this definition, the decimal point will be rounded after two digits. After specifying the number of decimal places, the internal storage type will be converted to an integer, which is more conducive to optimization and compression. But the numerical range will also be reduced accordingly.
Notes!
MXKV only supports first-level KV structures and does not allow nesting and arrays.
For more in-depth technical information, please refer to the blog MXKV.