Extensible data type: mxkv

The teaching video of this course has been included in [MatrixDB Data Modeling and Spatial-temporal Distribution Model] (https://www.bilibili.com/video/BV133411r7WH?share_source=copy_web)

When collecting data using MatrixDB timing database, you may face the following scenarios:

  1. Too many metrics to be collected exceed the maximum 1600 column limit for PostgreSQL
  2. The collection index sets of different models of equipment vary greatly, resulting in a large number of column values ​​being NULL when returning data.
  3. The indicator set cannot be predicted, that is, the table schema may change frequently

The traditional solutions to the above problems are as follows:

  1. Serialize multiple columns and exist in variable length fields.
  2. Use the json or jsonb type that comes with PostgreSQL

Although the above solution can solve the operation complexity, the performance is not ideal.

mxkv came into being, a custom kv data type developed by MatrixDB. Similar to json, the column name is used as the mxkv key value. It not only facilitates expansion, improves query performance, but also reduces storage space.

Note: This feature is only available in the Enterprise Edition.

How to use

Create an extension and verify

mxkv custom data types are included in the MatrixTS extension, first you need to create the extension:

create extension matrixts;

Verify that mxkv is enabled:

mxadmin=# select '{}'::mxkv_text;
 mxkv_text
--------------
 {}
(1 row)

As shown above, mxkv is enabled.

Create a data table and define mxkv column

Create a new table and define the integer value data type mxkv of mxkv_int4 in the new table to store the kv data with value as integer:

CREATE TABLE data(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_int4
)
Distributed by (tag_id);

Define key values

Before using the mxkv type, in order to optimize compression and query performance, you must first make key-value definitions to determine the key-value collection contained in the data in advance.

mxkv provides UDF mxkv_import_keys to complete the key value definition.

There are two ways to define it:

Manual definition

Manually provide kv examples in json form, mxkv will automatically extract key values ​​and complete definition.

The following SQL will extract the key values ​​'a' and 'b' in json and define it:

mxadmin=# select mxkv_import_keys('{"a": 1, "b": 2}');
 mxkv_import_keys
------------------
 a
 b
(2 rows)

Data definition based on existing tables

Existing t_json table, including field c2 of type json:

mxadmin=# select c2 from t_json ;
    c2
----------
 {"k1":1}
 {"k2":2}
(2 rows)

From the results, we can see that c2 includes two key values: k1 and k2.

Use mxkv_import_keys to define through table data:

mxadmin=# select mxkv_import_keys('t_json'::regclass, 'c2');
 mxkv_import_keys
------------------
 k1
 k2
(2 rows)

As shown above, provide table names and column names in the parameters. Compared to manual definition, this method is more convenient and faster. The premise is that the sample data is already stored in other tables.

Insert data

After the key value is determined, the following starts to insert kv data:

mxadmin=# insert into data values(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

As shown above, the inserted kv data contains the key values ​​'a' and 'b' that just defined.

Note: The key values ​​in kv must be defined before they can be inserted correctly, otherwise the following error will appear:

mxadmin=# insert into data values(now(), 1, '{"c":1}');
psql: ERROR:  unknown key "c"
LINE 1: insert into data values(now(), 1, '{"c":1}');
                                          ^
DETAIL:  The key is not imported yet
HINT:  Import the keys with the mxkv_import_keys() function

Read key-value content

The mxkv key-value content is read the same as json, using the -> symbol:

mxadmin=# select time, tag_id, kv->'a' as a, kv->'b' as b from data;
             time              | tag_id | a | b
---------------------------+------------------------------------------------------------------------------------------------------
 2021-07-02 13:23:30.405473+00 |      1 | 1 | 2
(1 row)

The key values ​​taken out are the same as ordinary columns, and can also be compared and joined with other tables. For example, create a table with the same structure as the data table and insert the same data, and use the following SQL to make connections:

mxadmin=# select * from data, data1 where data.kv->'a' = data1.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
----------------------------------------------------------------------------------------------------------------------------------
 2021-07-02 13:23:30.405473+00 |      1 | {"a": 1, "b": 2} | 2021-07-02 13:26:31.828685+00 |      1 | {"a": 1, "b": 2}
(1 row)

Note: The effect in mxkv is the same as ->>.

Data Type

The table created above uses the mxkv_int4 type. mxkv supports the following types.

  1. mxkv_int4: Store 32-bit integer values ​​of type int4/int
  2. mxkv_float4: Stores 32-bit floating point value of float4/real type
  3. mxkv_float8: Stores 64-bit floating point value of type float8/float/double precision
  4. mxkv_text: Store string value of type text

Where mxkv_float4 and mxkv_float8 can specify the number of digits after the decimal point, such as:

CREATE TABLE data(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_float4(2)
)
Distributed by (tag_id);

After this definition, the decimal point will be rounded after two digits. After specifying the number of decimal places, the internal storage type will be converted to an integer, which is more conducive to optimization and compression. But the numerical range will also be reduced accordingly.

  • mxkv_float4(scale), scale >= 0
    • mxkv_float4(0): [-2147483500, 2147483500]
    • mxkv_float4(2): [-21474835.00, 21474835.00]
    • mxkv_float4(4): [-214748.3500, 214748.3500]
    • ......
  • mxkv_float8(scale), scale >= 0
    • mxkv_float8(0): [-9223372036854775000, 9223372036854775000]
    • mxkv_float4(2): [-92233720368547750.00, 92233720368547750.00]
    • mxkv_float8(4): [-922337203685477.5000, 922337203685477.5000]
    • ......

Limitation: mxkv only supports first-level kv structures and does not allow nesting and arrays.