Extend data types

This document mainly introduces the extended data types supported by YMatrix.

In relational models, common data types are used, each column represents the value of an attribute, such as: name, height, and weight. However, for timing scenarios, the following situations often occur:

  1. The indicator set cannot be predicted, that is, the table pattern may change frequently.
  2. The metric set is too large, exceeding the YMatrix limit of up to 1600 columns.

At this time, you need to use an extended data type: it supports storing multiple attributes in a single column.

There are very diverse ways to implement extended data types, such as customizing serialized storage formats and encoding methods, and then storing them into string types, etc. YMatrix provides two extended data types:

  1. JSON/jsonb
  2. MXKV

1 JSON / jsonb

The JSON / jsonb type inherits from the PostgreSQL database. Unlike directly saving JSON strings with string-type columns, the JSON / jsonb type provides commonly used JSON operation functions for reading key values, as well as performing operations such as merging and deleting.

JSON and jsonb are basically the same, and can be directly converted to each other. The difference is:

  1. Save JSON as text, and jsonb as binary number
  2. No binary conversion is required when writing JSON, and the writing speed is faster
  3. jsonb parsing speed will be faster due to binary conversion

The following are the commonly used JSON/jsonb type operation methods:

1.1 Use -> Extract the healthy value

SELECT '{"a":1, "b":2, "c":3}'::json->'a';
 ?column?
----------
 1
(1 row)

If the JSON/jsonb hierarchy is nested, use -> multiple times:

SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
 ?column?
----------
 2
(1 row)

1.2 Delete key values

Use the - sign to delete key values. When writing to a JSON object, it must be converted to the jsonb type:

SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
     ?column?
------------------
 {"b": 2, "c": 3}
(1 row)

1.3 Merge key values

Use the || operator to merge two jsonb keys and values ​​together:

SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
         ?column?
------------------------------
 {"a": 1, "b": 2, "c": 3}
(1 row)

1.4 Update key values

When the key values ​​are the same, use the || operator to update the value:

SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
     ?column?
------------------
 {"a": 1, "b": 3}
(1 row)

For more JSON operation methods, please refer to PostgreSQL Document.

2 MXKV

The teaching video of this course has been included in [YMatrix Data Modeling and Spatial-temporal Distribution Model] (https://www.bilibili.com/video/BV133411r7WH?share_source=copy_web)

When using the YMatrix timing database to collect data, you may face the following scenarios:

  1. The number of metrics to be collected exceeds the limit of up to 1600 columns on YMatrix.
  2. The collection index sets of different models of equipment vary greatly, resulting in a large number of column values ​​being empty (NULL) when returning data.
  3. The metric set cannot be predicted, and the schema of the data table (Schema) is likely to change frequently.

The traditional solutions to the above problems are as follows:

  1. Serialize multiple columns and save them in variable length field
  2. Use JSON or jsonb data types

Although the above solution can solve the operation complexity, the performance is not ideal.

MXKV came into being. MXKV is an efficient and scalable storage type developed by YMatrix. Its usage method is similar to JSON. The advantage is that MXKV uses binary storage and combines the compression algorithm in YMatrix, which has small storage space and high query efficiency.

Notes!
This feature is only available in the Enterprise Edition.

2.1 How to use

2.1.1 Create an extension and verify

MXKV custom data types are included in the MatrixTS extension, and you first need to create the extension:

CREATE EXTENSION matrixts;

Verify that MXKV is enabled:

mxadmin=# SELECT '{}'::mxkv_text;
 mxkv_text
--------------
 {}
(1 row)

As shown above, MXKV is turned on.

2.1.2 Create a data table and define a MXKV column

Create a new table and define the integer value data type mxkv_int4 of MXKV in the new table to store KV data with integer values:

CREATE TABLE data(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_int4
)
DISTRIBUTED BY (tag_id);

2.1.3 Define key values

Before using the MXKV type, in order to optimize compression and query performance, you must first make key-value definitions to determine the key-value collection contained in the data in advance.

MXKV provides UDF mxkv_import_keys to complete the key value definition.

There are two ways to define it:

  1. Manually define

Manually provide KV samples in JSON form, MXKV will automatically extract key values ​​and complete definitions.

The following SQL will extract the key values ​​'a' and 'b' in JSON and define:

mxadmin=# SELECT mxkv_import_keys('{"a": 1, "b": 2}');
 mxkv_import_keys
------------------
 a
 b
(2 rows)
  1. Data definition based on existing tables

Existing t_json tables, including field c2 of type JSON:

mxadmin=# SELECT c2 FROM t_json ;
    c2
----------
 {"k1":1}
 {"k2":2}
(2 rows)

From the results, we can see that c2 includes two key values, k1 and k2.

Use mxkv_import_keys to define through table data:

mxadmin=# SELECT mxkv_import_keys('t_json'::regclass, 'c2');
 mxkv_import_keys
------------------
 k1
 k2
(2 rows)

As shown above, provide table names and column names in the parameters.
Compared to manual definition, this method is more convenient and faster, but the sample data has been stored in other tables as a prerequisite.

2.1.4 Insert data

After the key value is determined, the following starts to insert KV data:

mxadmin=# INSERT INTO data VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

As shown above, the inserted KV data contains the key values ​​'a' and 'b' that just defined.

Notes!
The key value in KV must be defined before it can be inserted correctly, otherwise the following error will occur: DETAIL: The key is not imported yet HINT: Import the keys with the mxkv_import_keys() function

2.1.5 Read key-value content

The MXKV key-value content is read the same way as JSON, using the -> symbol:

mxadmin=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data;
             time              | tag_id | a | b
---------------------------+------------------------------------------------------------------------------------------------------
 2021-07-02 13:23:30.405473+00 |      1 | 1 | 2
(1 row)

The key values ​​taken out are the same as ordinary columns, and can also be compared and joined with other tables. For example, create a table with the same structure as the data table and insert the same data, and use the following SQL to do the connection:

mxadmin=# SELECT * FROM data, data1 WHERE data.kv->'a' = data1.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
----------------------------------------------------------------------------------------------------------------------------------
 2021-07-02 13:23:30.405473+00 |      1 | {"a": 1, "b": 2} | 2021-07-02 13:26:31.828685+00 |      1 | {"a": 1, "b": 2}
(1 row)

Notes!
In MXKV -> and ->> the effect is equivalent.

2.2 MXKV data type

The table created above uses the mxkv_int4 type. This part introduces the several data types supported by MXKV:

  1. mxkv_int4: Stores 32-bit integer values ​​of type int4/int.
  2. mxkv_float4: Stores 32-bit floating point value of type float4 / real.
  3. mxkv_float8: Stores 64-bit floating point values ​​of type float8 / float / double precision.
  4. mxkv_text: Stores string value of type text.

Where mxkv_float4 and mxkv_float8 can specify the number of decimal places, such as:

CREATE TABLE data(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_float4(2)
)
DISTRIBUTED BY (tag_id);

After this definition, the decimal point will be rounded after two digits. After specifying the number of decimal places, the internal storage type will be converted to an integer, which is more conducive to optimization and compression. But the numerical range will also be reduced accordingly.

  • mxkv_float4(scale), scale >= 0
    • mxkv_float4(0): [-2147483500, 2147483500]
    • mxkv_float4(2): [-21474835.00, 21474835.00]
    • mxkv_float4(4): [-214748.3500, 214748.3500]
    • ......
  • mxkv_float8(scale), scale >= 0
    • mxkv_float8(0): [-9223372036854775000, 9223372036854775000]
    • mxkv_float4(2): [-92233720368547750.00, 92233720368547750.00]
    • mxkv_float8(4): [-922337203685477.5000, 922337203685477.5000]
    • ......

Notes!
MXKV only supports first-level KV structures and does not allow nesting and arrays.
For more in-depth technical information, please refer to the blog MXKV.