Extend data types

This document mainly introduces the extended data types supported by YMatrix, which can be used to build [Wide table/Narrow table variant model] (/doc/5.2/datamodel/guidebook).

YMatrix is a schema relational database, and all tables need to predefined a clear schema, namely Schema, through DDL statements. Each column in the pattern represents the value of an attribute, such as: name, height, weight. However, for some business scenarios, the following situations often occur:

The indicator set cannot be predicted, that is, the table pattern may change frequently.
The metric set is too large, exceeding the YMatrix limit of up to 1600 columns.
Different indicator sets vary greatly, resulting in a large number of column values being empty (NULL) when returning data.

Serialize multiple columns and save them in variable length field

At this time, you need to use an extended data type: it supports storing multiple attributes in a single column.

There are very diverse ways to implement extended data types, such as customizing serialized storage formats and encoding methods, and then storing them into string types, etc. YMatrix provides the following extended data types:

JSON/JOSNB
MXKV/MXKV2

For example:

Data Type	Storage Method	Supported Storage Format	Supported Data Format
JSON	Text	All formats	INT/NUMBERIC/BOOLEAN/NULL, etc. Support different data formats to store in an array into a JSON column
JSONB	Binary Number	All Formats	INT/NUMBERIC/BOOLEAN/NULL, etc. Support different data formats to store in an array into a JSON column
MXKV	Binary Number	All Formats	MXKV_INT4/MXKV_FLOAT4/MXKV_FLOAT8/MXKV_TEXT Only supports first-level KV structures, nesting and arrays are not allowed. If you want to store metrics in different data formats, you need to create multiple columns
MXKV2	Binary number	MARS2/3	MXKV_INT8/MXKV_FLOAT8/MXKV_TEXT/MXKV_BOOL Only supports first-level KV structures, nesting and arrays are not allowed. If you want to store metrics in different data formats, you need to create multiple columns

1 JSON/JSONB

The JSON/JSONB type inherits from the PostgreSQL database. Unlike directly saving JSON strings with string-type columns, the JSON/JSONB type provides commonly used JSON operation functions for reading key-values, as well as performing operations such as merging and deleting.

The usage methods of JSON and JSONB are basically the same and can be directly converted to each other. The difference is:

Save JSON as text, and save JSONB as binary number
No binary conversion is required when writing JSON, and the writing speed is faster
JSONB parsing speed will be faster due to binary conversion
JSONB supports indexing

The following are the commonly used JSON/JSONB type operation methods:

1.1 Use -> Extract the healthy value

=# SELECT '{"a":1, "b":2, "c":3}'::json->'a';
 ?column?
----------
 1
(1 row)

If the JSON/JSONB hierarchy is nested, use -> multiple times:

=# SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
 ?column?
----------
 2
(1 row)

1.2 Delete key values

Use the - sign to delete key values. When writing to a JSON object, it must be converted to the JSONB type:

=# SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
     ?column?
------------------
 {"b": 2, "c": 3}
(1 row)

1.3 Merge key values

Use the || operator to merge two JSONB type keys and values together:

=# SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
         ?column?
------------------------------
 {"a": 1, "b": 2, "c": 3}
(1 row)

1.4 Update key values

When the key values are the same, use the || operator to update the value:

=# SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
     ?column?
------------------
 {"a": 1, "b": 3}
(1 row)

For more JSON operation methods, please refer to PostgreSQL Document.

2 MXKV / MXKV2

MXKV and MXKV2 are efficient and scalable storage types developed by YMatrix, and their usage methods are similar to JSON/JSONB. Its advantage is that it uses binary storage and combines with the coding chain compression algorithm in YMatrix, which can achieve smaller storage space and higher query efficiency.

In addition to the items listed in the comparison table of the first 4 data types in this document, there are the following differences between the two:

In the use step, MXKV needs to manually import keys; MXKV2 does not require explicit manual import
MXKV2 has the highest compression rate and the best query performance, MXKV is second, followed by JSONB, and finally JSON
MXKV2 needs to be combined with a vectorized execution engine to obtain its unique performance benefits; MXKV cannot obtain vectorized benefits
MXKV2 must specify the compression method when creating tables; MXKV is not required

This part of the teaching video is included in YMatrix Data Modeling and Spatial-temporal Distribution Model

Notes!
This feature is only available in the Enterprise Edition.

2.1 How to use

2.1.1 Create an extension and verify

MXKV/MXKV2 Custom data types are included in the MatrixTS extension, and you must first create the extension:

=# CREATE EXTENSION matrixts;

Verify that MXKV/MXKV2 is enabled:

// MXKV
=# SELECT '{}'::mxkv_text;
 mxkv_text
--------------
 {}
(1 row)

// MXKV2
=# SELECT '{}'::mxkv2_text;
 mxkv2_text
--------------
 {}
(1 row)

As shown above, MXKV is turned on.

2.1.2 Create a data table and define a MXKV column

Create a new table and define the integer value data type in the new table to store KV data with integer values:

// MXKV
=# CREATE TABLE data_1(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_int4
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

// MXKV2
=# CREATE TABLE data_2(
    time timestamp with time zone,
    tag_id int,
    kv mxkv2_int8 encoding(compresstype=mxcustom,encodechain='auto')
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

2.1.3 Define key values (MXKV2 skips this step)

Before using the MXKV type, in order to optimize compression and query performance, you must first make key-value definitions to determine the key-value collection contained in the data in advance.

MXKV provides UDF mxkv_import_keys to complete the key value definition.

There are two ways to define it:

Manual definition

Manually provide KV samples in JSON form, MXKV will automatically extract key values and complete definitions.

The following SQL will extract the key values 'a' and 'b' in JSON and define:

=# SELECT mxkv_import_keys('{"a": 1, "b": 2}');
 mxkv_import_keys
------------------
 a
 b
(2 rows)

Data definition based on existing tables

Existing t_json table, including field c2 of type JSON:

=# SELECT c2 FROM t_json ;
    c2
----------
 {"k1":1}
 {"k2":2}
(2 rows)

From the results, we can see that c2 includes two key values, k1 and k2.

Use mxkv_import_keys to define through table data:

=# SELECT mxkv_import_keys('t_json'::regclass, 'c2');
 mxkv_import_keys
------------------
 k1
 k2
(2 rows)

As shown above, provide table names and column names in the parameters.
Compared to manual definition, this method is more convenient and faster, but the sample data has been stored in other tables as a prerequisite.

Notes!
The key values in MXKV must be defined before they can be inserted correctly, otherwise the following error will occur: DETAIL: The key is not imported yet HINT: Import the keys with the mxkv_import_keys() function

2.1.4 Insert data

After the key value is determined, the following starts to insert KV data:

// MXKV
=# INSERT INTO data_1 VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

// MXKV2
=# INSERT INTO data_2 VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

As shown above, the inserted KV data contains the key values 'a' and 'b' that just defined.

2.1.5 Read key-value content

The reading method of MXKV key-value content is the same as that of JSON, and both use the -> symbol:

// MXKV
=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data_1;
             time              | tag_id | a | b
---------------------------+------------------------------------------------------------------------------------------------------
 2023-10-19 10:20:05.334169+00 |      1 | 1 | 2
(1 row)

// MXKV2
=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data_2;
             time              | tag_id | a | b
---------------------------+------------------------------------------------------------------------------------------------------
 2023-10-19 10:20:05.334169+00 |      1 | 1 | 2
(1 row)

The key values taken out are the same as ordinary columns, and can also be compared and joined with other tables. For example, create a table with the same structure as data_1/data_2 table and insert the same data. Use the following SQL to do the connection:

// MXKV
=# SELECT * FROM data_1, data_3 WHERE data_1.kv->'a' = data_3.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
----------------------------------------------------------------------------------------------------------------------------------
 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2} | 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2}
(1 row)

// MXKV2
=# SELECT * FROM data_2, data_3 WHERE data_2.kv->'a' = data_3.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
----------------------------------------------------------------------------------------------------------------------------------
 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2} | 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2}
(1 row)

Notes!
In MXKV/MXKV2, the effects of -> are the same as ->>.

2.2 Data formats supported by MXKV/MXKV2

Data Type	Supported Data Formats
MXKV	MXKV_INT4: Store 32-bit integer value of type INT4/INT MXKV_FLOAT4: Store 32-bit floating point value of type FLOAT4/REAL MXKV_FLOAT8: Store 64-bit floating point value of type FLOAT8/FLOAT/DOUBLE PRECISION MXKV_TEXT: Store string value of type TEXT
MXKV2	MXKV2_INT8: Stores 64-bit integer value of type INT8/INT MXKV2_FLOAT8: Stores 64-bit floating point value of type FLOAT8/FLOAT/DOUBLE PRECISION MXKV2_TEXT: Stores string value of type TEXT MXKV2_BOOL: Stores boolean value of type BOOLEAN

Where MXKV_FLOAT4 and MXKV_FLOAT8 can specify the number of decimal places, for example:

=# CREATE TABLE data_4(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_float4(2)
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

After this definition, the decimal point will be rounded after two digits.

After specifying the number of decimal places, the internal storage type will be converted to an integer, which is more conducive to optimization and compression. But the numerical range will also be reduced accordingly.

mxkv_float4(scale), scale >= 0
- mxkv_float4(0): [-2147483500, 2147483500]
- mxkv_float4(2): [-21474835.00, 21474835.00]
- mxkv_float4(4): [-214748.3500, 214748.3500]
- ......
mxkv_float8(scale), scale >= 0
- mxkv_float8(0): [-9223372036854775000, 9223372036854775000]
- mxkv_float4(2): [-92233720368547750.00, 92233720368547750.00]
- mxkv_float8(4): [-922337203685477.5000, 922337203685477.5000]
- ......

Notes!
MXKV/MXKV2 only supports first-level KV structures, and nesting and arrays are not allowed. For more in-depth technical information, please see the blog MXKV.

简体中文