Extended Data Types

This document describes the extended data types supported by YMatrix, which can be used to build wide/narrow table variant models.

YMatrix is a schema-based relational database. All tables must have a predefined schema defined via DDL statements. Each column in the schema represents an attribute value, such as name, height, or weight. However, in certain business scenarios, the following issues often arise:

  1. The metric set is unpredictable, meaning the table schema may need frequent changes.
  2. The number of metrics exceeds YMatrix's maximum column limit of 1600.
  3. Metric sets vary significantly across rows, resulting in many NULL values when data is retrieved.

To address these issues, multiple attributes can be serialized and stored in variable-length fields.

In such cases, extended data types are required—these allow multiple attributes to be stored within a single column.

There are many ways to implement extended data types. For example, custom serialization formats and encoding methods can be defined and stored in string-type columns. YMatrix provides the following extended data types:

Comparison:

Data Type Storage Format Supported Storage Formats Supported Data Formats
JSON Text All formats INT/NUMERIC/BOOLEAN/NULL, etc.
Supports storing different data types in arrays within a single JSON column
JSONB Binary All formats INT/NUMERIC/BOOLEAN/NULL, etc.
Supports storing different data types in arrays within a single JSON column
MXKV Binary All formats MXKV_INT4/MXKV_FLOAT4/MXKV_FLOAT8/MXKV_TEXT
Only supports flat key-value structure; no nesting or arrays allowed. Multiple columns are required to store metrics of different data types
MXKV2 Binary MARS2/3 MXKV_INT8/MXKV_FLOAT8/MXKV_TEXT/MXKV_BOOL
Only supports flat key-value structure; no nesting or arrays allowed. Multiple columns are required to store metrics of different data types


1 JSON/JSONB

The JSON and JSONB data types are inherited from PostgreSQL. Unlike storing JSON strings directly in text columns, JSON/JSONB types provide built-in functions for common JSON operations such as extracting key-values, merging, and deletion.

JSON and JSONB are used similarly and can be converted between each other. The differences are:

  1. JSON is stored as text; JSONB is stored in binary format.
  2. JSON has faster write performance because no binary conversion is needed during insertion.
  3. JSONB offers faster parsing due to binary storage.
  4. JSONB supports indexing.

Below are common operations for JSON/JSONB types:

1.1 Extract Key-Value Using ->

=# SELECT '{"a":1, "b":2, "c":3}'::json->'a';
 ?column?
----------
 1
(1 row)

For nested JSON/JSONB structures, use -> multiple times:

=# SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
 ?column?
----------
 2
(1 row)

1.2 Delete Key-Value

Use the - operator to delete a key. When performing write operations on JSON objects, the data must be cast to JSONB:

=# SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
     ?column?
------------------
 {"b": 2, "c": 3}
(1 row)

1.3 Merge Key-Values

Use the || operator to merge two JSONB objects:

=# SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
         ?column?
--------------------------
 {"a": 1, "b": 2, "c": 3}
(1 row)

1.4 Update Key-Value

When keys are the same, using || updates the value:

=# SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
     ?column?
------------------
 {"a": 1, "b": 3}
(1 row)

For more JSON operations, refer to the PostgreSQL documentation.


2 MXKV / MXKV2

MXKV and MXKV2 are YMatrix-developed high-performance, scalable storage types with usage similar to JSON/JSONB. Their advantage lies in binary storage combined with YMatrix's encoding chain compression algorithms, enabling smaller storage footprint and higher query efficiency.

In addition to the differences listed in the comparison table above, MXKV and MXKV2 differ in the following ways:

  • Key Import: MXKV requires manual key import; MXKV2 does not.
  • Performance: MXKV2 offers the highest compression ratio and best query performance, followed by MXKV, then JSONB, and finally JSON.
  • Vectorized Execution: MXKV2 achieves its performance benefits only when used with the vectorized execution engine; MXKV does not benefit from vectorization.
  • Compression Specification: When creating tables, MXKV2 requires specifying a compression method; MXKV does not.

Tutorial videos for this section are available in YMatrix Data Modeling and Spatiotemporal Distribution Models

Note!
This feature is available only in the Enterprise Edition.

2.1 Usage

2.1.1 Create Extension and Verify

The MXKV and MXKV2 custom data types are included in the MatrixTS extension. First, create the extension:

=# CREATE EXTENSION matrixts;

Verify that MXKV/MXKV2 is enabled:

// MXKV
=# SELECT '{}'::mxkv_text;
 mxkv_text
-----------
 {}
(1 row)

// MXKV2
=# SELECT '{}'::mxkv2_text;
 mxkv2_text
-----------
 {}
(1 row)

As shown, MXKV is enabled.

2.1.2 Create Table with MXKV Column

Create a new table and define a column of integer type to store integer-valued KV data:

// MXKV
=# CREATE TABLE data_1(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_int4
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

// MXKV2
=# CREATE TABLE data_2(
    time timestamp with time zone,
    tag_id int,
    kv mxkv2_int8 encoding(compresstype=mxcustom,encodechain='auto')
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

2.1.3 Define Keys (Skip for MXKV2)

Before using MXKV, keys must be defined to optimize compression and query performance. This step pre-defines the set of keys expected in the data.

MXKV provides the UDF mxkv_import_keys to define keys.

Two methods are available:

  • Manual Definition

Provide a JSON-formatted KV sample. MXKV automatically extracts and defines the keys.

The following SQL extracts and defines keys 'a' and 'b' from the JSON:

=# SELECT mxkv_import_keys('{"a": 1, "b": 2}');
 mxkv_import_keys
------------------
 a
 b
(2 rows)
  • Define from Existing Table Data

Assume a table t_json exists with a JSON column c2:

=# SELECT c2 FROM t_json ;
    c2
----------
 {"k1":1}
 {"k2":2}
(2 rows)

The result shows two keys: k1 and k2.

Use mxkv_import_keys to define keys from table data:

=# SELECT mxkv_import_keys('t_json'::regclass, 'c2');
 mxkv_import_keys
------------------
 k1
 k2
(2 rows)

As shown, pass the table name and column name as parameters.
This method is more convenient and faster, but requires sample data to already exist in another table.

Note!
Keys in MXKV must be defined before data insertion. Otherwise, the following error occurs:
DETAIL: The key is not imported yet HINT: Import the keys with the mxkv_import_keys() function

2.1.4 Insert Data

After defining keys, insert KV data:

// MXKV
=# INSERT INTO data_1 VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

// MXKV2
=# INSERT INTO data_2 VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

As shown, the inserted KV data contains the previously defined keys 'a' and 'b'.

2.1.5 Read Key-Value Content

Reading key-values in MXKV uses the same -> operator as JSON:

// MXKV
=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data_1;
             time              | tag_id | a | b
-------------------------------+--------+---+---
 2023-10-19 10:20:05.334169+00 |      1 | 1 | 2
(1 row)

// MXKV2
=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data_2;
             time              | tag_id | a | b
-------------------------------+--------+---+---
 2023-10-19 10:20:05.334169+00 |      1 | 1 | 2
(1 row)

Extracted key-values can be used like regular columns—for comparisons or joins with other tables. For example, create a table data_1/data_2 with the same structure as data_1 and insert identical data. The following SQL performs a join:

// MXKV
=# SELECT * FROM data_1, data_3 WHERE data_1.kv->'a' = data_3.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
-------------------------------+--------+------------------+-------------------------------+--------+------------------
 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2} | 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2}
(1 row)

// MXKV2
=# SELECT * FROM data_2, data_3 WHERE data_2.kv->'a' = data_3.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
-------------------------------+--------+------------------+-------------------------------+--------+------------------
 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2} | 2023-10-19 10:20:05.334169+00 |      1 | {"a": 1, "b": 2}
(1 row)

Note!
In MXKV/MXKV2, -> and ->> are equivalent.

2.2 Supported Data Formats for MXKV/MXKV2

Data Type Supported Data Formats
MXKV MXKV_INT4: Stores 32-bit integers (INT4/INT)
MXKV_FLOAT4: Stores 32-bit floating-point numbers (FLOAT4/REAL)
MXKV_FLOAT8: Stores 64-bit floating-point numbers (FLOAT8/FLOAT/DOUBLE PRECISION)
MXKV_TEXT: Stores TEXT strings
MXKV2 MXKV2_INT8: Stores 64-bit integers (INT8/INT)
MXKV2_FLOAT8: Stores 64-bit floating-point numbers (FLOAT8/FLOAT/DOUBLE PRECISION)
MXKV2_TEXT: Stores TEXT strings
MXKV2_BOOL: Stores BOOLEAN values

MXKV_FLOAT4 and MXKV_FLOAT8 support specifying decimal scale. For example:

=# CREATE TABLE data_4(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_float4(2)
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

With this definition, digits beyond two decimal places are rounded.

Specifying scale converts the internal storage to integer format, improving compression and efficiency. However, the value range is reduced accordingly.

  • mxkv_float4(scale), scale >= 0
    • mxkv_float4(0): [-2147483500, 2147483500]
    • mxkv_float4(2): [-21474835.00, 21474835.00]
    • mxkv_float4(4): [-214748.3500, 214748.3500]
    • ......
  • mxkv_float8(scale), scale >= 0
    • mxkv_float8(0): [-9223372036854775000, 9223372036854775000]
    • mxkv_float8(2): [-92233720368547750.00, 92233720368547750.00]
    • mxkv_float8(4): [-922337203685477.5000, 922337203685477.5000]
    • ......

Note!
MXKV and MXKV2 support only flat key-value structures. Nesting and arrays are not allowed.
For more technical details, see the blog MXKV.