Extended Data Types

This document describes the extended data types supported by YMatrix.

In the relational model, common data types represent a single attribute per column, such as name, height, or weight. However, in time-series scenarios, the following issues often arise:

  1. The metric set is unpredictable, meaning the table schema may change frequently.
  2. The number of metrics exceeds YMatrix's maximum column limit of 1600.

In such cases, extended data types are required—these allow multiple attributes to be stored within a single column.

There are many ways to implement extended data types, such as using custom serialization formats encoded into string-type columns. YMatrix provides two extended data types:

  1. JSON / JSONB
  2. MXKV

1 JSON / JSONB

The JSON and JSONB types are inherited from PostgreSQL. Unlike storing JSON strings directly in text columns, JSON/JSONB types provide built-in functions for common operations such as extracting key-value pairs, merging, and deletion.

JSON and JSONB are used similarly and can be converted between each other. The differences are:

  1. JSON is stored as plain text; JSONB is stored in binary format.
  2. JSON has faster write performance because no binary conversion is needed during insertion.
  3. JSONB offers faster parsing due to its internal binary representation.

Below are common operations for JSON / JSONB types:

1.1 Extracting Values with ->

=# SELECT '{"a":1, "b":2, "c":3}'::json->'a';
 ?column?
----------
 1
(1 row)

For nested JSON structures, use -> multiple times:

=# SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
 ?column?
----------
 2
(1 row)

1.2 Deleting Keys

Use the - operator to delete a key. When performing write operations on JSON objects, convert them to JSONB first:

=# SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
     ?column?
------------------
 {"b": 2, "c": 3}
(1 row)

1.3 Merging Key-Value Pairs

Use the || operator to merge two JSONB objects:

=# SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
         ?column?
--------------------------
 {"a": 1, "b": 2, "c": 3}
(1 row)

1.4 Updating Values

When keys overlap, the || operator updates the value:

=# SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
     ?column?
------------------
 {"a": 1, "b": 3}
(1 row)

For more JSON operations, refer to the PostgreSQL documentation.

2 MXKV

This topic is covered in the instructional video: YMatrix Data Modeling and Spatiotemporal Distribution Model

When using YMatrix for time-series data collection, you may encounter the following scenarios:

  1. The number of metrics to collect exceeds YMatrix’s 1600-column limit.
  2. Different device models report vastly different sets of metrics, resulting in many NULL values.
  3. The metric set is unknown in advance, leading to frequent schema changes.

Traditional solutions include:

  1. Serializing multiple columns into a variable-length field.
  2. Using JSON or JSONB data types.

While these approaches reduce operational complexity, they suffer from poor performance.

MXKV was developed to address these limitations. MXKV is a high-performance, scalable storage type developed by YMatrix. It is used similarly to JSON but offers advantages in efficiency: MXKV uses binary storage and integrates with YMatrix compression algorithms to achieve compact storage and fast query performance.

Note!
This feature is available only in the Enterprise Edition.

2.1 Usage

2.1.1 Create Extension and Verify

The MXKV data type is included in the MatrixTS extension. First, create the extension:

=# CREATE EXTENSION matrixts;

Verify that MXKV is enabled:

=# SELECT '{}'::mxkv_text;
 mxkv_text
-----------
 {}
(1 row)

As shown above, MXKV is now active.

2.1.2 Create Table with MXKV Column

Create a new table and define an MXKV column of type mxkv_int4 to store integer-valued key-value data:

=# CREATE TABLE data(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_int4
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

2.1.3 Define Keys

Before using MXKV, define the key set to optimize compression and query performance. This step pre-defines the set of keys expected in the data.

MXKV provides the UDF mxkv_import_keys to define keys.

Two methods are available:

  1. Manual Definition

Provide a sample JSON object. MXKV automatically extracts and registers the keys.

The following SQL extracts and defines keys 'a' and 'b':

=# SELECT mxkv_import_keys('{"a": 1, "b": 2}');
 mxkv_import_keys
------------------
 a
 b
(2 rows)
  1. Define from Existing Table Data

Suppose there is a table t_json with a JSON column c2:

=# SELECT c2 FROM t_json ;
    c2
----------
 {"k1":1}
 {"k2":2}
(2 rows)

From the result, we see that c2 contains keys k1 and k2.

Use mxkv_import_keys to define keys from this table:

=# SELECT mxkv_import_keys('t_json'::regclass, 'c2');
 mxkv_import_keys
------------------
 k1
 k2
(2 rows)

As shown, pass the table name and column name as parameters.
This method is more convenient and efficient, provided that sample data already exists in another table.

2.1.4 Insert Data

After defining the keys, insert KV data:

=# INSERT INTO data VALUES(now(), 1, '{"a":1, "b":2}');
INSERT 0 1

The inserted KV data includes the previously defined keys 'a' and 'b'.

Note!
Keys must be defined before insertion. Otherwise, the following error occurs:
DETAIL: The key is not imported yet HINT: Import the keys with the mxkv_import_keys() function

2.1.5 Read Key Values

Reading key values from MXKV is similar to JSON, using the -> operator:

=# SELECT time, tag_id, kv->'a' as a, kv->'b' as b FROM data;
             time              | tag_id | a | b
-------------------------------+--------+---+---
 2021-07-02 13:23:30.405473+00 |      1 | 1 | 2
(1 row)

Extracted key values can be used like regular columns—for comparisons or joins with other tables. For example, create a table data with the same structure as data, insert identical data, and perform a join:

=# SELECT * FROM data, data1 WHERE data.kv->'a' = data1.kv->'a';
             time              | tag_id |        kv        |             time              | tag_id |        kv
-------------------------------+--------+------------------+-------------------------------+--------+------------------
 2021-07-02 13:23:30.405473+00 |      1 | {"a": 1, "b": 2} | 2021-07-02 13:26:31.828685+00 |      1 | {"a": 1, "b": 2}
(1 row)

Note!
In MXKV, -> and ->> have the same effect.

2.2 MXKV Data Types

The earlier example uses the mxkv_int4 type. Below is a detailed description of the MXKV-supported data types:

  1. mxkv_int4: Stores 32-bit integers (int4 / int).
  2. mxkv_float4: Stores 32-bit floating-point numbers (float4 / real).
  3. mxkv_float8: Stores 64-bit floating-point numbers (float8 / float / double precision).
  4. mxkv_text: Stores text strings.

For mxkv_float4 and mxkv_float8, you can specify the number of decimal places. For example:

=# CREATE TABLE data(
    time timestamp with time zone,
    tag_id int,
    kv mxkv_float4(2)
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

With this definition, digits beyond two decimal places are rounded.

Specifying scale converts the internal storage to integer format, improving compression and performance. However, the valid numeric range is reduced accordingly.

  • mxkv_float4(scale), scale >= 0
    • mxkv_float4(0): [-2147483500, 2147483500]
    • mxkv_float4(2): [-21474835.00, 21474835.00]
    • mxkv_float4(4): [-214748.3500, 214748.3500]
    • ......
  • mxkv_float8(scale), scale >= 0
    • mxkv_float8(0): [-9223372036854775000, 9223372036854775000]
    • mxkv_float4(2): [-92233720368547750.00, 92233720368547750.00]
    • mxkv_float8(4): [-922337203685477.5000, 922337203685477.5000]
    • ......

Note!
MXKV supports only flat key-value structures. Nesting and arrays are not allowed.
For more technical details, see the blog post MXKV.