This document describes the compression algorithms available in YMatrix and how to use them.
General-purpose compression refers to compression algorithms that operate without knowledge of the internal structure of the data. These algorithms directly compress data blocks by encoding binary patterns to reduce redundancy. The compressed data cannot be randomly accessed and must be decompressed as a whole block.
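The block-level behavior described above can be sketched in Python with the standard-library zlib module. This is a minimal sketch. JSON serialization and the helper names `compress_block` and `read_tuple` are illustrative assumptions, not YMatrix's storage format or API.

```python
import json
import zlib


def compress_block(tuples, level=6):
    """Serialize a whole block of tuples and compress it as one opaque byte stream."""
    raw = json.dumps(tuples).encode("utf-8")
    return zlib.compress(raw, level)


def read_tuple(block, index):
    """No random access: the whole block is decompressed just to read one tuple."""
    rows = json.loads(zlib.decompress(block).decode("utf-8"))
    return tuple(rows[index])


block = compress_block([(i, i * 10) for i in range(1000)])
print(read_tuple(block, 42))  # (42, 420)
```

Reading a single tuple still costs a full decompression of its block. This is why the block size, controlled by `compress_threshold`, matters for read performance.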
YMatrix supports three general-purpose compression algorithms for data blocks: zlib, lz4, and zstd.
The zlib, lz4, and zstd algorithms are specified in the WITH clause when a table is created. Example:
=# CREATE TABLE t (f1 int8, f2 int8) USING MARS3
WITH (compresstype=zstd, compresslevel=3, compress_threshold=1200) ORDER BY (f1);
Note!
For more information about the WITH clause, see CREATE TABLE.
Parameter descriptions:
| Parameter | Default | Min | Max | Description |
|---|---|---|---|---|
| compress_threshold | 1200 | 1 | 8000 | Compression threshold: the maximum number of tuples grouped into one compression unit (block). |
| compresstype | none | N/A | N/A | Compression algorithm. Supported values: zstd, zlib, lz4. |
| compresslevel | 0 | 1 | Varies by algorithm | Compression level. Lower values compress faster at a lower ratio. Higher values compress more slowly at a better ratio. Valid ranges: zstd 1–19; zlib 1–9; lz4 1–20. |
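To illustrate what `compress_threshold` controls, the following minimal sketch splits rows into units of at most `compress_threshold` tuples and compresses each unit independently. pickle and zlib stand in for YMatrix's real tuple format and codecs, and the function name `compress_rows` is hypothetical.

```python
import pickle
import zlib


def compress_rows(rows, compress_threshold=1200, compresslevel=1):
    """Group rows into units of at most compress_threshold tuples; compress each unit.

    Illustrative only: pickle/zlib are stand-ins, not the MARS3 storage format.
    """
    if not 1 <= compress_threshold <= 8000:
        raise ValueError("compress_threshold must be between 1 and 8000")
    units = []
    for start in range(0, len(rows), compress_threshold):
        unit = rows[start:start + compress_threshold]
        units.append(zlib.compress(pickle.dumps(unit), compresslevel))
    return units


rows = [(i, i % 7) for i in range(5000)]
units = compress_rows(rows, compress_threshold=1200)
print(len(units))  # 5: four units of 1200 tuples and one of 200
```

A larger threshold gives the codec more data to find redundancy in. A smaller one reduces how much must be decompressed to read any single tuple.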
Note!
When compresslevel > 0 and compresstype is not specified, compresstype defaults to zlib.
When compresstype is specified but compresslevel is not, compresslevel defaults to 1.
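The defaulting rules above, together with the per-algorithm level ranges from the parameter table, can be modeled as a small function. This is a sketch of the documented behavior, not YMatrix source code. The function name is hypothetical, and treating an out-of-range level as an error is an assumption.

```python
# compresslevel ranges per algorithm, as listed in the parameter table.
LEVEL_RANGES = {"zstd": (1, 19), "zlib": (1, 9), "lz4": (1, 20)}


def effective_compression(compresstype=None, compresslevel=None):
    """Apply the documented defaults; None means "not specified"."""
    if compresstype is None:
        if compresslevel is not None and compresslevel > 0:
            compresstype = "zlib"  # level given without a type -> zlib
        else:
            return ("none", 0)     # nothing requested -> no compression
    if compresslevel is None:
        compresslevel = 1          # type given without a level -> level 1
    low, high = LEVEL_RANGES[compresstype]
    if not low <= compresslevel <= high:
        raise ValueError(f"{compresstype} accepts compresslevel {low}-{high}")
    return (compresstype, compresslevel)


print(effective_compression(compresslevel=5))      # ('zlib', 5)
print(effective_compression(compresstype="zstd"))  # ('zstd', 1)
```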
Note!
Generally, higher zstd compression levels yield better ratios but slower performance. However, this is not always true.
In addition to general-purpose compression, we recommend trying YMatrix's proprietary custom compression method, the Encoding Chain (mxcustom).
Unlike general-purpose compression, the encoding chain leverages knowledge of the internal format and semantics of the data. In relational databases, data is organized into tables where each column has a fixed data type, so values in the same column are logically similar. In many cases, adjacent rows also hold similar values. Compressing and storing data column by column exploits this similarity and achieves significantly better compression.
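The difference between row-wise and column-wise layout can be seen in a small Python sketch. The sketch packs the same hypothetical time-series rows both ways and compresses each with zlib. It models only the layout, not the encoding chain's type-aware algorithms. The data is invented, and the resulting sizes depend on the data, so no output is shown.

```python
import struct
import zlib

# Hypothetical time-series rows: (timestamp, value) at a regular 10-second interval.
rows = [(1_700_000_000 + 10 * i, 20 + (i // 50) % 3) for i in range(10_000)]

# Row-wise layout: the fields of each row are stored next to each other.
row_major = b"".join(struct.pack("<qq", ts, v) for ts, v in rows)

# Column-wise layout: all timestamps first, then all values.
col_major = (b"".join(struct.pack("<q", ts) for ts, _ in rows)
             + b"".join(struct.pack("<q", v) for _, v in rows))

print("row-wise bytes after zlib:   ", len(zlib.compress(row_major, 6)))
print("column-wise bytes after zlib:", len(zlib.compress(col_major, 6)))
```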
The encoding chain provides the following capabilities:
The encoding chain offers significant advantages for time-series data, which exhibits strong characteristics such as regular time intervals, column independence, and gradual value changes over time. General-purpose algorithms like lz4 and zstd operate on byte streams and fail to exploit these patterns, resulting in suboptimal compression.
The encoding chain fully leverages time-series characteristics for deep compression, delivering three key benefits:
Note!
The easiest way to use the encoding chain is to enable Adaptive Encoding (AutoEncode) mode, which automatically detects data patterns and selects appropriate encoding methods at runtime. See below for details.
The main usage patterns of the encoding chain are listed below:
| No. | Usage |
|---|---|
| 1 | Column-level compression |
| 2 | Table-level compression (supports algorithm modification) |
| 3 | Table-level and column-level compression combined |
| 4 | Adaptive Encoding (AutoEncode) |
Before using any of these methods, create the required extension:
=# CREATE EXTENSION matrixts;
Custom compression can be specified for each column in table t1. The ENCODING clause defines the encoding chain (single or multiple algorithms, separated by commas). Example:
=# CREATE TABLE t1(
f1 int8 ENCODING(encodechain='deltadelta(7), zstd', compresstype='mxcustom'),
f2 int8 ENCODING(encodechain='lz4', compresstype='mxcustom')
)
USING MARS3
ORDER BY (f1);
Alternatively, use the following syntax:
=# CREATE TABLE t1_1(
f1 int8, COLUMN f1 ENCODING (encodechain='lz4, zstd', compresstype='mxcustom'),
f2 int8, COLUMN f2 ENCODING(encodechain='lz4', compresstype='mxcustom')
)
USING MARS3
ORDER BY (f1);
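An encodechain value is a comma-separated list of algorithms, each with an optional integer argument such as `deltadelta(7)`. The minimal Python sketch below splits such a string into stages. It assumes that stages are applied in the order listed. The parser and its name `parse_encodechain` are hypothetical illustrations, not YMatrix's implementation.

```python
import re

# One stage: an algorithm name with an optional integer argument, e.g. "deltadelta(7)".
STAGE = re.compile(r"\s*(\w+)\s*(?:\(\s*(\d+)\s*\))?\s*")


def parse_encodechain(chain):
    """Split an encodechain string into (name, argument) stages, in listed order."""
    stages = []
    for part in chain.split(","):
        match = STAGE.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse stage {part!r}")
        name, arg = match.groups()
        stages.append((name.lower(), int(arg) if arg is not None else None))
    return stages


print(parse_encodechain("deltadelta(7), zstd"))  # [('deltadelta', 7), ('zstd', None)]
```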
Using DEFAULT COLUMN ENCODING applies a default compression method to all columns, equivalent to table-level compression:
=# CREATE TABLE t1_2(
f1 int8,
f2 int8,
DEFAULT COLUMN ENCODING (encodechain='auto', compresstype='mxcustom')
)
USING MARS3
ORDER BY (f1);
You can apply table-level compression using either the encoding chain or a general-purpose algorithm, as shown below for tables t2_1 and t2_2. The key difference is that only the encoding chain lets you change the compression algorithm after the table is created, using ALTER TABLE.
Example: Apply zstd compression at the table level using the encoding chain:
=# CREATE TABLE t2_1 (
f1 int8,
f2 int8
)
USING MARS3
WITH(
compresstype='mxcustom',
encodechain='zstd'
)
ORDER BY (f1);
Example: Apply a zstd + lz4 compression chain at the table level:
=# CREATE TABLE t2_2 (
f1 int8,
f2 int8
)
USING MARS3
WITH(
compresstype='mxcustom',
encodechain='zstd, lz4'
)
ORDER BY (f1);
Change the table-level compression of t2_1 to adaptive encoding:
=# ALTER TABLE t2_1 SET (encodechain='auto');
In Example 1, table t3_1 is assigned adaptive encoding (auto), and column f1 is assigned lz4. Because column-level settings take precedence, column f1 uses lz4, while the other columns (e.g., f2) use adaptive encoding.
=# CREATE TABLE t3_1 (
f1 int8 ENCODING(compresstype='lz4'),
f2 int8
)
USING MARS3
WITH(
compresstype='mxcustom',
encodechain='auto'
)
ORDER BY (f1);
In Example 2, both table t3_2 and column f1 have compression settings. Column f1 uses the specified chain lz4, deltazigzag, while f2 inherits the table-level auto setting.
=# CREATE TABLE t3_2 (
f1 int8 ENCODING(compresstype='mxcustom', encodechain='lz4, deltazigzag'),
f2 int8
)
USING MARS3
WITH(
compresstype='mxcustom',
encodechain='auto'
)
ORDER BY (f1);
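The precedence rule in both examples can be modeled in a few lines of Python. This sketch assumes that a column-level ENCODING clause replaces the table-level options as a whole rather than being merged with them. That assumption matches Example 1, where f1 ends up with plain lz4. The function name is hypothetical.

```python
def effective_column_encoding(table_opts, column_opts):
    """Return the compression options a column ends up using.

    Assumption: a column-level ENCODING clause replaces the table-level
    options as a whole; a column without one inherits the table's WITH options.
    """
    return dict(column_opts) if column_opts else dict(table_opts)


# Settings from Example 2 (table t3_2).
table_t3_2 = {"compresstype": "mxcustom", "encodechain": "auto"}
columns_t3_2 = {
    "f1": {"compresstype": "mxcustom", "encodechain": "lz4, deltazigzag"},
    "f2": {},
}
for name, opts in columns_t3_2.items():
    print(name, effective_column_encoding(table_t3_2, opts))
```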
YMatrix's encoding chain supports Adaptive Encoding, where the system automatically selects an optimal encoding method based on runtime data characteristics.
Enable adaptive encoding at the table level for table t4:
=# CREATE TABLE t4 (
f1 int8,
f2 int8
)
USING MARS3
WITH(
compresstype=mxcustom
)
ORDER BY (f1);
Alternatively, explicitly specify encodechain=auto. Either method is acceptable.
=# CREATE TABLE t4 (
f1 int8,
f2 int8
)
USING MARS3
WITH(
compresstype=mxcustom,
encodechain=auto
)
ORDER BY (f1);
On table t5, combine column-level adaptive encoding with table-level lz4 compression. Column f1 uses its column-level setting (auto), while f2 inherits the table-level lz4 compression.
=# CREATE TABLE t5 (
f1 int8 ENCODING (
compresstype=mxcustom,
encodechain=auto
),
f2 int8
)
USING MARS3
WITH(
compresstype=mxcustom,
encodechain=lz4
)
ORDER BY (f1);
In adaptive mode, you can set the automode parameter at the table level to prioritize either compression ratio or speed. The example below enables ratio-first mode for table t6. automode=1 prioritizes compression ratio; automode=2 prioritizes speed.
-- automode=1, auto for cost
-- automode=2, auto for speed
=# CREATE TABLE t6 (
f1 int8,
f2 int8
)
USING MARS3
WITH(
compresstype=mxcustom,
automode=1
)
ORDER BY (f1);
Note!
Adaptive encoding cannot be combined with other compression algorithms.
The following algorithms are available in the encoding chain:
| Algorithm | Parameters | Description |
|---|---|---|
| lz4 & zstd | compresslevel | Integrates lz4 and zstd into the encoding chain using system compression libraries. lz4 excels in speed, especially decompression. zstd offers a better balance. At default levels, lz4 decompresses faster than zstd, while zstd achieves higher compression ratios. Generally, higher zstd levels yield better ratios but slower performance—though exceptions exist. |
| deltadelta | Scaling factor (optional). For example, deltadelta(7) scales differences by 7 bits before storage. Default: no scaling. | Applies second-order differencing (delta of deltas). It is ideal for sorted timestamps with regular intervals and no gaps: the second-order differences of a perfectly regular sequence are all zero, which compresses extremely well. Works only on integers and is effective when the second-order differences are small. |
| deltazigzag | Scaling factor (optional) | Performs first-order differencing, then uses zigzag encoding to convert negatives to positives, followed by variable-length integer encoding. Suitable for small-range integer columns without ordering requirements. |
| Gorilla | None | Designed for floating-point compression. XORs each value with the previous one, so that only the meaningful bits between the leading and trailing zeros of the result need to be stored. Currently supports only double (8-byte) values. |
| Gorilla2 | None | An improved version of Gorilla that captures broader data patterns. Offers significantly better compression than Gorilla in most time-series scenarios. Matches zstd in compression ratio and time, but outperforms zstd in decompression speed. Supports float4 and float8. |
| Floatint | Scaling factor (required) | Useful when Gorilla performs poorly on slowly changing floats (e.g., GPS coordinates). Converts floats to scaled integers before compression. Note: Introduces precision loss. Error depends on scaling factor; e.g., factor 4 implies a maximum error of 0.0001. |
| simple8b | None | Ideal for small-range integers. Packs multiple small integers into 8 bytes. For example, values < 8 can be stored using 3 bits each, achieving good compression. lz4 may perform poorly on such irregular data. |
| fds | None | Designed for floating-point columns that actually store integer values (common in time-series data). Detects the integer pattern, converts the values to a binary integer format, then compresses them. On the TSBS cpu-only dataset (13 columns, 10 of which are float8 columns holding random integers), fds compresses more than 2x better than zstd, producing output about 30% of zstd's size. |
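The integer encodings in the table can be sketched in plain Python to show the core transformations. The sketch covers deltadelta (delta of deltas), deltazigzag (delta, zigzag, then varint), and a simplified fixed-width packing in the spirit of simple8b. It is not YMatrix's implementation. The optional scaling factors are not modeled. The packing omits simple8b's per-word selector, so it fits 64 // width values per word instead of the real format's 60-bit payload layout. All function names are hypothetical.

```python
def deltadelta_encode(values):
    """Keep the first value and first delta, then store delta-of-delta."""
    if len(values) < 2:
        return list(values)
    out = [values[0], values[1] - values[0]]
    for i in range(2, len(values)):
        out.append((values[i] - values[i - 1]) - (values[i - 1] - values[i - 2]))
    return out


def deltadelta_decode(encoded):
    """Invert deltadelta_encode by re-accumulating the deltas."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        values.append(values[-1] + delta)
    return values


def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint(u):
    """Unsigned LEB128-style varint: 7 payload bits per byte, high bit = more follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def deltazigzag_encode(values):
    """First-order delta, then zigzag, then varint."""
    out = bytearray()
    prev = 0
    for v in values:
        out += varint(zigzag(v - prev))
        prev = v
    return bytes(out)


def pack_fixed_width(values, width):
    """Simplified packing: as many width-bit values as fit into each 64-bit word.

    Real simple8b also stores a per-word selector; this sketch omits it.
    """
    per_word = 64 // width
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for j, v in enumerate(values[start:start + per_word]):
            if v < 0 or v >> width:
                raise ValueError(f"{v} does not fit in {width} bits")
            word |= v << (j * width)
        words.append(word)
    return words


timestamps = [1000 + 10 * i for i in range(6)]
print(deltadelta_encode(timestamps))  # [1000, 10, 0, 0, 0, 0]
print(len(pack_fixed_width([i % 8 for i in range(100)], 3)))  # 5 words instead of 100 values
```

The regular timestamps collapse to zeros after the first two entries, which is exactly the pattern a following zstd or lz4 stage compresses well.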
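For the floating-point algorithms, the sketch below covers two pieces. The first is Gorilla's XOR step, which reports the leading zeros, meaningful bits, and trailing zeros that Gorilla would need to store. The second is a floatint-style scaling to integers. Two assumptions apply. No bit-level output stream is written. The floatint scaling factor is treated as a power of ten, inferred from the table's note that factor 4 gives a maximum error of 0.0001. All function names are hypothetical.

```python
import struct


def float_bits(x):
    """Reinterpret a float8 (IEEE 754 double) as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def gorilla_xor_stats(values):
    """XOR each value with its predecessor and report
    (leading zeros, meaningful bits, trailing zeros) of the result."""
    stats = []
    prev = float_bits(values[0])
    for x in values[1:]:
        cur = float_bits(x)
        xored = cur ^ prev
        if xored == 0:
            stats.append((64, 0, 0))  # repeated value: only a "same as before" flag is needed
        else:
            leading = 64 - xored.bit_length()
            trailing = (xored & -xored).bit_length() - 1
            stats.append((leading, 64 - leading - trailing, trailing))
        prev = cur
    return stats


def floatint_encode(values, scale):
    """Lossy: multiply by 10**scale and round, so integer encodings can take over."""
    return [round(v * 10 ** scale) for v in values]


def floatint_decode(ints, scale):
    """Recover approximate floats from the scaled integers."""
    return [i / 10 ** scale for i in ints]


print(gorilla_xor_stats([1.0, 2.0, 2.0]))  # [(1, 11, 52), (64, 0, 0)]
print(floatint_encode([116.39748, 39.90882], 4))
```

Slowly changing values such as GPS coordinates turn into integers with small deltas after scaling. This is why floatint can beat Gorilla on them, at the cost of the precision loss noted in the table.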