Experience Loading Performance

Imagine driving an electric vehicle on a highway. A data collector in your car captures 80 data points per second, one for each of 80 predefined metrics, totaling 2 KB. Over one minute, this continuous stream is what you would use to analyze real-time conditions. Before any analysis can happen, however, the data must be written reliably to a database.

Time-series databases must absorb massive volumes of real-time data from many devices, so loading performance is critical.

In this section, you will use two YMatrix components, MatrixGate and MatrixBench, to test high-performance data ingestion and confirm that data is loaded reliably for downstream analytics.

Below is our physical machine test environment. Hardware specifications may affect tool configuration; please adjust accordingly.

1 Hardware Environment

Machine configuration:

Parameter | Configuration
CPU Cores | 2 physical cores, 32 logical cores
CPU Platform | Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Memory | 256 GB
Storage Capacity | 9.0 TB (1.4 GB/s write, 3.3 GB/s read)
Linux Distribution | CentOS Linux release 7.8.2003 (Core)
Linux Kernel | 3.10.0-1127.el7.x86_64

2 Professional Tools

2.1 MatrixGate

MatrixGate (shortened as mxgate) is a high-performance streaming data loading server located at bin/mxgate under the YMatrix installation directory. It fully leverages the parallel processing capabilities of distributed databases and is the preferred tool for data loading in production environments.

During testing, mxgate works with the writer component of mxbench to rapidly ingest data generated by the generator.

For more information, see mxgate.

2.2 MatrixBench

mxbench is a benchmarking tool for data loading and querying. It generates random data based on configurations such as number of devices, time range, and metric count. It can automatically create tables and perform serial or concurrent data loading and queries.

You can run mxbench via command line or define settings in a configuration file—choose the method that suits your workflow. The tool is located at bin/mxbench under the YMatrix installation directory.

3 Deployment Architecture

Single-machine deployment: Master + 6 Segments

Note!
MatrixGate and MatrixBench must be deployed on the same machine as the YMatrix cluster.

4 Start Testing

4.1 Test Cases

We provide three test scenarios with varying metric scales. At the end of this section, we compare YMatrix's write performance across different metric counts.

  1. 100,000 devices, 10 metrics
  2. 100,000 devices, 100 metrics
  3. 100,000 devices, 1,000 metrics
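
The data volume of each scenario follows directly from the parameters used later in this section. A minimal sketch of the arithmetic, assuming one row per device per timestamp and a one-second step over the one-minute window (the function name expected_volume is illustrative, not part of mxbench):

```shell
#!/usr/bin/env bash
# Sketch: estimate rows and data points for one test scenario.
# Assumes one row per device per timestamp; names are illustrative only.
expected_volume() {
  local devices=$1 metrics=$2 seconds=$3 step=${4:-1}
  local rows=$(( devices * seconds / step ))
  echo "rows=${rows} points=$(( rows * metrics ))"
}

# The three scenarios: 100,000 devices over a 60-second window.
for metrics in 10 100 1000; do
  echo "metrics=${metrics} $(expected_volume 100000 "$metrics" 60)"
done
```

Each scenario yields 6,000,000 rows, which matches the lines inserted value in the writer reports; only the number of data points per row changes.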

4.2 Begin Testing

Note!
Before using mxbench for load testing, ensure the test environment is ready: a running YMatrix cluster and properly configured environment variables. This step is required.
MatrixGate does not require manual setup—it is automatically configured and started when you launch mxbench.

For cluster deployment instructions, see Cluster Deployment.
For mxbench configuration, see mxbench.
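
Before launching a test, a quick check along the following lines may help confirm the environment is ready. This is only a sketch: it assumes that mxbench, mxgate, and a psql client are expected on PATH, and that PGPORT and PGDATABASE are the relevant environment variables. The function name missing_tools is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which required commands are missing from PATH.
# Prints the missing names, one per line; prints nothing if all are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumption: these three tools should be reachable from the current shell.
missing=$(missing_tools mxbench mxgate psql)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All tools found; PGPORT=${PGPORT:-unset} PGDATABASE=${PGDATABASE:-unset}"
fi
```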

4.2.1 100,000 Devices, 10 Metrics

mxbench parameters are divided into two parts: global and pluggable local configurations. Global settings include database and global. Pluggable components include generator, writer, and benchmark.

Since this section focuses on data loading performance, we will not cover the benchmark query tool here.
For details on benchmark, refer to mxbench.

Parameter | Default Value | Description
--db-database | PGDATABASE environment variable, or postgres if unset | Target database name
--db-master-port | | Instance port number; must match the environment variable setting
--db-user | Current user name | Database user
--workspace | /tmp/mxbench | Directory for CSV and query files
--watch | true | Enable process monitoring
--simultaneous-loading-and-query | false | Load and query simultaneously. When false, load first, then query
--table-name | (empty) | Target table name (required)
--tag-num | 25000 | Number of devices
--metrics-type | float8 | Metric data type. Supported: "int4", "int8", "float4", "float8"
--total-metrics-count | 300 | Total number of metrics
--ts-start | | Start timestamp for generated data
--ts-end | | End timestamp for generated data
--ts-step-in-second | 1 | Interval between metric samples (seconds)
--generator | telematics | Random data generator. Defaults to the telematics scenario. Can also read from custom files or skip generation
--generator-batch-size | 1 | Number of rows per device per timestamp. 1 means rows are not split
--generator-disorder-ratio | 0 | Percentage of out-of-order data (0–100). 0 means no out-of-order data
--generator-empty-value-ratio | 90 | Percentage of null values per row (0–100). 90 means 90% nulls
--generator-randomness | OFF | Data randomness level: OFF/S/M/L. OFF produces constant values; randomness increases from S to M to L
--writer | http | Data writer mode. Determines how mxgate receives data

For more configurable parameters, see mxbench or run mxbench --help.

Run the following command to configure and start mxbench, adjusting parameters to your environment. Because mxgate and mxbench run on the same machine, the command sets --writer to stdin, which passes data through a Linux pipe and avoids network overhead.

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 10 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin" 

With --watch enabled (the default), progress updates appear every 5 seconds. Upon completion, you will see a summary like:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:14:08 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:14:27 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 695333400           │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 5.399120 : 1        │
└─────────────────────────────────┴─────────────────────┘  

Writer report interpretation:

Parameter | Description
start time | Data loading start time
stop time | Data loading end time
size written to MxGate (bytes) | Total bytes sent to mxgate
lines inserted | Number of rows inserted
compress ratio | Compression ratio: raw input size vs. actual table size in database
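
The report does not print rates directly, but they can be derived from its fields. A minimal sketch, assuming GNU date is available (for the -d option) and using the values from the report above; the function name elapsed_seconds is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: derive elapsed time and average rates from a writer report.
# Requires GNU date (-d); the values are copied from the report above.
elapsed_seconds() {
  local start=$1 stop=$2
  echo $(( $(date -u -d "$stop" +%s) - $(date -u -d "$start" +%s) ))
}

secs=$(elapsed_seconds "2022-07-21 15:14:08" "2022-07-21 15:14:27")
rows=6000000        # lines inserted
bytes=695333400     # size written to MxGate (bytes)
echo "elapsed: ${secs} s"
echo "rows/s:  $(( rows / secs ))"             # integer division
echo "MB/s:    $(( bytes / secs / 1000000 ))"  # decimal megabytes, truncated
```

Integer arithmetic keeps the sketch dependency-free; use awk or bc if fractional precision matters.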

Actual runtime depends on total data volume and machine performance. The periodic progress output lets you monitor write speed and elapsed time during the run.

Note!
mxbench runs continuously until all data from ts-start to ts-end is loaded. You can press Ctrl+C to terminate early.

If long command lines are cumbersome, create a config file mxbench.conf, place parameters inside, and run:

[mxadmin@mdw ~]$ mxbench --config mxbench.conf

Note!
Data loading may appear to hang: progress logs keep printing, but no real progress is made. If this happens, inspect the log files under ~/gpAdminLogs/ to diagnose the cause.
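
One way to find the relevant logs is to list the most recently modified files in that directory. This is only a sketch: it makes no assumption about log file names, and the function name recent_logs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the most recently modified files in a log directory.
# The directory comes from the note above; file names are not assumed.
recent_logs() {
  local dir=$1 count=${2:-5}
  [ -d "$dir" ] || return 0          # nothing to list if the directory is absent
  ls -t "$dir" | head -n "$count"    # newest first
}

recent_logs "$HOME/gpAdminLogs" 5
# Then inspect a file from the list, for example:
#   tail -n 50 "$HOME/gpAdminLogs/<file name from the list>"
```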

4.2.2 100,000 Devices, 100 Metrics

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table2" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 100 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin" 

Result:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:19:48 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:21:02 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 5555333400          │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 25.519937 : 1       │
└─────────────────────────────────┴─────────────────────┘

4.2.3 100,000 Devices, 1,000 Metrics

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table3" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 1000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin" 

Result:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:22:27 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:33:40 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 54305333400         │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 47.488209 : 1       │
└─────────────────────────────────┴─────────────────────┘
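
The three runs above differ only in --table-name and --total-metrics-count, so the command line can be generated rather than retyped. A minimal sketch that uses only the flags shown in the commands above and prints the commands instead of running them (the function name mxbench_args is illustrative):

```shell
#!/usr/bin/env bash
# Sketch: print the mxbench command line for one scenario (dry run).
# Only flags that appear in the commands above are used.
mxbench_args() {
  local table=$1 metrics=$2
  echo mxbench run \
    --db-database "load_test" --db-master-port 5432 \
    --db-master-host "mdw" --db-user "mxadmin" \
    --workspace "/tmp/mxbench/workspace" --watch \
    --simultaneous-loading-and-query \
    --table-name "$table" --tag-num 100000 \
    --metrics-type "float8" --total-metrics-count "$metrics" \
    --ts-start "'2022-04-19 00:00:00'" --ts-end "'2022-04-19 00:01:00'" \
    --generator "telematics" --generator-batch-size 1 \
    --generator-disorder-ratio 0 --generator-empty-value-ratio 0 \
    --generator-randomness "OFF" --writer "stdin"
}

# table:metrics pairs for the three scenarios
for spec in test_table:10 test_table2:100 test_table3:1000; do
  mxbench_args "${spec%%:*}" "${spec##*:}"
done
```

Review the printed commands first; piping the output to sh would then execute them one after another.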

The line chart below compares the three writer reports. It shows how YMatrix's data loading throughput scales with the metric count, which can help you decide on metric limits in real-world deployments.

[Figure: load performance line chart]

Time-series data consists of timestamped data points—each representing a metric value at a specific moment. Without timestamps, it is not time-series data. Understanding data points helps interpret the chart.

The x-axis shows the metric scale (total-metrics-count). The y-axis shows write throughput, measured as the number of data points written per second. As the metric count increases, write throughput rises sharply at first and then levels off as the data volume per row grows.
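
The plotted values can be reproduced from the writer reports. A minimal sketch, assuming data points equal lines inserted multiplied by the metric count, with elapsed seconds taken from each report's start and stop times (the function name points_per_second is illustrative):

```shell
#!/usr/bin/env bash
# Sketch: data points written per second for the three writer reports.
# points = lines inserted x metric count; seconds = stop time - start time.
points_per_second() {
  local rows=$1 metrics=$2 secs=$3
  echo $(( rows * metrics / secs ))   # integer division
}

# metrics:seconds pairs (19 s, 74 s, and 673 s elapsed in the reports)
for run in 10:19 100:74 1000:673; do
  metrics=${run%%:*}
  secs=${run##*:}
  echo "metrics=${metrics} points/s=$(points_per_second 6000000 "$metrics" "$secs")"
done
```

This works out to roughly 3.2 million, 8.1 million, and 8.9 million data points per second for 10, 100, and 1,000 metrics respectively.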

In every scenario, YMatrix writes millions of data points per second. Instead of inserting rows one by one with INSERT, why not accelerate with YMatrix on the data highway?