Experience Loading Performance

Imagine driving an electric vehicle on a highway. A data collector in your car captures 80 data points per second, one for each of 80 predefined metrics, totaling 2 KB. Over one minute, you observe this continuous stream of incoming data and use it to analyze real-time conditions. But before any analysis can happen, this data must be reliably written to and stored in a database.

A time-series database must absorb massive volumes of real-time data from many devices, so loading performance is critical.

In this section, you will use two YMatrix components, MatrixGate and MatrixBench, to run a high-throughput data ingestion test and confirm that data is loaded reliably for downstream analytics.

Below is our physical machine test environment. Hardware specifications may affect tool configuration; please adjust accordingly.

1 Hardware Environment

Machine configuration:

Parameter	Configuration
CPU Cores	2 physical cores, 32 logical cores
CPU Platform	Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Memory	256 GB
Storage Capacity	9.0 TB (1.4 GB/s write, 3.3 GB/s read)
Linux Distribution	CentOS Linux release 7.8.2003 (Core)
Linux Kernel	3.10.0-1127.el7.x86_64

2 Professional Tools

2.1 MatrixGate

MatrixGate (shortened as mxgate) is a high-performance streaming data loading server located at bin/mxgate under the YMatrix installation directory. It fully leverages the parallel processing capabilities of distributed databases and is the preferred tool for data loading in production environments.

During testing, mxgate works with the writer component of mxbench to rapidly ingest data generated by the generator.

For more information, see mxgate.

2.2 MatrixBench

mxbench is a benchmarking tool for data loading and querying. It generates random data based on configurations such as number of devices, time range, and metric count. It can automatically create tables and perform serial or concurrent data loading and queries.

You can run mxbench via command line or define settings in a configuration file—choose the method that suits your workflow. The tool is located at bin/mxbench under the YMatrix installation directory.

3 Deployment Architecture

Single-machine deployment: Master + 6 Segments

Note!
MatrixGate and MatrixBench must be deployed on the same machine as the YMatrix cluster.

4 Start Testing

4.1 Test Cases

We provide three test scenarios with varying metric scales. At the end of this section, we compare YMatrix's write performance across different metric counts.

100,000 devices, 10 metrics
100,000 devices, 100 metrics
100,000 devices, 1,000 metrics

4.2 Begin Testing

Note!
Before using mxbench for load testing, ensure the test environment is ready: a running YMatrix cluster and properly configured environment variables. This step is required.
MatrixGate does not require manual setup—it is automatically configured and started when you launch mxbench.

For cluster deployment instructions, see Cluster Deployment.
For mxbench configuration, see mxbench.
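A quick pre-flight check can catch a missing tool before a long test run. The following is a minimal sketch and not part of YMatrix: the helper names check_cmd and preflight are hypothetical, and it assumes that mxbench, mxgate, and psql should all be reachable on PATH once the environment variables are configured.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is reachable on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Hypothetical helper: check every tool this section relies on.
# Assumption: psql is checked only as a generic database client.
preflight() {
  preflight_status=0
  for tool in mxbench mxgate psql; do
    check_cmd "$tool" || preflight_status=1
  done
  return "$preflight_status"
}

if preflight; then
  echo "all tools found"
else
  echo "environment is not ready: fix PATH before running mxbench"
fi
```

This only confirms that the binaries can be found. It does not verify that the cluster itself is running.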

4.2.1 100,000 Devices, 10 Metrics

mxbench parameters are divided into two parts: global and pluggable local configurations. Global settings include database and global. Pluggable components include generator, writer, and benchmark.

Since this section focuses on data loading performance, we will not cover the benchmark query tool here.
For details on benchmark, refer to mxbench.

Parameter	Default Value	Description
--db-database	PGDATABASE environment variable, or `postgres` if unset	Target database name
--db-master-port		Instance port number, must match environment variable setting
--db-user	Current user name	Database user
--workspace	/tmp/mxbench	Directory for CSV and query files
--watch	true	Enable process monitoring (default: enabled)
--simultaneous-loading-and-query	false	Load and query simultaneously. Default `false`: load first, then query
--table-name	(empty)	Target table name (required)
--tag-num	25000	Number of devices
--metrics-type	float8	Metric data type. Supported: `"int4"`, `"int8"`, `"float4"`, `"float8"`
--total-metrics-count	300	Total number of metrics
--ts-start		Start timestamp for generated data
--ts-end		End timestamp for generated data
--ts-step-in-second	1	Interval between metric samples (seconds)
--generator	telematics	Random data generator. Defaults to telematics scenario. Can also read from custom files or skip generation
--generator-batch-size	1	Number of rows per device per timestamp. Default `1` (no split)
--generator-disorder-ratio	0	Percentage of out-of-order data (0–100). Default `0` (no out-of-order data)
--generator-empty-value-ratio	90	Percentage of null values per row (0–100). Default `90` (90% nulls)
--generator-randomness	OFF	Data randomness level: `OFF/S/M/L`. Default `OFF` (constant values). Increases with S → M → L
--writer	http	Data writer mode. Determines how mxgate receives data

For more configurable parameters, see mxbench or run mxbench --help.

Run the following command to configure and start mxbench, adjusting the parameters to your environment. Because mxgate and mxbench run on the same machine, the command sets --writer to stdin: data is passed to mxgate through a Linux pipe, which is lightweight and avoids the network overhead of the default http writer.

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 10 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin"

If --watch is not disabled, progress updates appear every 5 seconds. Upon completion, you’ll see a summary like:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:14:08 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:14:27 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 695333400           │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 5.399120 : 1        │
└─────────────────────────────────┴─────────────────────┘

Writer report interpretation:

Parameter	Description
start time	Data loading start time
stop time	Data loading end time
size written to MxGate (bytes)	Total bytes sent to mxgate
lines inserted	Number of rows inserted
compress ratio	Compression ratio: raw input size vs. actual table size in database

Actual runtime depends on total data volume and machine performance. As long as --watch is enabled, you’ll see real-time progress every 5 seconds, allowing you to monitor write speed and duration.

Note!
mxbench runs continuously until all data from ts-start to ts-end is loaded. You can press Ctrl+C to terminate early.

If long command lines are cumbersome, create a config file mxbench.conf, place parameters inside, and run:

[mxadmin@mdw ~]$ mxbench --config mxbench.conf

Note!
You may encounter a "hang" during data loading: progress logs keep appearing, but the reported progress does not advance. If this happens, change to the log directory with cd ~/gpAdminLogs/ and inspect the most recent log files to diagnose the problem.
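When inspecting the log directory mentioned in the note above, it helps to start with the newest files. This is a minimal sketch: recent_logs and LOG_DIR are hypothetical names, and the log file names are not assumed, so it simply picks whatever was modified last.

```shell
#!/bin/sh
# Directory named in the note above; override LOG_DIR to inspect another location.
LOG_DIR="${LOG_DIR:-$HOME/gpAdminLogs}"

# Hypothetical helper: print the five most recently modified entries in a directory.
recent_logs() {
  ls -t "$1" 2>/dev/null | head -n 5
}

recent_logs "$LOG_DIR"

# Show the last 50 lines of the newest file, if there is one.
newest=$(recent_logs "$LOG_DIR" | head -n 1)
if [ -n "$newest" ] && [ -f "$LOG_DIR/$newest" ]; then
  tail -n 50 "$LOG_DIR/$newest"
fi
```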

4.2.2 100,000 Devices, 100 Metrics

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table2" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 100 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin"

Result:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:19:48 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:21:02 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 5555333400          │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 25.519937 : 1       │
└─────────────────────────────────┴─────────────────────┘

4.2.3 100,000 Devices, 1,000 Metrics

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table3" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 1000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin"

Result:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:22:27 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:33:40 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 54305333400         │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 47.488209 : 1       │
└─────────────────────────────────┴─────────────────────┘
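The three commands in this section differ only in --table-name and --total-metrics-count. The sketch below wraps them in one function and uses only the flags shown above. The names run_scenario and DRY_RUN are hypothetical. By default the script only prints the commands, and setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to invoke mxbench (requires a running YMatrix cluster).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: $1 = table name, $2 = total metric count.
run_scenario() {
  table="$1"
  metrics="$2"
  # Rebuild the positional parameters as the full mxbench command line.
  set -- mxbench run \
    --db-database "load_test" \
    --db-master-port 5432 \
    --db-master-host "mdw" \
    --db-user "mxadmin" \
    --workspace "/tmp/mxbench/workspace" \
    --watch \
    --simultaneous-loading-and-query \
    --table-name "$table" \
    --tag-num 100000 \
    --metrics-type "float8" \
    --total-metrics-count "$metrics" \
    --ts-start "2022-04-19 00:00:00" \
    --ts-end "2022-04-19 00:01:00" \
    --generator "telematics" \
    --generator-batch-size 1 \
    --generator-disorder-ratio 0 \
    --generator-empty-value-ratio 0 \
    --generator-randomness "OFF" \
    --writer "stdin"
  if [ "$DRY_RUN" = "1" ]; then
    # The printout does not preserve the quotes around the timestamps.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_scenario "test_table" 10
run_scenario "test_table2" 100
run_scenario "test_table3" 1000
```

Keeping the shared flags in one place makes it harder for the scenarios to drift apart when you adjust a parameter.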

The line chart below compares the three writer reports above. It shows how YMatrix's data loading throughput changes as the metric count grows, which can help you make informed decisions about metric limits in real-world deployments.

[Figure: load_performance line chart]

Time-series data consists of timestamped data points—each representing a metric value at a specific moment. Without timestamps, it is not time-series data. Understanding data points helps interpret the chart.

The x-axis shows the metric scale (total-metrics-count). The y-axis shows write throughput, measured as the number of data points written per second. As the metric count increases, write throughput grows rapidly at first and then plateaus as the data volume increases.
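One way to estimate the data points per second for each scenario is to multiply lines inserted by the metric count and divide by the elapsed time from the writer reports above. The sketch below uses integer arithmetic and GNU date. The helper name throughput is hypothetical, and the calculation assumes every row carries a value for every metric (--generator-empty-value-ratio 0).

```shell
#!/bin/sh
# Hypothetical helper: data points per second for one writer report.
# $1 = start time, $2 = stop time, $3 = lines inserted, $4 = metrics per row.
throughput() {
  start_s=$(date -u -d "$1" +%s)
  stop_s=$(date -u -d "$2" +%s)
  elapsed=$(( stop_s - start_s ))
  # Integer division: the result is rounded down to whole data points.
  echo $(( $3 * $4 / elapsed ))
}

# Start time, stop time, and lines inserted are copied from the three reports.
tp_10=$(throughput "2022-07-21 15:14:08" "2022-07-21 15:14:27" 6000000 10)
tp_100=$(throughput "2022-07-21 15:19:48" "2022-07-21 15:21:02" 6000000 100)
tp_1000=$(throughput "2022-07-21 15:22:27" "2022-07-21 15:33:40" 6000000 1000)

echo "10 metrics:   $tp_10 data points/s"
echo "100 metrics:  $tp_100 data points/s"
echo "1000 metrics: $tp_1000 data points/s"
```

With the reports above, this gives roughly 3.2 million, 8.1 million, and 8.9 million data points per second. The report timestamps have one-second resolution, so the figures are approximate.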

In all three scenarios, YMatrix sustains a write throughput of millions of data points per second. Instead of inserting rows one by one with INSERT, why not accelerate with YMatrix on the data highway?
