Experience Loading Performance

Imagine driving an electric vehicle on a highway. An onboard data collector samples 80 predefined metrics, producing 80 data points (about 2 KB) every second. Within a minute, you want to see this continuously streaming data and use it to analyze conditions in real time. Before you can do that, you need a reliable database to ingest and store the data.

Time-series databases must handle massive volumes of real-time data from large numbers of devices, so loading performance is critical.

In this section, you will use two YMatrix components, MatrixGate and MatrixBench, to test data ingestion performance and confirm that the database can reliably feed subsequent analytics.

Our physical machine test environment is listed below. Hardware specifications may affect tool configuration parameters, so be sure to choose appropriate settings for your own system.

1 Hardware Environment

The machine configuration is as follows:

| Parameter | Configuration |
| --- | --- |
| CPU Cores | 2 physical cores, 32 logical cores |
| CPU Platform | Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz |
| Memory | 256 GB |
| Storage Capacity | 9.0 TB (1.4 GB/s write, 3.3 GB/s read) |
| Linux Distribution | CentOS Linux release 7.8.2003 (Core) |
| Linux Kernel | 3.10.0-1127.el7.x86_64 |

2 Professional Tools

2.1 MatrixGate

MatrixGate (shortened as mxgate) is a high-performance streaming data loading server located at bin/mxgate under the YMatrix installation directory. It fully leverages the parallel processing capabilities of the distributed database and is the preferred tool for data loading in production environments. During testing, mxgate works with the writer component of mxbench to rapidly ingest random data generated by the generator.

For more information, see mxgate.

2.2 MatrixBench

MatrixBench (mxbench for short) is a stress-testing tool for data loading and querying. It generates random data based on settings such as the number of devices, the time range, and the number of metrics. It creates tables automatically and supports serial or concurrent data loading and queries. You can configure and run mxbench from the command line or through a configuration file, whichever you prefer. The mxbench tool is located at bin/mxbench under the YMatrix installation directory.

3 Deployment Architecture

Single-machine deployment: Master + 6 Segments

Note!
You must deploy MatrixGate and MatrixBench on the same machine as the YMatrix cluster.

4 Start Testing

4.1 Test Cases

We provide three test cases with varying metric scales to simulate different time-series scenarios. At the end of this section, we will visually compare YMatrix's write performance across these different metric scales:

  1. 100,000 devices, 10 metrics
  2. 100,000 devices, 100 metrics
  3. 100,000 devices, 1,000 metrics

4.2 Begin Testing

Note!
Before using mxbench for load testing, ensure your test environment is ready: a running YMatrix cluster and properly configured environment variables. This step is mandatory! No manual setup is required for mxgate—it is automatically configured and started when you launch mxbench.

For cluster deployment instructions, refer to Cluster Deployment. For mxbench configuration, see mxbench.

4.2.1 100,000 Devices, 10 Metrics

mxbench configuration parameters are divided into two parts: global settings and pluggable component settings. Global settings include the database and global sections; pluggable settings cover the generator, writer, and benchmark tools.

Since this section focuses on evaluating YMatrix’s loading performance, we will not elaborate on the benchmark query tool here.

For details about benchmark, see mxbench.

The following table describes key parameters:

| Parameter Name | Default Value | Description |
| --- | --- | --- |
| --db-database | Value of the PGDATABASE environment variable, otherwise postgres | Target database name |
| --db-master-port | | Instance port number; must match the value in your environment variables |
| --db-user | Current username | Database user |
| --workspace | /tmp/mxbench | Directory for CSV data files and query files |
| --watch | true | Enable process monitoring |
| --simultaneous-loading-and-query | false | Load and query at the same time. With the default (false), mxbench loads first and then queries |
| --table-name | | Target table name (required; must be specified manually) |
| --tag-num | 25000 | Number of devices |
| --metrics-type | float8 | Metric data type. Supported: "int4", "int8", "float4", "float8" |
| --total-metrics-count | 300 | Total number of metrics |
| --ts-start | | Start timestamp for generated data |
| --ts-end | | End timestamp for generated data |
| --ts-step-in-second | 1 | Interval between metric samples, in seconds |
| --generator | telematics | Random data generator. By default it generates telematics scenario data. It can also read from custom data files or skip generation |
| --generator-batch-size | 1 | Number of rows per device per timestamp. The default (1) means rows are not split |
| --generator-disorder-ratio | 0 | Percentage of out-of-order (delayed) data. Range: 0–100. The default (0) means no delay. Use it to simulate real-world latency |
| --generator-empty-value-ratio | 90 | Percentage of null values per row. Range: 0–100. The default (90) simulates sparse time-series data |
| --generator-randomness | OFF | Data randomness level: OFF/S/M/L. The default (OFF) generates constant values; randomness increases from S to L |
| --writer | http | Data writer mode. Determines how mxgate receives data |

For more configurable parameters, visit mxbench or run mxbench --help in the command line.

Use the following command to configure and run mxbench. Adjust parameter values according to your environment. Since mxgate and mxbench are co-located, using the "stdin" writer avoids network overhead by leveraging Linux pipes—lightweight and efficient.

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 10 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin" 

If --watch is not disabled, progress updates appear every 5 seconds. Upon completion, you’ll see output like:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:14:08 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:14:27 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 695333400           │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 5.399120 : 1        │
└─────────────────────────────────┴─────────────────────┘  

Writer report explanation:

| Parameter Name | Description |
| --- | --- |
| start time | Data loading start time |
| stop time | Data loading end time |
| size written to MxGate (bytes) | Total bytes sent to mxgate |
| lines inserted | Number of data rows inserted |
| compress ratio | Compression ratio: size written to mxgate vs. actual table size in database |
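
The report fields can be turned into loading rates with a little arithmetic. The following is a minimal sketch, not part of mxbench: the helper names secs_of_day and report_rates are hypothetical, and the script assumes that the start and stop times fall on the same calendar day. The sample values are copied from the summary report above.

```shell
# Seconds since midnight for a "YYYY-MM-DD HH:MM:SS" timestamp.
# Assumption: start and stop times fall on the same calendar day.
secs_of_day() {
  echo "$1" | awk '{ split($2, t, ":"); print t[1] * 3600 + t[2] * 60 + t[3] }'
}

# $1 = start time, $2 = stop time,
# $3 = size written to MxGate (bytes), $4 = lines inserted
report_rates() {
  start=$(secs_of_day "$1")
  stop=$(secs_of_day "$2")
  awk -v s="$start" -v e="$stop" -v b="$3" -v n="$4" 'BEGIN {
    d = e - s   # elapsed seconds
    printf "elapsed_s=%d rows_per_s=%.0f mb_per_s=%.1f\n", d, n / d, b / d / 1000000
  }'
}

# Values taken from the 10-metric summary report.
report_rates "2022-07-21 15:14:08" "2022-07-21 15:14:27" 695333400 6000000
```

For the report above this prints elapsed_s=19 rows_per_s=315789 mb_per_s=36.6, where MB means 1,000,000 bytes.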

Actual runtime depends on total data volume and machine performance. As long as watch is enabled, you’ll get real-time progress every 5 seconds, allowing you to monitor write speed and duration.

Note!
mxbench runs continuously until all data from ts-start to ts-end is loaded. You can press Ctrl+C to terminate early.
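
Because the run ends only when the whole time window is loaded, it helps to estimate the workload size beforehand. The sketch below assumes one row per device per timestamp and no null values (--generator-empty-value-ratio 0); the function names are hypothetical. The 60-second window and 1-second step match the command above and are consistent with the 6,000,000 lines shown in the report.

```shell
# Rows mxbench is expected to load: one row per device per timestamp.
# $1 = devices (--tag-num)
# $2 = window length in seconds (ts-end minus ts-start)
# $3 = sampling step in seconds (--ts-step-in-second)
expected_rows() {
  echo $(( $1 * $2 / $3 ))
}

# Data points = rows x metrics per row (assumes no null values).
# $1 = rows, $2 = metrics per row (--total-metrics-count)
expected_points() {
  echo $(( $1 * $2 ))
}

rows=$(expected_rows 100000 60 1)
echo "rows: $rows"
for metrics in 10 100 1000; do
  echo "$metrics metrics: $(expected_points "$rows" "$metrics") data points"
done
```

With 100,000 devices the row count is 6,000,000 in every test case; only the number of data points per row changes.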

If writing long command lines feels cumbersome, create a config file mxbench.conf, place your parameters inside, and run:

[mxadmin@mdw ~]$ mxbench run --config mxbench.conf

Note!
Data loading may appear to hang: progress logs keep printing, but nothing actually advances. If that happens, change to the log directory with cd ~/gpAdminLogs/ and inspect the logs there to troubleshoot.
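
As a starting point for that troubleshooting, the sketch below finds the most recently modified file in the log directory and prints the last lines in it that mention an error. The helper name newest_log, the LOG_DIR variable, and the search pattern are illustrative assumptions; the actual log file names and message formats are not specified in this section.

```shell
# Print the path of the most recently modified entry in directory $1.
# Returns non-zero if the directory is missing or empty.
newest_log() {
  newest=$(ls -t "$1" 2>/dev/null | head -n 1)
  [ -n "$newest" ] && echo "$1/$newest"
}

# LOG_DIR is a hypothetical override; the default is the directory named above.
LOG_DIR="${LOG_DIR:-$HOME/gpAdminLogs}"

if log=$(newest_log "$LOG_DIR"); then
  echo "Newest log: $log"
  # Assumed pattern: adjust it to whatever the logs actually contain.
  grep -inE 'error|fatal|panic' "$log" | tail -n 20
else
  echo "No log files found in $LOG_DIR"
fi
```

If the pattern finds nothing, reading the end of the newest file directly (for example with tail -n 50) is the next step.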

4.2.2 100,000 Devices, 100 Metrics

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table2" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 100 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin" 

Output upon completion:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:19:48 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:21:02 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 5555333400          │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 25.519937 : 1       │
└─────────────────────────────────┴─────────────────────┘

4.2.3 100,000 Devices, 1,000 Metrics

[mxadmin@mdw ~]$ mxbench run \
  --db-database "load_test" \
  --db-master-port 5432 \
  --db-master-host "mdw" \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench/workspace" \
  --watch \
  --simultaneous-loading-and-query \
  --table-name "test_table3" \
  --tag-num 100000 \
  --metrics-type "float8" \
  --total-metrics-count 1000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 0 \
  --generator-randomness "OFF" \
  --writer "stdin" 

Output upon completion:

┌───────────────────────────────────────────────────────┐
│             Summary Report for STDIN Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-07-21 15:22:27 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-07-21 15:33:40 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 54305333400         │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 6000000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 47.488209 : 1       │
└─────────────────────────────────┴─────────────────────┘
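
The three runs above differ only in --table-name and --total-metrics-count, so they can be driven from one small script. This is a sketch under assumptions: the run_case function and the DRY_RUN variable are hypothetical helpers, not mxbench features, and every flag is copied from the commands above. By default the script only prints the command lines; set DRY_RUN=0 to execute them.

```shell
# Build one mxbench run and either print it (DRY_RUN=1, the default here)
# or execute it (DRY_RUN=0).
# $1 = value for --total-metrics-count, $2 = value for --table-name
run_case() {
  metrics=$1
  table=$2
  set -- mxbench run \
    --db-database "load_test" \
    --db-master-port 5432 \
    --db-master-host "mdw" \
    --db-user "mxadmin" \
    --workspace "/tmp/mxbench/workspace" \
    --watch \
    --simultaneous-loading-and-query \
    --table-name "$table" \
    --tag-num 100000 \
    --metrics-type "float8" \
    --total-metrics-count "$metrics" \
    --ts-start "2022-04-19 00:00:00" \
    --ts-end "2022-04-19 00:01:00" \
    --generator "telematics" \
    --generator-batch-size 1 \
    --generator-disorder-ratio 0 \
    --generator-empty-value-ratio 0 \
    --generator-randomness "OFF" \
    --writer "stdin"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # display only; arguments containing spaces appear unquoted
  else
    "$@"        # run mxbench with the arguments exactly as built
  fi
}

run_case 10   test_table
run_case 100  test_table2
run_case 1000 test_table3
```

Keeping the shared flags in one place makes it harder for the three cases to drift apart when you adapt the values to your own environment.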

Based on the above writer reports, we provide a clear line chart comparison below. It illustrates YMatrix’s strong data loading performance and how it scales with increasing metric counts. This insight helps you make informed decisions about metric count in real-world deployments.

[Figure: load_performance line chart of write throughput (data points per second) for each total-metrics-count]

Time-series data consists of timestamped data points—each representing a metric value at a specific moment. Without timestamps, it cannot be considered time-series data. Understanding data points helps interpret the chart above.

The x-axis shows the metric scale (total-metrics-count). The y-axis shows write throughput, that is, the number of data points written per second. Throughput rises quickly as the metric count increases, and the growth then levels off at higher metric counts. In every case, YMatrix sustains writes of millions of data points per second. Instead of painfully inserting rows one by one with INSERT, why not take YMatrix for a high-speed ride?
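
The throughput figures can be derived from the writer reports as lines inserted multiplied by metrics per row, divided by elapsed seconds. The sketch below does this for the three runs; the function name points_per_second is hypothetical, the elapsed times (19, 74, and 673 seconds) are the differences between the start and stop times in the reports, and the calculation assumes every row carries all metrics because --generator-empty-value-ratio was 0.

```shell
# Millions of data points written per second.
# $1 = lines inserted, $2 = metrics per row, $3 = elapsed seconds
points_per_second() {
  awk -v n="$1" -v m="$2" -v d="$3" 'BEGIN { printf "%.2f\n", n * m / d / 1000000 }'
}

# Each input line: metrics per row, elapsed seconds from the matching report.
while read -r metrics elapsed; do
  rate=$(points_per_second 6000000 "$metrics" "$elapsed")
  printf '%4s metrics: %s million points/s\n' "$metrics" "$rate"
done <<EOF
10 19
100 74
1000 673
EOF
```

For the reports in this section the script prints 3.16, 8.11, and 8.92 million points per second for 10, 100, and 1,000 metrics respectively.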