MxBench

MxBench is a data loading and query benchmarking tool for MatrixDB. It can quickly generate random data, automatically create tables, and perform serial or concurrent data loading and querying based on user-defined configurations such as number of devices, time range, and number of metrics. Users can flexibly configure metric types, collection frequency, null value ratio, randomness level, query thread count, and custom queries. MxBench can be run via command line or configuration file. The tool is located at bin/mxbench under the MatrixDB installation directory.

1 Preparation

1.1 MatrixDB Cluster

A running MatrixDB cluster is required.

1.2 Environment Variables

Since MxBench calls commands such as createdb, gpconfig, and gpstop, users must set up the necessary environment variables to ensure these commands execute correctly.

Specifically, run source<MatrixDB安装目录>/greenplum_path.sh``, and properly set the following environment variables:

  • PGHOST
  • PGPORT
  • PGUSER
  • PGPASSWORD
  • PGDATABASE
  • MASTER_DATA_DIRECTORY

Additionally, test-run commands like createdb mxbench, gpconfig -s log_rotation_size, gpstop -rai to verify they execute successfully.

2 Usage

2.1 Quick Start

To quickly try MxBench on a local development machine, use either a configuration file or command-line interface.

2.1.1 Configuration File

Use the following configuration file, name it mxbench.conf, and run:
mxbench --config mxbench.conf.
Note: The benchmark-parallel parameter should match machine capability; it is recommended to set it less than or equal to the number of CPU cores.

[database]
  db-database = "testdb1"
  db-master-port = 5432

[global]
  # Enable progress monitoring (default is true)
  watch = true

  # Directory to store generated DDL, GUC recommendations, queries, etc.
  workspace = "/tmp/mxbench"

  table-name = "table1"

  # Number of devices
  tag-num = 25000
  # Metric data type; supported: int4, int8, float4, float8
  metrics-type = "float8"
  # Total number of metrics; if greater than 998, first 997 are simple columns,
  # remaining stored in JSON format in a column named 'ext'
  total-metrics-count = 300

  # Start timestamp for generated data; ts-end must be later than ts-start
  ts-start = "2022-04-19 09:00:00"
  ts-end = "2022-04-19 09:03:00"

  # Interval between metric collections (in seconds)
  ts-step-in-second = "2"

[generator]
  generator = "telematics"

  [generator.telematics]
    # Number of batches per device per timestamp uploaded to DB (merged into one tuple)
    generator-batch-size = 1
    # Percentage of delayed data (1–100), with timestamps moved back by 1 hour
    generator-disorder-ratio = 0
    # Null value ratio in generated data (1–100)
    generator-empty-value-ratio = 90
    # Data randomness level: OFF/S/M/L; default is "OFF"; M is suitable for general scenarios
    generator-randomness = "OFF"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array of query concurrency levels
    benchmark-parallel = [8]
    # Predefined queries: single-tag latest, 10-tag latest, single-tag detail
    benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
    # Number of times each query runs per round, or duration in seconds
    # To enable runtime, set run-times to 0:
    benchmark-run-times = 0
    benchmark-runtime-in-second = "30"

2.1.2 Command Line

Alternatively, run MxBench using the command line. The following command is equivalent to running MxBench with the above configuration file:

mxbench run \
  --db-database "testdb1" \
  --db-master-host "localhost" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench" \
  --table-name "table1" \
  --ts-start "2022-04-19 09:00:00" \
  --ts-end "2022-04-19 09:02:00" \
  --ts-step-in-second 2 \
  --generator "telematics" \
  --writer "stdin" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 8 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 30

2.2 Configuration Details

As shown in the example configuration file, the configuration is divided into the following sections:

Global Settings

  1. database: Database-related settings;
  2. global: Includes table structure, data volume information, and administrative options such as whether to dump data files or actually run tests, and whether to enable progress monitoring;

Plugin-based Configurations

  1. generator: Data generator:
  • telematics (default) - Generates data for telematics scenarios;
  • file - Reads data from files;
  • nil - Does not generate data.
  1. writer: Writes generated data to MatrixDB via MxGate:
  • http (default) - Starts MxGate via HTTP and loads data;
  • stdin - Starts MxGate via stdin and loads data;
  • nil - Does not start MxGate or load data.
  1. benchmark: Plugin-based module to generate and execute query statements:
  • telematics (default) - Generates and executes common queries in telematics scenarios, supports custom queries;
  • nil - Does not generate or execute any queries.

Below are detailed explanations of parameters in each module:

2.2.1 database

[database]
  # Database name; MxBench creates it if it does not exist; no error if already exists  
  db-database = "postgres"

  ## Hostname where MatrixDB master instance resides
  db-master-host = "localhost"

  ## Port number of MatrixDB master instance
  db-master-port = 5432

  # MatrixDB user password
  db-password = ""

  # MatrixDB username
  db-user = "mxadmin"

2.2.2 global

[global]

  # Directory for MxBench-generated files including DDL, GUC restore scripts,
  # recommended GUC changes, CSV data files, and query files.
  # MxBench creates this directory if it does not exist; errors if path exists but is not a directory.
  # Permissions may need attention.
  # Each MxBench run creates a subdirectory named with Unix timestamp under this path.
  # Default: "/tmp/mxbench"
  workspace = "/tmp/mxbench"

  # Whether to dump CSV data files; options: true or false.
  # Default: false — executes DDL, data loading, and queries.
  # Under the timestamped subdirectory, generates:
  # 1. mxbench_ddl.sql: DDL script;
  # 2. mxbench_gucs_setup.sh: Recommended GUC settings script (may require restart to take effect);
  # 3. mxbench_gucs_backup.sh: Backup of current system GUCs; not generated if identical to recommendations;
  # 4. mxbench_<benchmark-plugin>_query.sql: Queries generated by the benchmark plugin.
  # If set to true, DDL, loading, and queries are not executed; additionally generates:
  # 5. mxbench_<generator-plugin>_data.csv: CSV data file from the generator plugin.
  dump = false

  # Whether to enable progress monitoring; options: true or false.
  # (Default) If true, prints execution progress of writer and benchmark modules every 5 seconds.
  # Set to false to disable.
  watch = true

  # Whether to run data loading and querying simultaneously; options: true or false.
  # If true: Runs mixed workload of loading and querying.
  # Queries loop until data loading finishes.
  # (Default) If false: Loads data first, then runs queries.
  simultaneous-loading-and-query = false

  # Path to custom DDL file.
  # (Default) If empty, DDL is auto-generated based on other configurations.
  ddl-file-path = ""

  # Type of metrics. Default: "float8" (double precision).
  # Only supports "int4", "int8", "float4", "float8".
  metrics-type = "float8"

  # Schema name; default: "public".
  schema-name = "public"

  # Table name. Default: ""; must be manually set.
  # If a table with the same name exists in the specified database and schema, MxBench exits with an error.
  table-name = "test_table"

  # Number of devices. Default: 25000.
  tag-num = 25000

  # Total number of metrics. Default: 300.
  # If greater than 998, excess metrics are stored as JSON in a column named "ext".
  # For example, if set to 1000, c0~c997 hold 998 simple columns,
  # and the remaining two are stored as JSON in the "ext" column.
  total-metrics-count = 300

  # Start timestamp for generated data. Generated data may precede this due to delayed reporting.
  ts-start = "2022-04-25 09:00:00"

  # End timestamp for generated data. ts-end must be later than ts-start, otherwise error.
  ts-end = "2022-04-25 09:01:00"

  # Interval between metric collections (in seconds). Default: "1".
  ts-step-in-second = "1"

2.2.3 generator

2.2.3.1 telematics (default)

Generates telematics scenario data. Use generator="telematics".

[generator]

  generator = "telematics"

  [generator.telematics]

    # Number of batches per device per timestamp sent to MatrixDB.
    # For example, if set to 5, data for device "tag1" at "2022-04-25 09:00:03"
    # is split into 5 rows and merged into one tuple in DB.
    # Default: 1 — no splitting.
    generator-batch-size = 1

    # Ratio of delayed reporting data. Value: 1–100.
    # Default: 0 — no delayed data.
    generator-disorder-ratio = 0

    # Null value ratio per row. Value: 1–100. Default: 90% — 90% of metrics are NULL.
    generator-empty-value-ratio = 90

    # Data randomness level: OFF/S/M/L. Default: OFF.
    generator-randomness = "OFF"
2.2.3.2 file

Reads data from CSV files. Use generator="file".

[generator]

  generator = "file"

  [generator.file]

    # Absolute paths to CSV data files. Accepts an array for multiple files.
    generator-file-paths = []

    # Same as [generator.telematics]; setting these is optional,
    # but helps generate better DDL. Has no effect if custom DDL is used.
    generator-batch-size = 1
    generator-empty-value-ratio = 90
2.2.3.3 nil

Does not generate or load any data. Use generator="nil".

[generator]

  generator = "nil"

2.2.4 writer

2.2.4.1 http (default)

Starts MxGate via HTTP and loads data.

[writer]

  writer = "http"

  [writer.http]

    # Whether to use gzip compression when sending HTTP messages. Default: false.
    writer-use-gzip = false

    # Concurrency level when sending data to MxGate.
    writer-parallel = 8
2.2.4.2 stdin

Starts MxGate via stdin and loads data.

[writer]

  writer = "stdin"
2.2.4.3 nil

Does not start MxGate or write data.

[writer]

  writer = "nil"

2.2.5 benchmark

2.2.5.1 telematics (default)
[benchmark]

  benchmark = "telematics"

  [benchmark.telematics]

    # List of telematics queries to execute sequentially.
    # Available queries:
    # 1. "SINGLE_TAG_LATEST_QUERY" 
    # 2. "MULTI_TAG_LATEST_QUERY" 
    # 3. "SINGLE_TAG_DETAIL_QUERY" 
    # These correspond to:
    # "SINGLE_TAG_LATEST_QUERY": Get latest metric values for one device;
    # "MULTI_TAG_LATEST_QUERY": Randomly select 10 devices and get their latest metric values;
    # "SINGLE_TAG_DETAIL_QUERY": Get metric values for one device over a time range.
    # Note: For very wide tables, retrieving all metrics at once may exceed DBMS limits.
    # Parameters below allow tuning the number of metrics and time range for "SINGLE_TAG_DETAIL_QUERY".
    # Example: [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ] runs all three.
    # Remove names to skip corresponding queries. Invalid names are ignored.
    # Default: empty — no predefined queries executed.
    benchmark-run-query-names = [ "SINGLE_TAG_DETAIL_QUERY" ]

    # Custom queries, separated by commas.
    # Example: ["SELECT COUNT(*) FROM t1", "SELECT MAX(ts) FROM t1"]
    # Default: empty — no custom queries executed.
    benchmark-custom-queries = []

    # Query concurrency levels. Multiple values allowed, executed in order.
    # Example: [1, 8] runs queries first with concurrency 1, then 8.
    # Default: empty.
    benchmark-parallel = [8]

    # Number of times each query runs per concurrency level. Used for latency and TPS statistics.
    # Default: 0.
    benchmark-run-times = 0

    # Duration (seconds) each query runs per concurrency level. Statistics based on executions within this period.
    # Only effective when benchmark-run-times is 0. Default: 60 seconds.
    benchmark-runtime-in-second = "60"
2.2.5.2 nil

Set benchmark = "nil" to skip query execution.

[benchmark]

  benchmark = "nil"

2.3 Example Configuration Files

2.3.1 Generate Data for Wide Sparse Table and Run Mixed Workload

[database]
  db-database = "testdb2"
  db-master-port = 5432

[global]
  # Enable progress monitoring (default is true)
  watch = true

  # Directory for generated DDL, GUC best practices, queries, etc.
  workspace = "/home/mxadmin/mxbench/workspace"

  # Whether to run data loading and querying simultaneously
  simultaneous-loading-and-query = true

  table-name = "table2"

  # Number of devices
  tag-num = 20000
  # Metric data type; supports int4, int8, float4, float8
  metrics-type = "float8"
  # Number of metrics; if >998, first 997 are simple columns,
  # rest stored as JSON in a column named "ext"
  total-metrics-count = 5000

  # Start timestamp for generated data; ts-end must be later than ts-start
  ts-start = "2022-04-19 00:00:00"
  ts-end = "2022-04-19 00:01:00"

[generator]
  generator = "telematics"

  [generator.telematics]
    # Number of batches per device per timestamp; merged into one tuple in DB
    generator-batch-size = 1
    # Ratio of delayed data (1–100), timestamp shifted back by 1 hour
    generator-disorder-ratio = 0
    # Null value ratio in generated data (1–100)
    generator-empty-value-ratio = 90
    # Data randomness level: OFF/S/M/L; default: "OFF"
    generator-randomness = "OFF"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array of query concurrency levels
    benchmark-parallel = [64]
    # Provided queries: single-tag latest, 10-tag latest, single-tag detail
    benchmark-run-query-names = [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ]
    # Number of runs or duration per query per round; set runs to 0 to enable duration:
    benchmark-run-times = 0
    benchmark-runtime-in-second = "60"

2.3.2 Load Data from External CSV with Custom DDL, No Query Execution

[database]
  db-database = "testdb3"
  db-master-port = 5432

[generator]
  # Read data from CSV file
  generator = "file"

  [generator.file]
    generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]

[global]
  table-name = "table3"
  watch = true
  workspace = "/home/mxadmin/mxbench/workspace"
  ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "nil"

2.4 Example Command Lines

2.4.1 Generate Data for Wide Sparse Table and Run Mixed Workload

Running MxBench with Example Config 1 is equivalent to:

mxbench run \
  --db-database "testdb2" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --simultaneous-loading-and-query \
  --table-name "table2" \
  --tag-num 25000 \
  --metrics-type "float8" \
  --total-metrics-count 5000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 90 \
  --generator-randomness "OFF" \
  --writer "stdin" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 64 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 60

2.4.2 Load Data from External CSV with Custom DDL, No Query Execution

Running MxBench with Example Config 2 is equivalent to:

mxbench run \
  --db-database "testdb3" \
  --db-master-port 5432 \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
  --table-name "table3" \
  --generator "file" \
  --generator-file-paths "/home/mxadmin/mxbench/data.csv" \
  --writer "stdin" \
  --benchmark "nil" 

2.5 FAQ

  1. Load only, no query
    Set benchmark = "nil".

  2. Query only, no load
    Set generator = "nil".

  3. Run loading and querying simultaneously
    Set simultaneous-loading-and-query = true in global settings.

  4. Generate and dump CSV data file
    Set dump = true in global settings. Output file is in <unix-timestamp> subdirectory under workspace.

  5. View generated DDL and queries
    Check the <unix-timestamp> subdirectory under the workspace directory.

  6. Use custom DDL
    Specify the absolute path to the DDL file in ddl-file-path under global settings.

  7. Run custom queries
    Specify custom queries in benchmark-custom-queries under telematics benchmark, enclosed in quotes. Parameter randomization is not supported.

  8. Skip recommended GUCs and retain existing ones
    When MxBench detects differences between current and recommended GUCs, it prompts and asks whether to reset GUCs and restart the database. Enter "N" to keep original GUCs. MxBench will then confirm whether to proceed. Enter "Y" to continue.

  9. What are the parameter validity requirements?
    In global settings:

    • ts-end must be later than ts-start;
    • table-name and schema-name must not be empty;
    • tag-num must be greater than 0;
    • ts-step-in-second must not be 0.

3 Understanding Progress Information and Statistics Reports

3.1 Progress Information

Example:

● Stdin Writer Report
  ● period start: 2022-04-29 10:08:11, end: 2022-04-29 10:08:16, period: 5.000129734s

  ● count written in total: 637025 rows/ 1500000 rows 42%, 637025 rows in this period

  ● size written in total: 160878718 bytes/ 360000003 bytes 44%, 160878718 bytes in this period

  ● size written to MxGate in total: 350075440 bytes, 350075440 bytes in this period

● Telematics Benchmark Report
  ● stats for query SINGLE_TAG_LATEST_QUERY with parallel 8: progress: 100%

  ● stats for query MULTI_TAG_LATEST_QUERY with parallel 8: progress: 43%

Explanation: Consists of two parts — writer and benchmark progress reports.

3.1.1 writer

  • period start, period end, period: Start time, end time, and duration of the reporting window;
  • count written: Total rows written vs. expected, and percentage completed. "xx in this period": rows written during this window;
  • size written: Total bytes written vs. expected, and percentage completed. "xx in this period": bytes written during this window;
  • size written to MxGate: Bytes sent to MxGate after converting data to text format. "xx in this period": bytes sent during this window.

3.1.2 benchmark

Execution progress of a query at a given concurrency level (parallel). When queries run concurrently with data loading, queries may loop multiple times before loading completes. This report shows progress of the most recent iteration only.

3.2 Statistics Report

--- SPLIT ---

3.2.1 writer

┌───────────────────────────────────────────────────────┐
│            Summary Report for Stdin Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-04-27 13:29:01 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-04-27 13:29:58 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 848333400           │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 1500000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 1.561276 : 1        │
└─────────────────────────────────┴─────────────────────┘
  • start time: The time when data loading started.
  • stop time: The time when data loading ended.
  • size written to MxGate (bytes): The number of bytes written to MxGate.
  • lines inserted: The number of data rows inserted. Due to upsert operations, this number may exceed the actual row count in the database.
  • compress ratio: Compression ratio, defined as the ratio of data size written to MxGate to the actual table size in the database.

3.2.2 benchmark

For each query, a report is generated per parallel setting and printed in real time.

┌─────────────────┬───────────────┐
│ Overall Latency │ 29.948370582s │
│ Average Latency │   13.723304ms │
│ P75 Latency     │    14.35972ms │
│ P50 Latency     │   13.654556ms │
│ P25 Latency     │   12.925912ms │
│ TPS             │           582 │
└─────────────────┴───────────────┘
  • Pxx represents the latency at the xx-th percentile. For example, a P75 latency of 14.35972ms means that 75% of queries completed faster than this value, while 25% were slower. P50 is the median latency.
  • TPS: Transactions per second, i.e., the number of queries executed per second.

Summary report:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                Summary Report for Telematics Benchmark                                                │
├─────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┤
│ Parallel\Query Name │ SINGLE_TAG_LATEST_QUERY             │ MULTI_TAG_LATEST_QUERY              │ SINGLE_TAG_DETAIL_QUERY             │
├─────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
│ 8                   │ ┌─────────────────┬───────────────┐ │ ┌─────────────────┬───────────────┐ │ ┌─────────────────┬───────────────┐ │
│                     │ │ Overall Latency │ 30.008625133s │ │ │ Overall Latency │ 36.407016561s │ │ │ Overall Latency │ 29.948370582s │ │
│                     │ │ Average Latency │    23.81758ms │ │ │ Average Latency │  7.270212105s │ │ │ Average Latency │   13.723304ms │ │
│                     │ │ P75 Latency     │   24.911171ms │ │ │ P75 Latency     │  8.139682559s │ │ │ P75 Latency     │    14.35972ms │ │
│                     │ │ P50 Latency     │   23.411248ms │ │ │ P50 Latency     │  7.161237157s │ │ │ P50 Latency     │   13.654556ms │ │
│                     │ │ P25 Latency     │   20.899622ms │ │ │ P25 Latency     │  6.642459805s │ │ │ P25 Latency     │   12.925912ms │ │
│                     │ │ TPS             │           335 │ │ │ TPS             │             1 │ │ │ TPS             │           582 │ │
│                     │ └─────────────────┴───────────────┘ │ └─────────────────┴───────────────┘ │ └─────────────────┴───────────────┘ │
│                     │ progress: 100%                      │ progress: 100%                      │ progress: 100%                      │
└─────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┘
  • Each row shows execution results for all queries at a specific concurrency level.
  • Each column shows execution results for a specific query across different concurrency levels.
  • If multiple rounds of query execution are performed (e.g., mixed workload where queries run continuously before data loading finishes), only the results from the last round are displayed.
  • If query execution fails or is interrupted by the user, the progress bar displays the current completion percentage, and statistics are calculated based on completed queries only.