MxBench

MxBench is a data loading and query benchmarking tool for MatrixDB. It can quickly generate random data, automatically create tables, and perform serial or concurrent data loading and querying based on user-defined configurations such as number of devices, time range, and number of metrics. Users can flexibly configure metric types, collection frequency, null value ratio, randomness level, query thread count, and custom query statements. MxBench can be run via command line or configuration file. The tool is located in the bin/mxbench directory of the MatrixDB installation.

1 Preparation

1.1 MatrixDB Cluster

A running MatrixDB cluster is required.

1.2 Environment Variables

Since MxBench calls commands such as createdb, gpconfig, and gpstop, users must set up the necessary environment variables to ensure these commands execute correctly.

Specifically, run source<MatrixDB安装目录>/greenplum_path.sh``, and properly set the following environment variables:

  • PGHOST
  • PGPORT
  • PGUSER
  • PGPASSWORD
  • PGDATABASE
  • MASTER_DATA_DIRECTORY

Additionally, test-run commands like createdb mxbench, gpconfig -s log_rotation_size, gpstop -rai to confirm they execute successfully.

2 Usage

2.1 Quick Start

To quickly try MxBench on a local development machine, use either a configuration file or command-line interface.

2.1.1 Configuration File

Use the following configuration file, name it mxbench.conf, and run:
mxbench --config mxbench.conf.
Note: The benchmark-parallel parameter should match your machine's capabilities; it is recommended to set it less than or equal to the number of CPU cores.

[database]
  db-database = "testdb1"
  db-master-port = 5432

[global]
  # Enable progress monitoring (default is true)
  watch = true

  # Directory for generated DDL, GUC recommendations, queries, etc.
  workspace = "/tmp/mxbench"

  table-name = "table1"

  # Number of devices
  tag-num = 25000
  # Metric data type; supports int4, int8, float4, float8
  metrics-type = "float8"
  # Total number of metrics; if >998, first 997 are simple columns,
  # remaining stored as JSON in a column named 'ext'
  total-metrics-count = 300

  # Data generation start timestamp; ts-end must be later than ts-start
  ts-start = "2022-04-19 09:00:00"
  ts-end = "2022-04-19 09:03:00"

  # Metrics collected every N seconds
  ts-step-in-second = "2"

[generator]
  generator = "telematics"

  [generator.telematics]
    # Number of rows per device per timestamp uploaded; merged into one tuple in DB
    generator-batch-size = 1
    # Percentage of out-of-order data (1–100), with timestamps moved back by 1 hour
    generator-disorder-ratio = 0
    # Null value ratio in generated data (1–100)
    generator-empty-value-ratio = 90
    # Data randomness level: OFF/S/M/L; default is "OFF"; M is suitable for general scenarios
    generator-randomness = "OFF"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array of query concurrency levels
    benchmark-parallel = [8]
    # Provided queries: single-tag latest, 10-tag latest, single-tag detail
    benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
    # Number of times or duration each query runs per round; set count to 0 to enable time-based execution
    benchmark-run-times = 0
    benchmark-runtime-in-second = "30"

2.1.2 Command Line

Alternatively, run MxBench using the command line. The following command is equivalent to running MxBench with the above configuration file:

mxbench run \
  --db-database "testdb1" \
  --db-master-host "localhost" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --workspace "/tmp/mxbench" \
  --table-name "table1" \
  --ts-start "2022-04-19 09:00:00" \
  --ts-end "2022-04-19 09:02:00" \
  --ts-step-in-second 2 \
  --generator "telematics" \
  --writer "stdin" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 8 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 30

2.2 Configuration Details

As shown in the example configuration file, the configuration is divided into the following sections:

Global configurations:

  1. database: Database-related settings.
  2. global: Table structure, data volume information, and administrative options such as whether to dump data files or actually run tests, and whether to enable progress monitoring.

Plugin-based configurations:

  1. generator: Data generator:

    • telematics (default): Generates data for telematics scenarios.
    • file: Reads data from files.
    • nil: Does not generate any data.
  2. writer: Writes generated data to MatrixDB via MxGate:

    • http (default): Starts MxGate via HTTP and loads data.
    • stdin: Starts MxGate via stdin and loads data.
    • nil: Does not start MxGate or write data.
  3. benchmark: Plugin-based module to generate and execute query statements:

    • telematics (default): Generates and executes common queries in telematics scenarios; supports custom queries.
    • nil: Does not generate or execute any queries.

Below are detailed explanations of parameters in each module.

2.2.1 database

[database]
  # Database name; MxBench creates it if it does not exist; no error if already exists
  db-database = "postgres"

  ## Hostname of the MatrixDB master instance
  db-master-host = "localhost"

  ## Port number of the MatrixDB master instance
  db-master-port = 5432

  # Password for MatrixDB user
  db-password = ""

  # MatrixDB username
  db-user = "mxadmin"

2.2.2 global

[global]

  # Directory for MxBench-generated files including DDL, GUC restore scripts,
  # recommended GUC changes, CSV data files, and query files.
  # MxBench creates this directory if it does not exist; errors if it exists but is not a directory.
  # Permissions may need attention.
  # Each MxBench run creates a subdirectory named with a Unix timestamp under this path.
  # Default is "/tmp/mxbench"
  workspace = "/tmp/mxbench"

  # Whether to dump CSV data files; options are true or false.
  # Default is false: executes DDL, data loading, and queries.
  # In the Unix-timestamp-named subdirectory under workspace, generates:
  # 1. mxbench_ddl.sql: DDL script
  # 2. mxbench_gucs_setup.sh: Recommended GUC settings script (may require restart to take effect)
  # 3. mxbench_gucs_backup.sh: Backup of current system GUCs (not generated if same as recommendations)
  # 4. mxbench_<benchmark-plugin>_query.sql: Queries generated by the benchmark plugin
  # If set to true, does not execute DDL, loading, or queries; additionally generates:
  # 5. mxbench_<generator-plugin>_data.csv: CSV data file from the generator plugin
  dump = false

  # Whether to enable progress monitoring; options are true or false.
  # (Default) If true, prints progress of writer and benchmark modules every 5 seconds.
  # Set to false to disable.
  watch = true

  # Whether to run data loading and querying simultaneously; options are true or false.
  # If true: runs mixed workload of loading and querying.
  # Queries loop until data loading finishes.
  # (Default) If false: runs data loading first, then queries.
  simultaneous-loading-and-query = false

  # Path to a custom DDL file, if needed.
  # (Default) If empty, MxBench generates DDL based on other configurations.
  ddl-file-path = ""

  # Type of metrics. Default is "float8" (double precision).
  # Only supports "int4", "int8", "float4", "float8".
  metrics-type = "float8"

  # Schema name; default is "public".
  schema-name = "public"

  # Table name. Default is ""; must be manually set.
  # If a table with the same name exists in the specified database and schema, MxBench exits with an error.
  table-name = "test_table"

  # Number of devices. Default is 25000.
  tag-num = 25000

  # Total number of metrics. Default is 300.
  # If greater than 998, excess metrics are stored as JSON in a column named "ext".
  # For example, if set to 1000, c0–c997 store 998 metrics as simple columns,
  # and the remaining two are stored as JSON in the "ext" column.
  total-metrics-count = 300

  # Start timestamp for generated data. Generated data may be earlier due to delayed reporting.
  ts-start = "2022-04-25 09:00:00"

  # End timestamp for generated data. ts-end must be later than ts-start, otherwise an error occurs.
  ts-end = "2022-04-25 09:01:00"

  # Interval in seconds between metric collections. Default is "1".
  ts-step-in-second = "1"

2.2.3 generator

2.2.3.1 telematics (default)

Generates telematics scenario data. Use generator="telematics".

[generator]

  generator = "telematics"

  [generator.telematics]

    # Number of rows per device per timestamp sent to MatrixDB.
    # For example, if set to 5, data for device "tag1" at "2022-04-25 09:00:03"
    # is split into 5 rows and upserted into one tuple in the database.
    # Default is 1 (no splitting).
    generator-batch-size = 1

    # Ratio of out-of-order (delayed) data. Value from 1 to 100.
    # Default is 0 (no delayed data).
    generator-disorder-ratio = 0

    # Null value ratio per row. Value from 1 to 100. Default is 90 (90% nulls).
    generator-empty-value-ratio = 90

    # Data randomness level: OFF/S/M/L. Default is OFF.
    generator-randomness = "OFF"
2.2.3.2 file

Reads data from CSV files. Use generator="file".

[generator]

  generator = "file"

  [generator.file]

    # Absolute paths to CSV data files. Accepts an array for multiple files.
    generator-file-paths = []

    # Same as [generator.telematics]; setting these is optional,
    # but proper values help generate better DDL. Has no effect if custom DDL is used.
    generator-batch-size = 1
    generator-empty-value-ratio = 90
2.2.3.3 nil

Does not generate or load any data. Use generator="nil".

[generator]

  generator = "nil"

2.2.4 writer

2.2.4.1 http (default)

Starts MxGate via HTTP and loads data.

[writer]

  writer = "http"

  [writer.http]

    # Whether to use gzip compression when sending HTTP messages. Default is false.
    writer-use-gzip = false

    # Concurrency level when sending data to MxGate.
    writer-parallel = 8
2.2.4.2 stdin

Starts MxGate via stdin and loads data.

[writer]

  writer = "stdin"
2.2.4.3 nil

Does not start MxGate or write data.

[writer]

  writer = "nil"

2.2.5 benchmark

2.2.5.1 telematics (default)
[benchmark]

  benchmark = "telematics"

  [benchmark.telematics]

    # List of query names to execute sequentially from the telematics plugin.
    # Available queries:
    # 1. "SINGLE_TAG_LATEST_QUERY"
    # 2. "MULTI_TAG_LATEST_QUERY"
    # 3. "SINGLE_TAG_DETAIL_QUERY"
    # These correspond to:
    # "SINGLE_TAG_LATEST_QUERY": Get latest metric values for one device.
    # "MULTI_TAG_LATEST_QUERY": Randomly select 10 devices and get their latest metric values.
    # "SINGLE_TAG_DETAIL_QUERY": Get metric values for one device over a time range.
    # Note: For very wide tables, retrieving all metrics at once may not be supported.
    # Parameters below allow tuning the number of metrics and time range for "SINGLE_TAG_DETAIL_QUERY".
    # Example: [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ] runs all three.
    # Remove names to skip corresponding queries. Invalid names are ignored.
    # Default is empty (no queries executed).
    benchmark-run-query-names = [ "SINGLE_TAG_DETAIL_QUERY" ]

    # Custom queries, separated by commas.
    # Example: ["SELECT COUNT(*) FROM t1", "SELECT MAX(ts) FROM t1"]
    # Default is empty (no custom queries).
    benchmark-custom-queries = []

    # Query concurrency levels; multiple values trigger sequential runs at each level, separated by commas.
    # Example: [1, 8] runs queries first at concurrency 1, then at 8.
    # Default is empty.
    benchmark-parallel = [8]

    # Number of times each query runs at each concurrency level; used for latency and TPS statistics.
    # Default is 0.
    benchmark-run-times = 0

    # Duration (in seconds) each query runs at each concurrency level; used for latency and TPS statistics.
    # Only effective when benchmark-run-times is 0. Default is 60.
    benchmark-runtime-in-second = "60"
2.2.5.2 nil

Set benchmark = "nil" if no queries should be executed.

[benchmark]

  benchmark = "nil"

2.3 Example Configuration Files

2.3.1 Generate Data for Wide Sparse Table and Run Mixed Workload

[database]
  db-database = "testdb2"
  db-master-port = 5432

[global]
  # Enable progress monitoring (default is true)
  watch = true

  # Directory for generated DDL, GUC best practices, queries, etc.
  workspace = "/home/mxadmin/mxbench/workspace"

  # Whether to run data loading and querying simultaneously
  simultaneous-loading-and-query = true

  table-name = "table2"

  # Number of devices
  tag-num = 20000
  # Metric data type; supports int4, int8, float4, float8
  metrics-type = "float8"
  # Number of metrics; if >998, first 997 are simple columns,
  # rest stored as JSON in a column named "ext"
  total-metrics-count = 5000

  # Data generation start timestamp; ts-end must be later than ts-start
  ts-start = "2022-04-19 00:00:00"
  ts-end = "2022-04-19 00:01:00"

[generator]
  generator = "telematics"

  [generator.telematics]
    # Number of rows per device per timestamp; merged into one tuple in DB
    generator-batch-size = 1
    # Percentage of delayed data (1–100), timestamp shifted back by 1 hour
    generator-disorder-ratio = 0
    # Null value ratio in generated data (1–100)
    generator-empty-value-ratio = 90
    # Data randomness level: OFF/S/M/L; default is "OFF"
    generator-randomness = "OFF"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array of query concurrency levels
    benchmark-parallel = [64]
    # Provided queries: single-tag latest, 10-tag latest, single-tag detail
    benchmark-run-query-names = [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ]
    # Number of times or duration each query runs per round; set count to 0 to enable time-based execution
    benchmark-run-times = 0
    benchmark-runtime-in-second = "60"

2.3.2 Load Data from External CSV Using External DDL, No Query Execution

[database]
  db-database = "testdb3"
  db-master-port = 5432

[generator]
  # Read data from CSV file
  generator = "file"

  [generator.file]
    generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]

[global]
  table-name = "table3"
  watch = true
  workspace = "/home/mxadmin/mxbench/workspace"
  ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "nil"

2.4 Example Command Lines

2.4.1 Generate Data for Wide Sparse Table and Run Mixed Workload

Running MxBench with Example Config 1 is equivalent to:

mxbench run \
  --db-database "testdb2" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --simultaneous-loading-and-query \
  --table-name "table2" \
  --tag-num 25000 \
  --metrics-type "float8" \
  --total-metrics-count 5000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 90 \
  --generator-randomness "OFF" \
  --writer "stdin" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 64 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 60

2.4.2 Load Data from External CSV Using External DDL, No Query Execution

Running MxBench with Example Config 2 is equivalent to:

mxbench run \
  --db-database "testdb3" \
  --db-master-port 5432 \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
  --table-name "table3" \
  --generator "file" \
  --generator-file-paths "/home/mxadmin/mxbench/data.csv" \
  --writer "stdin" \
  --benchmark "nil"

2.5 FAQ

  1. Load only, no query
    Set benchmark = "nil".

  2. Query only, no load
    Set generator = "nil".

  3. Run loading and querying simultaneously
    Set simultaneous-loading-and-query = true in the global section.

  4. Generate and dump CSV data file
    Set dump = true in global; output file is in <unix-timestamp> subdirectory under workspace.

  5. View generated DDL and queries
    Check the <unix-timestamp> subdirectory under the workspace directory.

  6. Use custom DDL
    Specify the absolute path to the DDL file in ddl-file-path under global.

  7. Run custom queries
    Add custom queries in benchmark-custom-queries under telematics benchmark, enclosed in quotes. Parameter randomization is not supported.

  8. Do not apply recommended GUCs; keep existing GUCs
    When MxBench detects differences between current and recommended GUCs, it prompts whether to reset GUCs and restart. Enter "N" to keep original GUCs. MxBench will then confirm whether to proceed. Enter "Y" to continue.

  9. What are the parameter validity requirements?
    In global configuration:

    • ts-end must be later than ts-start.
    • table-name and schema-name must not be empty.
    • tag-num must be greater than 0.
    • ts-step-in-second must not be 0.

3 Understanding Progress Information and Statistics Reports

3.1 Progress Information

Example:

● Stdin Writer Report
  ● period start: 2022-04-29 10:08:11, end: 2022-04-29 10:08:16, period: 5.000129734s

  ● count written in total: 637025 rows/ 1500000 rows 42%, 637025 rows in this period

  ● size written in total: 160878718 bytes/ 360000003 bytes 44%, 160878718 bytes in this period

  ● size written to MxGate in total: 350075440 bytes, 350075440 bytes in this period

● Telematics Benchmark Report
  ● stats for query SINGLE_TAG_LATEST_QUERY with parallel 8: progress: 100%

  ● stats for query MULTI_TAG_LATEST_QUERY with parallel 8: progress: 43%

Explanation: The report consists of two parts—writer and benchmark progress.

3.1.1 writer

  • period start, period end, period: Start time, end time, and duration of the reporting window.
  • count written: Total rows written vs. expected, percentage complete, and rows written in this period.
  • size written: Total bytes written vs. expected, percentage complete, and bytes written in this period.
  • size written to MxGate: Bytes sent to MxGate after text conversion, and bytes sent in this period.

3.1.2 benchmark

Execution progress of a query at a given concurrency level (parallel). When queries run concurrently with data loading, they may loop multiple times before loading completes. This report shows progress for the most recent iteration only.

3.2 Statistics Report

--- SPLIT ---

[TRANSLATION FAILED] 3.2.1 writer

┌───────────────────────────────────────────────────────┐
│            Summary Report for Stdin Writer            │
├─────────────────────────────────┬─────────────────────┤
│ start time:                     │ 2022-04-27 13:29:01 │
├─────────────────────────────────┼─────────────────────┤
│ stop time:                      │ 2022-04-27 13:29:58 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 848333400           │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted:                 │ 1500000             │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio:                 │ 1.561276 : 1        │
└─────────────────────────────────┴─────────────────────┘
  • start time: 数据加载起始时间;
  • end time: 数据加载终止时间;
  • size written to MxGate (bytes): 向MxGate写入数据的字节数;
  • lines inserted: 插入数据的条数. 由于upsert可能存在,这一数字可能会高于数据库中实际的数据条数。
  • compress ratio: 压缩比,即向MxGate写入数据的大小和实际数据库中该表的大小的比值。

3.2.2 benchmark

每条query,在每个parallel参数下都会产生一个报告,会实时打印出来。

┌─────────────────┬───────────────┐
│ Overall Latency │ 29.948370582s │
│ Average Latency │   13.723304ms │
│ P75 Latency     │    14.35972ms │
│ P50 Latency     │   13.654556ms │
│ P25 Latency     │   12.925912ms │
│ TPS             │           582 │
└─────────────────┴───────────────┘
  • Pxx 代表xx百分位数的延迟。例如P75是14.35972ms,说明执行query的次数中,有25%延迟高于它,75%低于它. P50即中位数。
  • TPS: 每秒执行query的次数.

汇总报告:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                Summary Report for Telematics Benchmark                                                │
├─────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┤
│ Parallel\Query Name │ SINGLE_TAG_LATEST_QUERY             │ MULTI_TAG_LATEST_QUERY              │ SINGLE_TAG_DETAIL_QUERY             │
├─────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
│ 8                   │ ┌─────────────────┬───────────────┐ │ ┌─────────────────┬───────────────┐ │ ┌─────────────────┬───────────────┐ │
│                     │ │ Overall Latency │ 30.008625133s │ │ │ Overall Latency │ 36.407016561s │ │ │ Overall Latency │ 29.948370582s │ │
│                     │ │ Average Latency │    23.81758ms │ │ │ Average Latency │  7.270212105s │ │ │ Average Latency │   13.723304ms │ │
│                     │ │ P75 Latency     │   24.911171ms │ │ │ P75 Latency     │  8.139682559s │ │ │ P75 Latency     │    14.35972ms │ │
│                     │ │ P50 Latency     │   23.411248ms │ │ │ P50 Latency     │  7.161237157s │ │ │ P50 Latency     │   13.654556ms │ │
│                     │ │ P25 Latency     │   20.899622ms │ │ │ P25 Latency     │  6.642459805s │ │ │ P25 Latency     │   12.925912ms │ │
│                     │ │ TPS             │           335 │ │ │ TPS             │             1 │ │ │ TPS             │           582 │ │
│                     │ └─────────────────┴───────────────┘ │ └─────────────────┴───────────────┘ │ └─────────────────┴───────────────┘ │
│                     │ progress: 100%                      │ progress: 100%                      │ progress: 100%                      │
└─────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┘
  • 每行代表某并发度下各query的执行结果。
  • 每列代表每个query在各个并发度下的执行结果。
  • 如果query集执行了多轮(混合负载的情况下,数据加载未结束,query便会一直执行),则仅展示最后一轮的结果。
  • 如果因为query执行错误或者用户中断执行,进度条会显示当前进度,统计信息是根据已经执行query的做出统计。