MatrixBench Quick Start

This document introduces the basic usage of MatrixBench and includes the following sections:

Environment Preparation
Quick Start
Examples
Frequently Asked Questions (FAQ)

Note!
mxbench is now open source. We welcome your review and contributions. For details, see the README.

1 Environment Preparation

1.1 YMatrix Cluster

You need a running YMatrix cluster.

1.2 Environment Variables

MatrixBench calls commands such as createdb, gpconfig, and gpstop. Configure the relevant environment variables so that these commands can run.

Specifically, run source <YMatrix installation directory>/greenplum_path.sh and set the following environment variables:

PGHOST
PGPORT
PGUSER
PGPASSWORD
PGDATABASE
MASTER_DATA_DIRECTORY

Then verify that commands such as createdb mxbench, gpconfig -s log_rotation_size, and gpstop -rai run without errors. Note that gpstop -rai restarts the cluster.
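Before running the test commands, it can help to confirm that every variable listed above is set. The following is a minimal sketch in POSIX shell. check_mxbench_env is a hypothetical helper name, not part of MatrixBench, and it only checks that each variable is non-empty, not that its value is correct.

```shell
# Report which of the environment variables listed in section 1.2 are unset or empty.
# check_mxbench_env is a hypothetical helper name, not an mxbench command.
check_mxbench_env() {
  missing=0
  for name in PGHOST PGPORT PGUSER PGPASSWORD PGDATABASE MASTER_DATA_DIRECTORY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Call check_mxbench_env after sourcing greenplum_path.sh. A non-zero return status means at least one variable still needs to be exported.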

1.3 MatrixGate

Data ingestion requires MatrixGate (mxgate for short), a high-performance streaming data loading server located at bin/mxgate under the YMatrix installation directory. For more information, see mxgate.
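To confirm that mxgate is present at the location described above, a small check along these lines may be useful. This is a sketch only: find_mxgate is a hypothetical helper, and the installation directory is passed in as an argument because its actual path depends on your environment.

```shell
# find_mxgate DIR: print the mxgate path if DIR/bin/mxgate exists and is executable.
# find_mxgate is a hypothetical helper; DIR is the YMatrix installation directory.
find_mxgate() {
  candidate="$1/bin/mxgate"
  if [ -x "$candidate" ]; then
    echo "$candidate"
  else
    echo "mxgate not found or not executable: $candidate" >&2
    return 1
  fi
}
```

For example, find_mxgate "<YMatrix installation directory>" prints the full path on success and returns a non-zero status otherwise.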

2 Quick Start

To quickly try MatrixBench on a personal development machine, you can run it using either a configuration file or command-line arguments.

Note!
mxbench supports multiple data types, features, and composite query statements. The examples below do not cover these advanced features. For details, refer to Basic Features.

2.1 Run with Configuration File

Save the following configuration as mxbench.conf, then run mxbench --config mxbench.conf.

Note!
Set benchmark-parallel according to your machine's capacity. A value less than or equal to the number of CPU cores is recommended.

[database]
  db-database = "testdb1"
  db-master-port = 5432

[global]
  # Skip the prompt for resetting GUCs. If set to true, you are assumed to have already configured database parameters; mxbench will not modify them.
  skip-set-gucs = true

  table-name = "table1"

  # Directory for generated DDL, parameter recommendations, query statements, etc.
  workspace = "/tmp/mxbench"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array: query concurrency levels
    benchmark-parallel = [8]
    # Available queries: latest value for single tag, latest values for 10 tags, detailed data for single tag
    benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
    # Number of runs or duration for each query per round. Set benchmark-run-times to 0 to run by duration (benchmark-runtime-in-second) instead:
    benchmark-run-times = 0
    benchmark-runtime-in-second = "30"
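The note above recommends keeping benchmark-parallel at or below the number of CPU cores. One way to pick such a value is sketched below. suggest_parallel is a hypothetical helper, and the sketch assumes getconf _NPROCESSORS_ONLN is available (it is on typical Linux and macOS systems); if it is not, the helper falls back to 1.

```shell
# suggest_parallel WANTED: print WANTED, capped at the number of online CPU cores.
# suggest_parallel is a hypothetical helper, not an mxbench command.
suggest_parallel() {
  wanted=$1
  # Fall back to 1 if the core count cannot be determined.
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$wanted" -gt "$cores" ]; then
    echo "$cores"
  else
    echo "$wanted"
  fi
}
```

The printed number can then be used as the value inside the benchmark-parallel array, for example by calling suggest_parallel 8 and writing the result into mxbench.conf.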

2.2 Run from Command Line

You can also run MatrixBench from the command line. The following command is equivalent to running MatrixBench with the configuration file above.

mxbench run \
  --db-database "testdb1" \
  --db-master-host "localhost" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --skip-set-gucs \
  --table-name "table1" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 8 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 30

3 Examples

This section provides examples of running MatrixBench using configuration files and command-line arguments.

3.1 Example Configuration Files

Two typical scenarios are provided:

Generate data for an ultra-wide sparse table and run a mixed workload.
Create a table from an external DDL file, load data from a CSV file, and skip querying.

3.1.1 Ultra-Wide Sparse Table with Mixed Workload

An ultra-wide sparse table has many columns (metrics), but most values in each row are NULL. This is common in scenarios requiring numerous metrics for different analytical models.
A mixed workload means data ingestion and querying occur simultaneously. Data ingestion is performed via mxgate.

# Example File 1
[database]
  db-database = "testdb2"
  db-master-port = 5432

[global]
  # Enable progress monitoring (default is true)
  watch = true

  # Directory for generated DDL, best-practice parameter suggestions, query statements, etc.
  workspace = "/home/mxadmin/mxbench/workspace"

  # Whether data loading and querying occur simultaneously
  simultaneous-loading-and-query = true

  table-name = "table2"

  # Number of devices (tags)
  tag-num = 20000
  # Metric data type: supports int4, int8, float4, float8
  metrics-type = "float8"
  # Total number of metrics. If greater than 998, the first 997 are simple columns;
  # the rest are stored in JSON format in a column named 'ext'
  total-metrics-count = 5000

  # Start and end timestamps for data generation. ts-end must be later than ts-start; otherwise an error is reported.
  ts-start = "2022-04-19 00:00:00"
  ts-end = "2022-04-19 00:01:00"

[generator]
  generator = "telematics"

  [generator.telematics]
    # Number of records per device per timestamp. These are upserted into one tuple in the database.
    generator-batch-size = 1
    # Percentage of out-of-order data (1–100), with timestamps moved back by 1 hour
    generator-disorder-ratio = 0
    # Percentage of NULL values in generated data (1–100)
    generator-empty-value-ratio = 90
    # Data randomness level: OFF / S / M / L (default: OFF)
    generator-randomness = "OFF"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array: query concurrency levels
    benchmark-parallel = [64]
    # Available queries: single-tag latest, multi-tag latest, single-tag detail
    benchmark-run-query-names = [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ]
    # Number of runs or duration for each query per round. Set benchmark-run-times to 0 to run by duration (benchmark-runtime-in-second) instead:
    benchmark-run-times = 0
    benchmark-runtime-in-second = "60"
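The total-metrics-count comment in the configuration above describes how metrics are divided between simple columns and the ext column. As a worked example of that stated rule, the sketch below computes the split for a given total. split_metrics is a hypothetical helper written only to illustrate the arithmetic; it does not reflect how mxbench implements the rule internally.

```shell
# split_metrics TOTAL: print "<simple columns> <metrics stored in ext>".
# Rule taken from the config comment: if TOTAL is greater than 998,
# the first 997 metrics are simple columns and the rest go into 'ext'.
split_metrics() {
  total=$1
  if [ "$total" -gt 998 ]; then
    echo "997 $((total - 997))"
  else
    echo "$total 0"
  fi
}

split_metrics 5000   # prints "997 4003"
```

With total-metrics-count = 5000 as in Example File 1, 997 metrics become ordinary columns and the remaining 4003 are stored in ext.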

3.1.2 Load DDL and CSV from External Paths (No Query)

This scenario reads a DDL file from a user-specified path to create a table, then loads data from a CSV file. This increases test flexibility. Use this example if you already have a defined DDL and prepared test data.
No queries are executed, so there is no mixed workload.

# Example File 2
[database]
  db-database = "testdb3"
  db-master-port = 5432

[generator]
  # Read data from CSV file
  generator = "file"

  [generator.file]
    generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]

[global]
  table-name = "table3"

  watch = true
  workspace = "/home/mxadmin/mxbench/workspace"
  ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "nil"

3.2 Example Command Lines

Two typical scenarios are demonstrated using command-line arguments.

3.2.1 Ultra-Wide Sparse Table with Mixed Workload

An ultra-wide sparse table has many columns but sparse data per row. A mixed workload runs data loading (via mxgate) and querying concurrently.

Running MatrixBench with Example Configuration File 1 is equivalent to the following command:

mxbench run \
  --db-database "testdb2" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --simultaneous-loading-and-query \
  --table-name "table2" \
  --tag-num 20000 \
  --metrics-type "float8" \
  --total-metrics-count 5000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 90 \
  --generator-randomness "OFF" \
  --writer "stdin" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 64 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 60

3.2.2 Load DDL and CSV from External Paths (No Query)

Create a table from an external DDL file and load data from a CSV file. This approach increases test flexibility. Use this example if you have a predefined DDL and test data ready.

Running MatrixBench with Example Configuration File 2 is equivalent to the following command:

mxbench run \
  --db-database "testdb3" \
  --db-master-port 5432 \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
  --table-name "table3" \
  --generator "file" \
  --generator-file-paths "/home/mxadmin/mxbench/data.csv" \
  --writer "stdin" \
  --benchmark "nil"

4 Frequently Asked Questions (FAQ)

Write-only, no query
Set benchmark to nil.
Query-only, no write
Set generator to nil.
Run data loading and querying simultaneously
Set simultaneous-loading-and-query to true in the Global configuration.
Generate and export CSV data files
Set dump to true in the Global configuration. The generated files are written to a subdirectory named with a Unix timestamp under the workspace directory.
View generated DDL and query statements
Check the unix-timestamp subdirectory under the workspace directory.
Use a custom DDL
Specify the absolute path to the DDL file in ddl-file-path under Global settings.
Use custom query statements
Provide your custom queries in benchmark-custom-queries under the telematics benchmark, enclosed in double quotes (""). Random parameters are not supported.
Keep existing database parameters instead of applying recommended ones
When MatrixBench detects discrepancies between current and recommended parameters, it prompts whether to reset and restart the database. Enter N to retain existing parameters. MatrixBench will then confirm whether to proceed. Enter Y to continue.
What are the parameter validity requirements?
In the Global configuration:

ts-end must be later than ts-start;
table-name and schema-name must not be empty;
tag-num must be greater than 0;
ts-step-in-second must not be 0.
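The four rules above can be checked before a run with a small script. The following is a sketch under stated assumptions: validate_global is a hypothetical helper, timestamps are assumed to use the "YYYY-MM-DD HH:MM:SS" format shown in the examples (so that string order matches chronological order), and the check mirrors the listed rules rather than mxbench's own validation code.

```shell
# validate_global TS_START TS_END TABLE_NAME SCHEMA_NAME TAG_NUM TS_STEP
# Hypothetical helper mirroring the four rules listed above; returns 0 if all hold.
validate_global() {
  ts_start=$1; ts_end=$2; table_name=$3; schema_name=$4; tag_num=$5; ts_step=$6
  status=0
  # "YYYY-MM-DD HH:MM:SS" strings sort chronologically, so compare them as strings.
  if ! awk -v a="$ts_start" -v b="$ts_end" 'BEGIN { exit !((b "") > (a "")) }'; then
    echo "ts-end must be later than ts-start" >&2
    status=1
  fi
  if [ -z "$table_name" ] || [ -z "$schema_name" ]; then
    echo "table-name and schema-name must not be empty" >&2
    status=1
  fi
  if ! [ "$tag_num" -gt 0 ] 2>/dev/null; then
    echo "tag-num must be greater than 0" >&2
    status=1
  fi
  if [ "$ts_step" -eq 0 ] 2>/dev/null; then
    echo "ts-step-in-second must not be 0" >&2
    status=1
  fi
  return "$status"
}
```

For example, validate_global "2022-04-19 00:00:00" "2022-04-19 00:01:00" table2 public 20000 1 succeeds, where the schema name public and the step of 1 second are placeholder values chosen for illustration.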

Note!
For complete command-line options, see MatrixBench Command-Line Parameters.
For main features, see MatrixBench Main Features.
For details on progress tracking and statistical reports, see Understanding MatrixBench Progress and Reports.
