This document introduces the basic usage of MatrixBench, covering prerequisites, a quick start, example scenarios, and frequently asked questions.
Note!
mxbench is now open source, and reviews and contributions are welcome. For details, see the project README.
You need a running YMatrix cluster.
MatrixBench runs commands such as createdb, gpconfig, and gpstop, so the environment variables these commands depend on must be configured before you start.
Specifically, run source <YMatrix installation directory>/greenplum_path.sh and set the following environment variables correctly:
Then verify that commands such as createdb mxbench, gpconfig -s log_rotation_size, and gpstop -rai run correctly. Note that gpstop -rai restarts the cluster in immediate mode without prompting for confirmation.
You must use MatrixGate (shortened as mxgate) for data ingestion. mxgate is a high-performance streaming data loading server located at bin/mxgate under the YMatrix installation directory. For more information, see mxgate.
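The prerequisite checks above can be partly automated. The following is a minimal preflight sketch, assuming a POSIX shell in which greenplum_path.sh has already been sourced and that it puts the installation's bin directory (which contains mxgate, and presumably mxbench) on PATH. The function name require_cmds is hypothetical. It only confirms that the commands can be found; it deliberately does not run gpstop -rai, because that restarts the cluster.

```shell
#!/bin/sh
# Hypothetical helper: report every command that cannot be found on PATH.
# Returns 0 when all commands are found, 1 if any is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after sourcing greenplum_path.sh, run require_cmds createdb gpconfig gpstop mxgate mxbench and fix your environment if any command is reported as missing.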
To quickly try MatrixBench on a personal development machine, you can run it using either a configuration file or command-line arguments.
Note!
mxbench supports multiple data types, features, and composite query statements. The examples below do not cover these advanced functions. For details, refer to Basic Features.
Use the following configuration file, name it mxbench.conf, and run mxbench --config mxbench.conf.
Note!
The benchmark-parallel parameter should match your machine's capabilities. We recommend setting it to no more than the number of CPU cores.
[database]
db-database = "testdb1"
db-master-port = 5432
[global]
# Skip the prompt for resetting GUCs. If set to true, you are assumed to have already configured database parameters; mxbench will not modify them.
skip-set-gucs = true
table-name = "table1"
# Directory for generated DDL, parameter recommendations, query statements, etc.
workspace = "/tmp/mxbench"
[benchmark]
benchmark = "telematics"
[benchmark.telematics]
# Array: query concurrency levels
benchmark-parallel = [8]
# Available queries: latest value for single tag, latest values for 10 tags, detailed data for single tag
benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
# Run each query a fixed number of times per round, or for a fixed duration. Set benchmark-run-times to 0 to use benchmark-runtime-in-second:
benchmark-run-times = 0
benchmark-runtime-in-second = "30"
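If you prefer to generate mxbench.conf from a script, a heredoc is enough. This is a minimal sketch with a hypothetical function write_quickstart_conf; it writes the same settings as the configuration file above (with table-name set only once) and lets you override benchmark-parallel, which defaults to 8 here.

```shell
#!/bin/sh
# Hypothetical helper: write the quick-start configuration to the given path.
# $1: output path; $2: optional benchmark-parallel value (default 8).
write_quickstart_conf() {
  conf_path=$1
  parallel=${2:-8}
  # Unquoted EOF so that $parallel is expanded; nothing else in the body uses $.
  cat > "$conf_path" <<EOF
[database]
db-database = "testdb1"
db-master-port = 5432

[global]
skip-set-gucs = true
table-name = "table1"
workspace = "/tmp/mxbench"

[benchmark]
benchmark = "telematics"

[benchmark.telematics]
benchmark-parallel = [$parallel]
benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
benchmark-run-times = 0
benchmark-runtime-in-second = "30"
EOF
}
```

Usage: write_quickstart_conf mxbench.conf 4 && mxbench --config mxbench.conf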
You can also run MatrixBench from the command line. The following command is equivalent to running MatrixBench with the configuration file above.
mxbench run \
--db-database "testdb1" \
--db-master-host "localhost" \
--db-master-port 5432 \
--db-user "mxadmin" \
--skip-set-gucs \
--table-name "table1" \
--workspace "/tmp/mxbench" \
--benchmark "telematics" \
--benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
--benchmark-parallel 8 \
--benchmark-run-times 0 \
--benchmark-runtime-in-second 30
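To follow the recommendation that benchmark-parallel not exceed the number of CPU cores, the value can be capped when building the command. A minimal sketch, assuming getconf _NPROCESSORS_ONLN is available (it is on Linux and macOS) and reports the number of online logical CPUs; the function name recommended_parallel is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested query concurrency at the number of
# online CPU cores. Falls back to 1 core if getconf is unavailable.
recommended_parallel() {
  requested=$1
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$requested" -gt "$cores" ]; then
    echo "$cores"
  else
    echo "$requested"
  fi
}
```

In the command above, you could then pass --benchmark-parallel "$(recommended_parallel 8)" instead of a fixed 8.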
This section provides examples of running MatrixBench using configuration files and command-line arguments.
Two typical scenarios are provided:
An ultra-wide sparse table has many columns (metrics), but most values in each row are NULL. This is common in scenarios requiring numerous metrics for different analytical models.
A mixed workload means data ingestion and querying occur simultaneously. Data ingestion is performed via mxgate.
# Example File 1
[database]
db-database = "testdb2"
db-master-port = 5432
[global]
# Enable progress monitoring (default is true)
watch = true
# Directory for generated DDL, best-practice parameter suggestions, query statements, etc.
workspace = "/home/mxadmin/mxbench/workspace"
# Whether data loading and querying occur simultaneously
simultaneous-loading-and-query = true
table-name = "table2"
# Number of devices (tags)
tag-num = 20000
# Metric data type: supports int4, int8, float4, float8
metrics-type = "float8"
# Total number of metrics. If greater than 998, the first 997 are simple columns;
# the rest are stored in JSON format in a column named 'ext'
total-metrics-count = 5000
# Time range for data generation. ts-end must be later than ts-start; otherwise an error occurs.
ts-start = "2022-04-19 00:00:00"
ts-end = "2022-04-19 00:01:00"
[generator]
generator = "telematics"
[generator.telematics]
# Number of records per device per timestamp. These are upserted into one tuple in the database.
generator-batch-size = 1
# Percentage of out-of-order data (1–100), with timestamps moved back by 1 hour
generator-disorder-ratio = 0
# Percentage of NULL values in generated data (1–100)
generator-empty-value-ratio = 90
# Data randomness level: OFF / S / M / L (default: OFF)
generator-randomness = "OFF"
[writer]
writer = "stdin"
[benchmark]
benchmark = "telematics"
[benchmark.telematics]
# Array: query concurrency levels
benchmark-parallel = [64]
# Available queries: single-tag latest, multi-tag latest, single-tag detail
benchmark-run-query-names = [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ]
# Run each query a fixed number of times per round, or for a fixed duration. Set benchmark-run-times to 0 to use benchmark-runtime-in-second:
benchmark-run-times = 0
benchmark-runtime-in-second = "60"
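To sanity-check what Example File 1 will generate, the arithmetic can be sketched in shell. metric_layout follows the column-split rule stated in the total-metrics-count comment above. estimate_rows is only a rough estimate that assumes one row per tag per ts-step-in-second with ts-end excluded, which the source does not confirm, and the 1-second step used below is likewise an assumption. Under those assumptions, tag-num = 20000 over a 60-second range gives about 1,200,000 rows, each with 997 simple metric columns plus 4,003 metrics in ext, roughly 90% of them NULL.

```shell
#!/bin/sh
# Split total-metrics-count into simple columns and metrics stored in the
# JSON column 'ext': if total > 998, the first 997 are simple columns.
metric_layout() {
  total=$1
  if [ "$total" -gt 998 ]; then
    echo "997 $((total - 997))"
  else
    echo "$total 0"
  fi
}

# Rough row estimate: tags x timestamps in [ts-start, ts-end).
# ASSUMPTION: one row per tag per step, end timestamp excluded.
estimate_rows() {
  tag_num=$1
  duration_s=$2
  step_s=$3
  echo $((tag_num * (duration_s / step_s)))
}
```

For example, metric_layout 5000 prints 997 4003, and estimate_rows 20000 60 1 prints 1200000.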
This scenario reads a DDL file from a user-specified path to create a table, then loads data from a CSV file. This increases test flexibility. Use this example if you already have a defined DDL and prepared test data.
No queries are executed, so there is no mixed workload.
# Example File 2
[database]
db-database = "testdb3"
db-master-port = 5432
[generator]
# Read data from CSV file
generator = "file"
[generator.file]
generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]
[global]
table-name = "table3"
watch = true
workspace = "/home/mxadmin/mxbench/workspace"
ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"
[writer]
writer = "stdin"
[benchmark]
benchmark = "nil"
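Before running Example File 2, it can help to confirm that the DDL and CSV files exist. A minimal sketch with a hypothetical function check_inputs. The source only states that ddl-file-path must be an absolute path, so requiring absolute paths for generator-file-paths as well is a conservative assumption.

```shell
#!/bin/sh
# Hypothetical helper: check that each argument is an absolute path to a
# readable regular file. Returns 1 if any check fails.
check_inputs() {
  status=0
  for f in "$@"; do
    case $f in
      /*) ;;
      *) echo "not an absolute path: $f" >&2; status=1; continue ;;
    esac
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      echo "not a readable file: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_inputs /home/mxadmin/mxbench/ddl.sql /home/mxadmin/mxbench/data.csv && mxbench --config <your configuration file>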
Two typical scenarios are demonstrated using command-line arguments.
An ultra-wide sparse table has many columns but sparse data in each row. A mixed workload runs data loading (through mxgate) and querying at the same time.
Running MatrixBench with Example Configuration File 1 is equivalent to the following command:
$ mxbench run \
--db-database "testdb2" \
--db-master-port 5432 \
--db-user "mxadmin" \
--workspace "/home/mxadmin/mxbench/workspace" \
--simultaneous-loading-and-query \
--table-name "table2" \
--tag-num 20000 \
--metrics-type "float8" \
--total-metrics-count 5000 \
--ts-start "2022-04-19 00:00:00" \
--ts-end "2022-04-19 00:01:00" \
--generator "telematics" \
--generator-batch-size 1 \
--generator-disorder-ratio 0 \
--generator-empty-value-ratio 90 \
--generator-randomness "OFF" \
--writer "stdin" \
--benchmark "telematics" \
--benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
--benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
--benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
--benchmark-parallel 64 \
--benchmark-run-times 0 \
--benchmark-runtime-in-second 60
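The command above repeats --benchmark-run-query-names once per query. When the query list changes often, the repeated flags can be generated. A minimal sketch with a hypothetical function query_flags that prints one flag per query name.

```shell
#!/bin/sh
# Hypothetical helper: emit one --benchmark-run-query-names flag per argument,
# mirroring how the command repeats the flag for each query.
query_flags() {
  for q in "$@"; do
    printf ' --benchmark-run-query-names %s' "$q"
  done
}
```

You could then write mxbench run followed by the other flags from the command above and $(query_flags SINGLE_TAG_LATEST_QUERY MULTI_TAG_LATEST_QUERY SINGLE_TAG_DETAIL_QUERY). The substitution is left unquoted so that it splits into separate words, which is safe only because the query names contain no spaces or glob characters.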
Create a table from an external DDL file and load data from a CSV file. This approach increases test flexibility. Use this example if you have a predefined DDL and test data ready.
Running MatrixBench with Example Configuration File 2 is equivalent to the following command:
$ mxbench run \
--db-database "testdb3" \
--db-master-port 5432 \
--workspace "/home/mxadmin/mxbench/workspace" \
--ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
--table-name "table3" \
--generator "file" \
--generator-file-paths "/home/mxadmin/mxbench/data.csv" \
--writer "stdin" \
--benchmark "nil"
Write-only, no query
Set benchmark to nil.
Query-only, no write
Set generator to nil.
Run data loading and querying simultaneously
Set simultaneous-loading-and-query to true in the Global configuration.
Generate and export CSV data files
Set dump to true in the Global configuration. The generated files are written to a subdirectory of the workspace directory named with a Unix timestamp.
View generated DDL and query statements
Check the Unix-timestamp subdirectory under the workspace directory.
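When several runs have accumulated in the workspace, the most recent one can be located by sorting the timestamp-named subdirectories numerically. A minimal sketch, assuming each run creates a subdirectory whose name consists only of digits (a Unix timestamp); the function name latest_run_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the newest timestamp-named
# subdirectory under the given workspace, or return 1 if there is none.
latest_run_dir() {
  workspace=$1
  # Keep only all-digit names, sort numerically, take the largest.
  latest=$(ls -1 "$workspace" 2>/dev/null | grep -E '^[0-9]+$' | sort -n | tail -n 1)
  if [ -n "$latest" ] && [ -d "$workspace/$latest" ]; then
    echo "$workspace/$latest"
  else
    return 1
  fi
}
```

Usage: ls "$(latest_run_dir /home/mxadmin/mxbench/workspace)"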
Use a custom DDL
Specify the absolute path to the DDL file in ddl-file-path under Global settings.
Use custom query statements
Add your queries to benchmark-custom-queries in the telematics benchmark section ([benchmark.telematics]), each enclosed in double quotes (""). Random parameters are not supported.
Keep existing database parameters instead of applying recommended ones
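When MatrixBench detects discrepancies between the current and recommended parameters, it asks whether to reset them and restart the database. Enter N to keep the existing parameters. MatrixBench then asks whether to proceed; enter Y to continue. Alternatively, set skip-set-gucs = true in the Global configuration to skip this prompt and leave the database parameters unchanged.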
What are the parameter validity requirements?
In the Global configuration:
ts-end must be later than ts-start.
table-name and schema-name must not be empty.
tag-num must be greater than 0.
ts-step-in-second must not be 0.
Note!
For complete command-line options, see MatrixBench Command-Line Parameters.
For main features, see MatrixBench Main Features.
For details on progress tracking and statistical reports, see Understanding MatrixBench Progress and Reports.
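The parameter validity rules listed in the FAQ above can be checked before starting a run. A minimal sketch with a hypothetical function validate_global, assuming timestamps use the fixed "YYYY-MM-DD HH:MM:SS" format shown in the examples, so that stripping the non-digit characters yields 14-digit integers that compare in chronological order. It mirrors the documented rules only; MatrixBench's own validation may check more.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented Global validity rules.
# Args: ts-start ts-end table-name schema-name tag-num ts-step-in-second
validate_global() {
  ts_start=$1; ts_end=$2; table=$3; schema=$4; tags=$5; step=$6
  errors=0
  # "2022-04-19 00:01:00" -> 20220419000100, compared as integers.
  s=$(printf '%s' "$ts_start" | tr -cd '0-9')
  e=$(printf '%s' "$ts_end" | tr -cd '0-9')
  if [ -z "$s" ] || [ -z "$e" ] || [ "$e" -le "$s" ]; then
    echo "ts-end must be later than ts-start" >&2; errors=1
  fi
  if [ -z "$table" ] || [ -z "$schema" ]; then
    echo "table-name and schema-name must not be empty" >&2; errors=1
  fi
  if [ "$tags" -le 0 ]; then
    echo "tag-num must be greater than 0" >&2; errors=1
  fi
  if [ "$step" -eq 0 ]; then
    echo "ts-step-in-second must not be 0" >&2; errors=1
  fi
  return "$errors"
}
```

For example, validate_global "2022-04-19 00:00:00" "2022-04-19 00:01:00" table2 public 20000 1 succeeds, where the schema name public and the step of 1 are illustrative values rather than settings taken from the examples.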