MxBench is a stress measurement tool for MatrixDB data loading and query. It can quickly generate random data based on the number of devices, time ranges, and number of indicators specified by the user, automatically create data tables, and perform data loading and querying in serial or concurrently. Users can flexibly configure indicator types, acquisition frequency, null rate, randomness, etc., and can also specify the number of threads for query, customized query statements, etc. MxBench can be run through the command line or through a configuration file. This tool is located in bin/mxbench in the MatrixDB installation directory.
A properly running MatrixDB cluster is required.
Since MxBench needs to call createdb, gpconfig, and gpstop, the user needs to configure relevant environment variables so that these commands can be executed correctly.
Specifically, you need to execute source <MatrixDB installation directory>/greenplum_path.sh
, and the following environment variables need to be correctly set:
In addition, users can also try to execute commands such as createdb mxbench
, gpconfig -s log_rotation_size
, gpstop -rai
to ensure that they can run correctly.
If you want to quickly try MxBench on your personal development machine, you can run MxBench using configuration files or command lines.
You can use the following configuration file, named mxbench.conf
and run:
mxbench --config mxbench.conf
.
Note that the setting of the benchmark-parallel
parameter needs to adapt to machine performance, and it is recommended that the number of CPU cores is less than or equal.
[database]
db-database = "testdb1"
db-master-port = 5432
[global]
# Turn on progress viewing function, the default is true
watch = true
# The generated DDL, GUCs settings suggestions, and the storage directory of Query and other files
workspace = "/tmp/mxbench"
table-name = "table1"
# Number of equipment
tag-num = 25000
# Metric data type, supporting four types: int4, int8, float4, float8
metrics-type = "float8"
# Number of indicators. If the number of indicators is greater than 998, the first 997 are used as simple columns.
# The other ones are stored as json in a column named ext
total-metrics-count = 300
# The start timestamp of the generated data, ts-end must be later than ts-start, otherwise an error will be reported.
ts-start = "2022-04-19 09:00:00"
ts-end = "2022-04-19 09:03:00"
# Collect indicators every few seconds
ts-step-in-second = "2"
[generator]
generator = "telematics"
[generator.telematics]
# The indicators of each device at each time point are uploaded in several pieces of data, and finally they are upsert into a tuple in the DB.
generator-batch-size = 1
# The generation ratio of delayed uploaded data (1~100), push the timestamp forward for 1 hour
generator-disorder-ratio = 0
# The null rate of generated data (1~100)
generator-empty-value-ratio = 90
# The randomness of the generated data is several OFF/S/M/L. The default is "OFF". You can select M in general scenarios.
generator-randomness = "OFF"
[writer]
writer = "stdin"
[benchmark]
benchmark = "telematics"
[benchmark.telematics]
# Array, query concurrency
benchmark-parallel = [8]
# 3 queries provided: latest value of bicycle, latest value of 10 vehicles, and detailed bicycles
benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY" ]
# The number of times or time of running each query in each round, to make the time take effect, you need to set the number of times to 0, as follows:
benchmark-run-times = 0
benchmark-runtime-in-second = "30"
You can also run MxBench using the command line. Running the following command is equivalent to running MxBench in the above configuration file.
mxbench run \
--db-database "testdb1" \
--db-master-host "localhost" \
--db-master-port 5432 \
--db-user "mxadmin" \
--workspace "/tmp/mxbench" \
--table-name "table1" \
--ts-start "2022-04-19 09:00:00" \
--ts-end "2022-04-19 09:02:00" \
--ts-step-in-second 2 \
--generator "telematics" \
--writer "stdin" \
--benchmark "telematics" \
--benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
--benchmark-parallel 8 \
--benchmark-run-times 0 \
--benchmark-runtime-in-second 30
As can be seen from the sample configuration file, the configuration file is divided into the following sections:
Global configuration
Plug-in configuration
The following are detailed explanations of each module:
[database]
# If the database name does not exist, MxBench will be automatically created; if it already exists, there will be no errors.
db-database = "postgres"
## hostname of the host where the MatrixDB master instance is located
db-master-host = "localhost"
## Port number of the MatrixDB master instance
db-master-port = 5432
# MatrixDB User Password
db-password = ""
# MatrixDB Username
db-user = "mxadmin"
[global]
# MxBench generates DDL files, existing system GUCs recovery scripts, best practice recommended GUCs change scripts,
# csv data file, query file directory.
# If it does not exist, MxBench will be created automatically; if it already exists and is not a directory, an error will be reported. You may need to pay attention to permission issues.
# Every time you run MxBench, a directory named Unix timestamp is created under it, and the files generated by that run are in that directory.
# Default is "/tmp/mxbench"
workspace = "/tmp/mxbench"
# Whether to dump the csv data file, optionally true or false.
# The default is false, and DDL, data loading, and query will be performed.
# In the directory with Unix timestamps under workspace, DDL, GUCs-related scripts and query files will be generated.
# 1. mxbench_ddl.sql: DDL file;
# 2. mxbench_gucs_setup.sh: Best practice recommended GUCs setup script, you may need to restart MatrixDB before all of them take effect.
# 3. mxbench_gucs_backup.sh: The existing system GUCs backup. If the GUCs of existing systems are consistent with the GUCs of best practices, they will not be generated.
# 4. mxbench_<benchmark-plugin>_query.sql: The query statement generated by the corresponding benchmark plugin.
# If true is selected, DDL, data loading and query will not be actually performed; in addition to the above files, it will also generate:
# 5. mxbench_<generator-plugin>_data.csv: The csv file of data generated by the corresponding generator plugin.
dump = false
# Whether to enable process observation, you can choose true or false.
#(Default) If true is selected, the execution progress information of the writer and benchmark modules will be printed every 5 seconds.
# Select false to close.
watch = true
# Whether to perform data loading and query at the same time, optionally true or false.
# If true is selected: perform a mixed load of data loading and querying
# After query is finished, it will run again until the data loading is over.
# (Default) If false is selected: execute data loading first, and then execute query after execution.
simultaneous-loading-and-query = false
# If you need to customize DDL, fill in the path of the DDL file in this parameter.
# (Default) If you do not fill in, DDL will be generated based on other related configurations.
ddl-file-path = ""
# Type of indicator. The default is "float8", which is a double-precision floating point number.
# Only "int4", "int8", "float4", "float8", and 4 types are supported.
metrics-type = "float8"
# The schema name, defaults to "public".
schema-name = "public"
# Data table name. The default is "", which must be set manually.
# If the table with the same name exists in the configured database or schema, an error will be reported and the MxBench program will be terminated.
table-name = "test_table"
# Number of equipment. Default is 25000.
tag-num = 25000
# Total number of indicators. Default is 300.
# If the total number of indicators is greater than 998, the exceeded indicators will be stored in a column named "ext" as json type.
# For example, if the total number of indicators is set to 1000, 998 will be stored in a column named c0~c997 in a simple column.
# The other 2 are placed in the json type column of ext in json form.
total-metrics-count = 300
# Generate data timestamp start time. Because there is delayed reporting of data, the data generated may be earlier than this time.
ts-start = "2022-04-25 09:00:00"
# Generate data timestamp termination time. ts-end must be later than ts-start, otherwise an error will be reported
ts-end = "2022-04-25 09:01:00"
# The indicators are collected every few seconds. Default is "1".
ts-step-in-second = "1"
To generate the data of the Internet of Vehicles scenario, select generator="telematics".
[generator]
generator = "telematics"
[generator.telematics]
# Each device has several metrics uploaded to MatrixDB in each time stamp.
# For example, if set to 5, the device named "tag1" is under the timestamp "2022-04-25 09:00:03",
# The data of each indicator will be transmitted to MatrixDB in 5 pieces and will eventually be merged into 1 tuple.
# The default is 1. That is, the indicator information is not split.
generator-batch-size = 1
# The proportion of delayed reporting of data. Values 1 to 100.
# The default value is 0, which means there is no delayed reporting of data.
generator-disorder-ratio = 0
# The null rate of each row of data. The value is 1~100. The default is 90%, that is, 90% of the indicators will be empty.
generator-empty-value-ratio = 90
# The randomness of the index data is divided into four levels: OFF/S/M/L. The default is OFF.
generator-randomness = "OFF"
Read data from the csv file and load it, select generator="file".
[generator]
generator = "file"
[generator.file]
# The absolute path to data csv data. It can receive an array, that is, upload multiple csv files.
generator-file-paths = []
# In the same as [generator.telematics], setting these two parameters is not required.
# But proper settings will help us better generate DDL statements. If you select a custom DDL statement, it will not work.
generator-batch-size = 1
generator-empty-value-ratio = 90
To not generate or load any data, select generator="nil".
[generator]
generator = "nil"
Start MxGate in http form and load data.
[writer]
writer = "http"
[writer.http]
# Whether to use gzip compression to send http messages, it does not use default
writer-use-gzip = false
# The concurrency of sending data to MxGate.
writer-parallel = 8
Start MxGate in stdin mode and load data.
[writer]
writer = "stdin"
Do not start MxGate and do not write data.
[writer]
writer = "nil"
[benchmark]
benchmark = "telematics"
[benchmark.telematics]
# The query name provided by telematics to be executed sequentially,
# Now available:
# 1. "SINGLE_TAG_LATEST_QUERY"
# 2. "MULTI_TAG_LATEST_QUERY"
# 3. "SINGLE_TAG_DETAIL_QUERY"
# There are 3 legal query names in total, namely:
# "SINGLE_TAG_LATEST_QUERY": Get the values of each indicator of the bicycle's most recent timestamp;
# "MULTI_TAG_LATEST_QUERY": Randomly select 10 cars to obtain the various indicator values of their most recent timestamps;
# "SINGLE_TAG_DETAIL_QUERY": Get the values of each indicator of the bicycle over a period of time.
# Note: For ultra-wide tables, there are many indicators, and DBMS may not support getting all indicator values at once.
# Therefore, the following values are taken from the number of indicators obtained by the parameter debugging and the time period of "SINGLE_TAG_DETAIL_QUERY".
# For example, enter [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ] to execute the above three query sequentially.
# On this basis, deleting the query name can prevent the corresponding query from being executed. Entering another name will be ignored.
# By default, no preset query is executed.
benchmark-run-query-names = [ "SINGLE_TAG_DETAIL_QUERY" ]
# Custom query, separated using ","
# For example, ["SELECT COUNT(*) from t1", "SELECT MAX(ts) from t1"]
# By default, no custom query is performed.
benchmark-custom-queries = []
# To run the concurrency of a query, you can enter multiple inputs, and execute each query in sequence according to the concurrency, separated by ","
# For example, by entering [1, 8], you can run each query with concurrency of 1 first, and then run each query with concurrency of 8.
# Default is empty.
benchmark-parallel = [8]
# The number of times each query runs under each concurrency is determined by the delay and TPS statistics based on the results of so many runs, and the default is 0.
benchmark-run-times = 0
# The time (seconds) of each query runs under each concurrency, and the delay and TPS statistics are performed based on the results of query execution during this period.
# Only effective if benchmark-run-times is 0. The default is 60, which means that each query runs for 60 seconds at each concurrency.
benchmark-runtime-in-second = "60"
If no query is required, set benchmark to nil.
[benchmark]
benchmark = "nil"
[database]
db-database = "testdb2"
db-master-port = 5432
[global]
# Turn on progress viewing function, the default is true
watch = true
# The directory for the generated DDL, GUCs best practice recommendations, Query and other files storage
workspace = "/home/mxadmin/mxbench/workspace"
# Whether data loading and querying are performed simultaneously
simultaneous-loading-and-query = true
table-name = "table2"
# Number of equipment
tag-num = 20000
# Metric data type, supporting four types: int4, int8, float4, float8
metrics-type = "float8"
# Number of indicators. If the number of indicators is greater than 998, the first 997 are used as simple columns.
# The other ones are stored as json in a column named ext
total-metrics-count = 5000
# The start timestamp of the data generated, ts-end must be later than ts-start, otherwise an error will be reported.
ts-start = "2022-04-19 00:00:00"
ts-end = "2022-04-19 00:01:00"
[generator]
generator = "telematics"
[generator.telematics]
# The indicators of each device at each time point are uploaded in several pieces of data, and finally they are upsert into a tuple in the DB.
generator-batch-size = 1
# The generation ratio of delayed uploaded data (1~100), push the timestamp forward for 1 hour
generator-disorder-ratio = 0
# The null rate of generated data (1~100)
generator-empty-value-ratio = 90
# The randomness of the generated data is several levels, with OFF/S/M/L, and the default "OFF" is turned off.
generator-randomness = "OFF"
[writer]
writer = "stdin"
[benchmark]
benchmark = "telematics"
[benchmark.telematics]
# Array, query concurrency
benchmark-parallel = [64]
# 3 queries provided: latest value of bicycle, latest value of 10 vehicles, and detailed bicycles
benchmark-run-query-names = [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ]
# The number of times or time of running each query in each round, to make the time take effect, you need to set the number of times to 0, as follows:
benchmark-run-times = 0
benchmark-runtime-in-second = "60"
[database]
db-database = "testdb3"
db-master-port = 5432
[generator]
# Read data from csv file
generator = "file"
[generator.file]
generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]
[global]
table-name = "table3"
watch = true
workspace = "/home/mxadmin/mxbench/workspace"
ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"
[writer]
writer = "stdin"
[benchmark]
benchmark = "nil"
Running MxBench using sample configuration file 1 is equivalent to running MxBench using the following command line:
mxbench run \
--db-database "testdb2" \
--db-master-port 5432 \
--db-user "mxadmin" \
--workspace "/home/mxadmin/mxbench/workspace" \
--simultaneous-loading-and-query \
--table-name "table2" \
--tag-num 25000 \
--metrics-type "float8" \
--total-metrics-count 5000 \
--ts-start "2022-04-19 00:00:00" \
--ts-end "2022-04-19 00:01:00" \
--generator "telematics" \
--generator-batch-size 1 \
--generator-disorder-ratio 0 \
--generator-empty-value-ratio 90 \
--generator-randomness "OFF" \
--writer "stdin" \
--benchmark "telematics" \
--benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
--benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
--benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
--benchmark-parallel 64 \
--benchmark-run-times 0 \
--benchmark-runtime-in-second 60
Running MxBench using sample configuration file 2 is equivalent to running MxBench using the following command line:
mxbench run \
--db-database "testdb3" \
--db-master-port 5432 \
--workspace "/home/mxadmin/mxbench/workspace" \
--ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
--table-name "table3" \
--generator "file" \
--generator-file-paths "/home/mxadmin/mxbench/data.csv" \
--writer "stdin" \
--benchmark "nil"
Only load, no query Set benchmark to nil;
Only query, no loading Set generator to nil;
Loading and querying run at the same time Simultaneous-loading-and-query is true in global settings.
Want to generate and dump the csv data file
In the global setting, dump is true, and the generated file is in the
Want to view the generated ddl and query
The
Want to run custom DDL Fill in the absolute path of the ddl file in ddl-file-path in global settings.
Want to run a custom query Fill in the custom query statement in the benchmark-custom-queries of telematics benchmark and enclose it with "". Random parameters are not supported.
Do not want to use the system recommended GUCs, keep the existing GUCs running mxbench: When MxBench detects that there is inconsistency between existing systems and recommended GUCs, it will prompt in the standard output and ask whether it is necessary to reset the GUCs and start the database. Enter "N" to retain the original GUCs. MxBench will confirm again whether to continue running MxBench. Select "Y" and continue running.
What are the requirements for the legality of parameters? In the global configuration:
Example:
● Stdin Writer Report
● period start: 2022-04-29 10:08:11, end: 2022-04-29 10:08:16, period: 5.000129734s
● count written in total: 637025 rows/ 1500000 rows 42%, 637025 rows in this period
● size written in total: 160878718 bytes/ 360000003 bytes 44%, 160878718 bytes in this period
● size written to MxGate in total: 350075440 bytes, 350075440 bytes in this period
● Telematics Benchmark Report
● stats for query SINGLE_TAG_LATEST_QUERY with parallel 8: progress: 100%
● stats for query MULTI_TAG_LATEST_QUERY with parallel 8: progress: 43%
Note: It consists of two parts in total, namely the progress report of writer and benchamrk.
The execution progress of a query under a certain concurrency parallel parameter. When query is simultaneously in progress with data loading, query will continue until the data loading is over, so it may run multiple rounds in a loop. This progress report only shows the latest round progress report
┌───────────────────────────────────────────────────────┐
│ Summary Report for Stdin Writer │
├─────────────────────────────────┬─────────────────────┤
│ start time: │ 2022-04-27 13:29:01 │
├─────────────────────────────────┼─────────────────────┤
│ stop time: │ 2022-04-27 13:29:58 │
├─────────────────────────────────┼─────────────────────┤
│ size written to MxGate (bytes): │ 848333400 │
├─────────────────────────────────┼─────────────────────┤
│ lines inserted: │ 1500000 │
├─────────────────────────────────┼─────────────────────┤
│ compress ratio: │ 1.561276 : 1 │
└─────────────────────────────────┴─────────────────────┘
Each query will generate a report under each parallel parameter and will be printed out in real time.
┌─────────────────┬───────────────┐
│ Overall Latency │ 29.948370582s │
│ Average Latency │ 13.723304ms │
│ P75 Latency │ 14.35972ms │
│ P50 Latency │ 13.654556ms │
│ P25 Latency │ 12.925912ms │
│ TPS │ 582 │
└─────────────────┴───────────────┘
Summary report:
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Summary Report for Telematics Benchmark │
├─────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┬─────────────────────────────────────┤
│ Parallel\Query Name │ SINGLE_TAG_LATEST_QUERY │ MULTI_TAG_LATEST_QUERY │ SINGLE_TAG_DETAIL_QUERY │
├─────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼─────────────────────────────────────┤
│ 8 │ ┌─────────────────┬───────────────┐ │ ┌─────────────────┬───────────────┐ │ ┌─────────────────┬───────────────┐ │
│ │ │ Overall Latency │ 30.008625133s │ │ │ Overall Latency │ 36.407016561s │ │ │ Overall Latency │ 29.948370582s │ │
│ │ │ Average Latency │ 23.81758ms │ │ │ Average Latency │ 7.270212105s │ │ │ Average Latency │ 13.723304ms │ │
│ │ │ P75 Latency │ 24.911171ms │ │ │ P75 Latency │ 8.139682559s │ │ │ P75 Latency │ 14.35972ms │ │
│ │ │ P50 Latency │ 23.411248ms │ │ │ P50 Latency │ 7.161237157s │ │ │ P50 Latency │ 13.654556ms │ │
│ │ │ P25 Latency │ 20.899622ms │ │ │ P25 Latency │ 6.642459805s │ │ │ P25 Latency │ 12.925912ms │ │
│ │ │ TPS │ 335 │ │ │ TPS │ 1 │ │ │ TPS │ 582 │ │
│ │ └─────────────────┴───────────────┘ │ └─────────────────┴───────────────┘ │ └─────────────────┴───────────────┘ │
│ │ progress: 100% │ progress: 100% │ progress: 100% │
└─────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┘