This document introduces the basic usage of MatrixBench, including the following:
Notes!
mxbench is currently open source, and reviews and contributions are welcome. Please read the [README](https://github.com/ymatrix-data/mxbench/blob/master/README.md).
A properly functioning YMatrix cluster is required.
Because MatrixBench calls `createdb`, `gpconfig`, and `gpstop`, you need to configure the relevant environment variables in advance so that these commands run correctly.
Specifically, run `source <YMatrix installation directory>/greenplum_path.sh` and correctly set the following environment variables:
In addition, you can run commands such as `createdb mxbench`, `gpconfig -s log_rotation_size`, and `gpstop -rai` to confirm that they work correctly.
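Before running the commands above, it can help to confirm that the tools are on `PATH` at all. The following is a minimal sketch, not part of MatrixBench: the helper name `missing_tools` is hypothetical, and it only checks that each command can be found, not that it can reach the cluster.

```bash
#!/bin/sh
# Hypothetical helper: print each command that is NOT found on PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (run after sourcing greenplum_path.sh):
# missing_tools createdb gpconfig gpstop mxgate || echo "fix PATH before running mxbench"
```

If the function prints nothing, all listed commands were found; any name it prints still needs to be made available through the environment setup described above.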
You need to write data through MatrixGate (mxgate for short). mxgate is a high-performance streaming data loading server located in bin/mxgate under the YMatrix installation directory. For more information, please see mxgate.
If you want to quickly try MatrixBench on your personal development machine, you can run MatrixBench using configuration files or command lines.
Note! mxbench supports multiple data types and features, as well as combined query statements. The following example does not cover either of these capabilities. For more information, please see Basic Functions.
You can use the following configuration file, named `mxbench.conf`, and run `mxbench --config mxbench.conf`.
Note! Set `benchmark-parallel` to suit the machine's performance; a value less than or equal to the number of CPU cores is recommended.

```toml
[database]
db-database = "testdb1"
db-master-port = 5432

[global]
skip-set-gucs = true
table-name = "table1"
workspace = "/tmp/mxbench"

[benchmark]
benchmark = "telematics"

[benchmark.telematics]
benchmark-parallel = [8]
# Three queries are provided: latest value for a single vehicle,
# latest values for 10 vehicles, and detail for a single vehicle
benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
# Each query runs either a fixed number of times or for a fixed duration per round.
# For the duration to take effect, set the number of times to 0, as follows:
benchmark-run-times = 0
benchmark-runtime-in-second = "30"
```
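The recommendation to keep `benchmark-parallel` at or below the CPU core count can be checked with a few lines of shell. This is only a sketch: `cap_parallel` is a hypothetical helper, and `nproc` (GNU coreutils) with a `getconf` fallback is an assumption about the host system.

```bash
#!/bin/sh
# Hypothetical helper: print the smaller of a desired concurrency
# and the number of CPU cores.
cap_parallel() {
    desired=$1
    cores=$2
    if [ "$desired" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$desired"
    fi
}

# nproc reports the available cores; getconf is used if nproc is absent.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "suggested benchmark-parallel: $(cap_parallel 8 "$cores")"
```

The printed value is only a starting point; the note above ties the setting to machine performance, which the core count alone does not capture.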
### 2.2 Command line run
You can also run MatrixBench using the command line. Running the following command is equivalent to running MatrixBench with the above configuration file.
```bash
mxbench run \
--db-database "testdb1" \
--db-master-host "localhost" \
--db-master-port 5432 \
--db-user "mxadmin" \
--skip-set-gucs \
--table-name "table1" \
--benchmark "telematics" \
--benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
--benchmark-parallel 8 \
--benchmark-run-times 0 \
--benchmark-runtime-in-second 30
```
This section gives examples of configuration files and command line running.
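In this section we provide example configuration files for two typical scenarios: a mixed load on an ultra-wide sparse table, and loading from a user-supplied DDL file and CSV data file.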
Ultra-wide sparse tables refer to tables with a large number of indicators (many columns), but the data in each row is mostly empty. They are common in scenarios where a large number of indicators are required to build different data analysis models. Mixed load means that data writing and query are carried out simultaneously, and data writing is completed through the mxgate tool.
```toml
# Sample File 1
[database]
db-database = "testdb2"
db-master-port = 5432

[global]
# Enable progress display; the default is true
watch = true
# Directory for generated DDL, parameter best-practice suggestions, query statements, and other files
workspace = "/home/mxadmin/mxbench/workspace"
# Whether data writing and querying run simultaneously
simultaneous-loading-and-query = true
table-name = "table2"
# Number of devices
tag-num = 20000
# Metric data type; four types are supported: int4, int8, float4, float8
metrics-type = "float8"
# Number of metrics. If it is greater than 998, the first 997 are stored as simple columns
# and the rest are stored as JSON in a column named ext
total-metrics-count = 5000
# Start and end timestamps of the generated data; ts-end must be later than ts-start,
# otherwise an error is reported
ts-start = "2022-04-19 00:00:00"
ts-end = "2022-04-19 00:01:00"

[generator]
generator = "telematics"

[generator.telematics]
# Number of rows in which each device's metrics at each time point are uploaded;
# the database finally merges them into 1 tuple via UPSERT
generator-batch-size = 1
# Percentage of late-arriving data (1~100); their timestamps are offset by 1 hour
generator-disorder-ratio = 0
# Percentage of null values in the generated data (1~100)
generator-empty-value-ratio = 90
# Randomness level of the generated data: OFF / S / M / L; the default is "OFF"
generator-randomness = "OFF"

[writer]
writer = "stdin"

[benchmark]
benchmark = "telematics"

[benchmark.telematics]
# Array of query concurrency values
benchmark-parallel = [64]
# Three queries are provided: latest value for a single vehicle,
# latest values for 10 vehicles, and detail for a single vehicle
benchmark-run-query-names = ["SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY"]
# Each query runs either a fixed number of times or for a fixed duration per round.
# For the duration to take effect, set the number of times to 0, as follows:
benchmark-run-times = 0
benchmark-runtime-in-second = "60"
```
The second scenario reads a DDL file from a user-specified path to create the table, then reads a CSV file and writes the data. It gives you more control over mxbench testing: if you already have a DDL and the test data ready, follow this example.
This example runs no queries, so no mixed load is involved.
```toml
# Sample File 2
[database]
db-database = "testdb3"
db-master-port = 5432

[generator]
# Read data from a CSV file
generator = "file"

[generator.file]
generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]

[global]
table-name = "table3"
watch = true
workspace = "/home/mxadmin/mxbench/workspace"
ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"

[writer]
writer = "stdin"

[benchmark]
benchmark = "nil"
```
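Because Sample File 2 depends on files you supply, a quick check that they exist before launching can save a failed run. The sketch below uses a hypothetical helper, `is_abs_readable_file`. This document states the absolute-path requirement only for `ddl-file-path`; applying the same check to the CSV path is an assumption.

```bash
#!/bin/sh
# Hypothetical helper: succeed only if the argument is an absolute path
# that names a readable regular file.
is_abs_readable_file() {
    case "$1" in
        /*) [ -f "$1" ] && [ -r "$1" ] ;;
        *)  return 1 ;;
    esac
}

# Example with the paths from Sample File 2:
# for f in /home/mxadmin/mxbench/ddl.sql /home/mxadmin/mxbench/data.csv; do
#     is_abs_readable_file "$f" || echo "missing or not absolute: $f"
# done
```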
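In this section we provide example run commands for the same two typical scenarios: a mixed load on an ultra-wide sparse table, and loading from a user-supplied DDL file and CSV data file.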
Ultra-wide sparse tables refer to tables with a large number of indicators (many columns), but the data in each row is mostly empty. They are common in scenarios where a large number of indicators are required to build different data analysis models. Mixed load means that data writing and query are carried out simultaneously, and data writing is completed through the mxgate tool.
Running MatrixBench with Sample File 1 is equivalent to running the following command line:
```bash
mxbench run \
--db-database "testdb2" \
--db-master-port 5432 \
--db-user "mxadmin" \
--workspace "/home/mxadmin/mxbench/workspace" \
--simultaneous-loading-and-query \
--table-name "table2" \
--tag-num 20000 \
--metrics-type "float8" \
--total-metrics-count 5000 \
--ts-start "2022-04-19 00:00:00" \
--ts-end "2022-04-19 00:01:00" \
--generator "telematics" \
--generator-batch-size 1 \
--generator-disorder-ratio 0 \
--generator-empty-value-ratio 90 \
--generator-randomness "OFF" \
--writer "stdin" \
--benchmark "telematics" \
--benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
--benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
--benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
--benchmark-parallel 64 \
--benchmark-run-times 0 \
--benchmark-runtime-in-second 60
```
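The command above passes an array-valued setting by repeating `--benchmark-run-query-names` once per query. When the list of queries comes from a script, the repeated flags can be generated instead of typed. This is a small sketch with a hypothetical helper name, `query_name_flags`; it relies on the query names containing no spaces.

```bash
#!/bin/sh
# Hypothetical helper: for each query name given as an argument, print the
# flag and the name on separate lines, so the flag repeats once per query.
query_name_flags() {
    for name in "$@"; do
        printf '%s\n%s\n' "--benchmark-run-query-names" "$name"
    done
}

# Example: show the flags for two of the queries used above.
# query_name_flags SINGLE_TAG_LATEST_QUERY MULTI_TAG_LATEST_QUERY
```

The output can be spliced into an `mxbench run` command through an unquoted command substitution, which is why names with spaces would not survive.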
The second scenario reads a DDL file from a user-specified path to create the table, then reads a CSV file and writes the data. It gives you more control over mxbench testing: if you already have a DDL and the test data ready, follow this example.
This example runs no queries, so no mixed load is involved.
Running MatrixBench with Sample File 2 is equivalent to running the following command line:
```bash
mxbench run \
--db-database "testdb3" \
--db-master-port 5432 \
--workspace "/home/mxadmin/mxbench/workspace" \
--ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
--table-name "table3" \
--generator "file" \
--generator-file-paths "/home/mxadmin/mxbench/data.csv" \
--writer "stdin" \
--benchmark "nil"
```
- Write only, no queries: set `benchmark` to `"nil"`.
- Query only, no writes: set `generator` to `"nil"`.
- Write and query simultaneously: set `simultaneous-loading-and-query` to `true` in the Global settings.
- Generate and dump the CSV data file: set `dump` to `true` in the Global settings. The generated files are placed in the unix-timestamp directory under the directory set by `workspace`.
- View the generated DDL and query statements: look in the unix-timestamp directory under the directory set by `workspace`.
- Run custom DDL: fill in the absolute path to the DDL file in `ddl-file-path` in the Global settings.
- Run a custom query statement: fill in the statement in `benchmark-custom-queries` of the telematics Benchmark, enclosing it in double quotes (`""`). Random parameters are not supported.
- Keep the existing parameters instead of the ones recommended by the system: when MatrixBench detects a difference between the existing system parameters and the recommended ones, it prompts on standard output and asks whether to reset the parameters and restart the database. Enter "N" to keep the original parameters. MatrixBench then asks again whether to continue running; enter "Y" to continue.
- Parameter validity requirements in the Global configuration: `ts-end` must be later than `ts-start`; `table-name` and `schema-name` must not be empty; `tag-num` must be greater than 0; `ts-step-in-second` must not be 0.

Notes!
For complete command line parameter information for MatrixBench, please refer to MatrixBench Command Line Parameters; for the main functions, please refer to MatrixBench Main Functions; for detailed explanation of MatrixBench progress information and statistical reports, please refer to MatrixBench Understanding Progress Information and Statistical Report.
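The Global parameter validity rules listed in the FAQ can be checked before a run. The following is a sketch only, not how MatrixBench itself validates: `validate_global` is a hypothetical helper, the sample values (such as the schema name `public`) are illustrative, and `date -d` assumes GNU date.

```bash
#!/bin/sh
# Hypothetical helper that applies the Global parameter rules from the FAQ.
# Arguments: ts_start ts_end table_name schema_name tag_num ts_step_in_second
# Prints one message per violated rule; returns 1 if any rule fails,
# or 2 if a timestamp cannot be parsed. Requires GNU date for `date -d`.
validate_global() {
    rc=0
    start=$(date -d "$1" +%s) || return 2
    end=$(date -d "$2" +%s) || return 2
    [ "$end" -gt "$start" ] || { echo "ts-end must be later than ts-start"; rc=1; }
    [ -n "$3" ] || { echo "table-name is empty"; rc=1; }
    [ -n "$4" ] || { echo "schema-name is empty"; rc=1; }
    [ "$5" -gt 0 ] || { echo "tag-num must be greater than 0"; rc=1; }
    [ "$6" -ne 0 ] || { echo "ts-step-in-second must not be 0"; rc=1; }
    return "$rc"
}

# Example with values in the style of Sample File 1 (schema and step are illustrative):
# validate_global "2022-04-19 00:00:00" "2022-04-19 00:01:00" table2 public 20000 1
```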