MatrixBench Quick Start

This document introduces the basic usage of MatrixBench and covers the following:

  • Environment preparation
  • Quick start
  • Examples
  • FAQ

Note!
mxbench is open source, and reviews and contributions are welcome. Please [click here](https://github.com/ymatrix-data/mxbench/blob/master/README.md) to read the README.

1 Environment preparation

1.1 YMatrix Cluster

A properly functioning YMatrix cluster is required.

1.2 Environment variables

Since MatrixBench needs to call createdb, gpconfig, and gpstop, you need to configure relevant environment variables in advance so that these commands can be executed correctly.

Specifically, you need to execute source <YMatrix installation directory>/greenplum_path.sh and correctly set the following environment variables:

  • PGHOST
  • PGPORT
  • PGUSER
  • PGPASSWORD
  • PGDATABASE
  • MASTER_DATA_DIRECTORY

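In addition, you can run commands such as createdb mxbench, gpconfig -s log_rotation_size, and gpstop -rai to confirm that they work. Note that createdb mxbench creates a database named mxbench and gpstop -rai restarts the cluster, so run them only on a test cluster.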
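
As a rough pre-flight sketch, the check below reports which of the variables listed above are still unset in the current shell. The function name check_mxbench_env is hypothetical (it is not part of MatrixBench), and the values in the usage comment are placeholders.

```shell
# Print the name of every required variable that is unset or empty.
# Returns non-zero if at least one is missing.
# check_mxbench_env is a hypothetical helper, not an mxbench command.
check_mxbench_env() {
  missing=0
  for name in PGHOST PGPORT PGUSER PGPASSWORD PGDATABASE MASTER_DATA_DIRECTORY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (placeholder values, adjust to your cluster):
#   export PGHOST=localhost PGPORT=5432 PGUSER=mxadmin
#   check_mxbench_env || echo "set the variables printed above first"
```

Running it right after sourcing greenplum_path.sh gives a quick list of what still needs to be exported.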

1.3 MatrixGate

Data is written through MatrixGate (mxgate for short), a high-performance streaming data loading server located at bin/mxgate under the YMatrix installation directory. For more information, see mxgate.

2 Quick start

To try MatrixBench quickly on your personal development machine, you can run it either with a configuration file or from the command line.

Note! mxbench supports multiple data types and features as well as combined query statements. The examples below do not cover these two capabilities. For more information, see Basic Functions.

2.1 Run with a configuration file

Save the following configuration as mxbench.conf and run mxbench --config mxbench.conf.

Note! Set benchmark-parallel according to the performance of the machine; a value less than or equal to the number of CPU cores is recommended.


[database]
  db-database = "testdb1"
  db-master-port = 5432

[global]
  # Skip the prompt asking whether to reset parameters. If set to true, mxbench assumes
  # the database parameters are already set and does not modify them
  skip-set-gucs = true

  table-name = "table1"

  # Directory where the generated DDL, parameter-setting suggestions, query statements, and other files are stored
  workspace = "/tmp/mxbench"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array, query concurrency
    benchmark-parallel = [8]
    # 3 queries provided: latest value for a single vehicle, latest values for 10 vehicles, and detail records for a single vehicle
    benchmark-run-query-names = ["SINGLE_TAG_DETAIL_QUERY"]
    # Number of runs or run duration for each query in each round. For the duration to take effect, set the number of runs to 0, as follows:
    benchmark-run-times = 0
    benchmark-runtime-in-second = "30"
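
The recommendation above to keep benchmark-parallel at or below the number of CPU cores can be checked with a small sketch like the following. The helper name cap_parallel is hypothetical, and getconf _NPROCESSORS_ONLN is assumed to be available, as it is on common Linux and macOS systems.

```shell
# Print the smaller of the requested concurrency and the number of online CPU cores.
# cap_parallel is a hypothetical helper, not an mxbench command.
cap_parallel() {
  requested=$1
  # Fall back to 1 if the core count cannot be determined.
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$requested" -gt "$cores" ]; then
    echo "$cores"
  else
    echo "$requested"
  fi
}

# Suggest a value for benchmark-parallel, starting from the 8 used above.
echo "benchmark-parallel = [$(cap_parallel 8)]"
```

The printed line can be pasted into the [benchmark.telematics] section if it differs from the value in the file.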

2.2 Run from the command line

You can also run MatrixBench from the command line. The following command is equivalent to running MatrixBench with the configuration file above.

```bash
mxbench run \
  --db-database "testdb1" \
  --db-master-host "localhost" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --skip-set-gucs \
  --table-name "table1" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 8 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 30
```

3 Examples

This section gives examples of running MatrixBench with configuration files and from the command line.

3.1 Sample Configuration File

This section provides example configuration files for two typical scenarios:

  • Generate data for an ultra-wide sparse table and run a mixed workload
  • Create the table from an external DDL file, load data from a CSV file, and run no queries

3.1.1 Generate data for an ultra-wide sparse table and run a mixed workload

Ultra-wide sparse tables are tables with a large number of metrics (many columns) in which most values in each row are empty. They are common in scenarios that need many metrics to build different data analysis models. A mixed workload means that data writing and querying run simultaneously; data is written through the mxgate tool.

  # Sample File 1
[database]
  db-database = "testdb2"
  db-master-port = 5432

[global]
  # Enable progress display; the default is true
  watch = true

  # Directory where the generated DDL, parameter best-practice suggestions, query statements, and other files are stored
  workspace = "/home/mxadmin/mxbench/workspace"

  # Whether data writing and querying run simultaneously
  simultaneous-loading-and-query = true

  table-name = "table2"

  # Number of devices
  tag-num = 25000
  # Metric data type; four types are supported: int4, int8, float4, float8
  metrics-type = "float8"
  # Number of metrics. If it is greater than 998, the first 997 are stored as simple columns
  # and the rest are stored as JSON in a column named ext
  total-metrics-count = 5000

  # Start and end timestamps of the generated data; ts-end must be later than ts-start, otherwise an error is reported
  ts-start = "2022-04-19 00:00:00"
  ts-end = "2022-04-19 00:01:00"

[generator]
  generator = "telematics"

  [generator.telematics]
    # Number of rows into which the metrics of each device at each timestamp are split for upload;
    # the database finally merges them into 1 tuple via UPSERT
    generator-batch-size = 1
    # Ratio of late-arriving data to generate (1~100); the timestamps of such rows are moved 1 hour earlier
    generator-disorder-ratio = 0
    # Ratio of null values in the generated data (1~100)
    generator-empty-value-ratio = 90
    # Randomness level of the generated data: OFF, S, M, or L; the default "OFF" disables it
    generator-randomness = "OFF"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "telematics"

  [benchmark.telematics]
    # Array, query concurrency
    benchmark-parallel = [64]
    # 3 queries provided: latest value for a single vehicle, latest values for 10 vehicles, and detail records for a single vehicle
    benchmark-run-query-names = [ "SINGLE_TAG_LATEST_QUERY", "MULTI_TAG_LATEST_QUERY", "SINGLE_TAG_DETAIL_QUERY" ]
    # Number of runs or run duration for each query in each round. For the duration to take effect, set the number of runs to 0, as follows:
    benchmark-run-times = 0
    benchmark-runtime-in-second = "60"
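
To make the total-metrics-count comment in Sample File 1 concrete, the sketch below works out how many metrics become simple columns and how many go into the ext JSON column. The function name metrics_layout is hypothetical, and the behavior at or below 998 metrics (every metric as its own column) is an assumption; the comment in the file only describes the case above 998.

```shell
# Print "<simple columns> <metrics stored in ext>" for a given total-metrics-count.
# metrics_layout is a hypothetical helper, not an mxbench command.
metrics_layout() {
  total=$1
  if [ "$total" -gt 998 ]; then
    # Above the threshold: the first 997 are simple columns, the rest go into ext.
    echo "997 $((total - 997))"
  else
    # Assumption: at or below the threshold every metric gets its own column.
    echo "$total 0"
  fi
}

metrics_layout 5000   # prints: 997 4003
```

With the 5000 metrics configured above, 997 metrics are simple columns and the remaining 4003 are stored in ext.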

3.1.2 Create the table from an external DDL file, load data from a CSV file, and run no queries

In this scenario, mxbench reads the DDL file from a user-specified path to create the table, then reads the CSV file and writes the data. This gives you more control over mxbench testing: if you already have a DDL and the test data ready, follow this example.
No queries are run in this example, so there is no mixed workload.

  # Sample File 2
[database]
  db-database = "testdb3"
  db-master-port = 5432

[generator]
  # Read data from a CSV file
  generator = "file"

  [generator.file]
    generator-file-paths = ["/home/mxadmin/mxbench/data.csv"]

[global]

  table-name = "table3"

  watch = true
  workspace = "/home/mxadmin/mxbench/workspace"
  ddl-file-path = "/home/mxadmin/mxbench/ddl.sql"

[writer]
  writer = "stdin"

[benchmark]
  benchmark = "nil"

3.2 Sample command line

This section provides example run commands for two typical scenarios:

  • Generate data for an ultra-wide sparse table and run a mixed workload
  • Create the table from an external DDL file, load data from a CSV file, and run no queries

3.2.1 Generate data for an ultra-wide sparse table and run a mixed workload

Ultra-wide sparse tables are tables with a large number of metrics (many columns) in which most values in each row are empty. They are common in scenarios that need many metrics to build different data analysis models. A mixed workload means that data writing and querying run simultaneously; data is written through the mxgate tool.

Running MatrixBench with Sample File 1 is equivalent to running it with the following command line:

$ mxbench run \
  --db-database "testdb2" \
  --db-master-port 5432 \
  --db-user "mxadmin" \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --simultaneous-loading-and-query \
  --table-name "table2" \
  --tag-num 25000 \
  --metrics-type "float8" \
  --total-metrics-count 5000 \
  --ts-start "2022-04-19 00:00:00" \
  --ts-end "2022-04-19 00:01:00" \
  --generator "telematics" \
  --generator-batch-size 1 \
  --generator-disorder-ratio 0 \
  --generator-empty-value-ratio 90 \
  --generator-randomness "OFF" \
  --writer "stdin" \
  --benchmark "telematics" \
  --benchmark-run-query-names "SINGLE_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "MULTI_TAG_LATEST_QUERY" \
  --benchmark-run-query-names "SINGLE_TAG_DETAIL_QUERY" \
  --benchmark-parallel 64 \
  --benchmark-run-times 0 \
  --benchmark-runtime-in-second 60

3.2.2 Create the table from an external DDL file, load data from a CSV file, and run no queries

In this scenario, mxbench reads the DDL file from a user-specified path to create the table, then reads the CSV file and writes the data. This gives you more control over mxbench testing: if you already have a DDL and the test data ready, follow this example.
No queries are run in this example, so there is no mixed workload.

Running MatrixBench with Sample File 2 is equivalent to running it with the following command line:

$ mxbench run \
  --db-database "testdb3" \
  --db-master-port 5432 \
  --workspace "/home/mxadmin/mxbench/workspace" \
  --ddl-file-path "/home/mxadmin/mxbench/ddl.sql" \
  --table-name "table3" \
  --generator "file" \
  --generator-file-paths "/home/mxadmin/mxbench/data.csv" \
  --writer "stdin" \
  --benchmark "nil"
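
Since this scenario depends on two external files, a quick check that both exist before starting may save a failed run. The following is a minimal sketch with a hypothetical helper named require_files; the paths are the ones used in Sample File 2.

```shell
# Return 0 only if every given path is a readable regular file;
# otherwise report the first offending path on stderr and return 1.
# require_files is a hypothetical helper, not an mxbench command.
require_files() {
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -r "$path" ]; then
      echo "missing or unreadable: $path" >&2
      return 1
    fi
  done
  return 0
}

# Check the DDL and CSV files before launching the run.
if require_files /home/mxadmin/mxbench/ddl.sql /home/mxadmin/mxbench/data.csv; then
  echo "input files found"
fi
```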

4 FAQ

  1. Write only, no queries: set benchmark to nil.

  2. Query only, no writes: set generator to nil.

  3. Write and query simultaneously: set simultaneous-loading-and-query to true in the global settings.

  4. Generate and dump the CSV data file: set dump to true in the global settings. The generated files are placed in a unix-timestamp directory under the directory set by workspace.

  5. View the generated DDL and query statements: look in the unix-timestamp directory under the directory set by workspace.

  6. Run a custom DDL: set ddl-file-path in the global settings to the absolute path of the DDL file.

  7. Run custom query statements: fill in benchmark-custom-queries in the telematics benchmark settings, enclosing each statement in double quotes (""). Random parameters are not supported.

  8. Keep the existing parameters instead of the system-recommended ones: when MatrixBench detects that the existing system parameters differ from the recommended ones, it prompts on standard output and asks whether to reset the parameters and restart the database. Enter "N" to keep the original parameters. MatrixBench then asks whether to continue running; enter "Y" to continue.

  9. Parameter validity requirements, in the global settings:

  • ts-end must be later than ts-start;
  • table-name and schema-name must not be empty;
  • tag-num must be greater than 0;
  • ts-step-in-second must not be 0.
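
The rules above can be sketched as a small shell check. This is an illustration only, not MatrixBench's actual validation code: the function name validate_global, the argument order, and the sample values (schema public, a step of 1 second) are assumptions.

```shell
# Check the four rules listed above and print "ok" or the first violated rule.
# Arguments: ts-start ts-end table-name schema-name tag-num ts-step-in-second
# Timestamps use the "YYYY-MM-DD HH:MM:SS" format, which sorts correctly as a string.
# validate_global is a hypothetical helper, not an mxbench command.
validate_global() {
  ts_start=$1; ts_end=$2; table=$3; schema=$4; tag_num=$5; step=$6
  # expr compares non-numeric operands as strings; the "x" prefix keeps
  # the operands from being mistaken for numbers or expr keywords.
  if ! expr "x$ts_end" \> "x$ts_start" >/dev/null; then
    echo "ts-end must be later than ts-start"; return 1
  fi
  if [ -z "$table" ] || [ -z "$schema" ]; then
    echo "table-name and schema-name must not be empty"; return 1
  fi
  if [ "$tag_num" -le 0 ]; then
    echo "tag-num must be greater than 0"; return 1
  fi
  if [ "$step" -eq 0 ]; then
    echo "ts-step-in-second must not be 0"; return 1
  fi
  echo "ok"
}

validate_global "2022-04-19 00:00:00" "2022-04-19 00:01:00" table1 public 100 1   # prints: ok
```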

Note!
For the complete list of MatrixBench command line parameters, see MatrixBench Command Line Parameters; for the main functions, see MatrixBench Main Functions; for a detailed explanation of MatrixBench progress information and statistical reports, see MatrixBench Understanding Progress Information and Statistical Report.