MatrixBench Basic Features

This document separately and in detail introduces some basic features of MatrixBench, including the following topics:

  • Multiple data types and characteristics
  • Composite query statements

Note!
mxbench is now open source. Your review and contributions are welcome. Please click here to read the README.

1 Multiple Data Types and Characteristics

1.1 Supported Types

Note!
Due to the specific nature of time-series scenarios, mxbench requires that the first column of a data table must be a timestamp column, and the second column must be a device identifier column. User data metrics can only be written starting from the third column. Currently, the multiple data types and characteristics feature only applies to metric columns, i.e., columns starting from the third one.

  • mxbench now supports collecting various types of metrics in signal tables. In addition to int4, int8, float4, and float8, it also supports types such as text and varchar.
  • The four types int4, int8, float4, and float8 can also specify data characteristics such as value ranges.
  • The varchar type provides special subtypes plate_template and vin_template to better meet the requirements of vehicle networking scenarios. The plate_template type is used to generate varchar columns with license plate number characteristics, for example: 粤BDM7709. The vin_template type is used to generate varchar columns with vehicle identification number (VIN) characteristics, for example: 1G1JC124627237595.

1.2 Restrictions

  • Column 1 must be a timestamp column. The column name is arbitrary, but the type must be timestamp or timestamptz.
  • Column 2 must be a device ID or device name column. The column name is arbitrary, and the type must be one of text, varchar, or int8.

1.3 Usage

There are two ways to configure multiple metrics, various data types, and multiple value ranges in mxbench.

1.3.1 Using a DDL File

After creating a table in the DDL file, use column comments: COMMENT ON COLUMN to indicate the purpose of the column.

CREATE EXTENSION IF NOT EXISTS matrixts;
ALTER EXTENSION matrixts UPDATE;
CREATE SCHEMA IF NOT EXISTS "public";
CREATE TABLE "public"."table1" (
    tsss timestamp
,   vinnn bigint
,   lltt varchar(32)
,   c_arb float8
,   exttt json
)
USING mars2 WITH ( compress_threshold='1000', chunk_size='32' )
DISTRIBUTED BY (vinnn)
PARTITION BY RANGE(tsss) (
        START ('2022-04-18 09:00:00')
        END ('2022-04-19 09:00:06')
        EVERY ('10800 second'),
        DEFAULT PARTITION default_prt
);

CREATE INDEX IF NOT EXISTS "idx_table1" ON "public"."table1"
USING mars2_btree(
        vinnn
      , tsss);

COMMENT ON COLUMN table1.lltt is '{"name": "plate_template"}';
COMMENT ON COLUMN table1.c_arb is '{"min": 3.5, "max": 5}';
COMMENT ON COLUMN table1.exttt is '{"is-ext": true, "columns-descriptions": [{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}]}';

The configuration example above shows that COMMENT are all in JSON format strings. Notes:

  1. The first two columns have content restrictions; see section "1.2 Restrictions" above.
  2. Except for the first two columns, all others are treated as metric columns.
  3. The lltt column is of type varchar. In the example, we mark it with COMMENT: name=license_template, which will generate license plate number-like data for lltt.
  4. The c_arb column is of type float8. According to COMMENT, it will generate random data in the range 3.5 to 5.
  5. The table1.exttt column is of type JSON, and its COMMENT is: is-ext=true, meaning it is marked as an extended metric column that may contain multiple simple metrics. There are two ways to view the simple metrics:
    • If columns-descriptions is not specified, read the Global and total-metrics-count parameters from the metrics-type category. The total-metrics-count configuration must be greater than or equal to 简单指标列总数 + 3; that is, the extended column must contain at least 3 metrics; otherwise, an error will occur.
    • If columns-descriptions is specified and is a valid json array string, the Global and total-metrics-count parameters in the metrics-type category become invalid. You must use the columns-descriptions parameter to determine the number of metrics. Example:
      [
      {"type": "float8", "count": 1, "comment": {"min":  3, "max": 4}},  
      {"type": "float4", "count": 3, "comment": {"min":  2, "max": 3}}, 
      ]

      Thus, this extended column contains 1 metric of type float8 and 3 metrics of type float4, each with value range constraints.

1.3.2 Using a Configuration File

In this section, you can reuse the extended column columns-description syntax so that the metrics-descriptions parameter under the Global module accepts a string-type argument. Example:

[global]
  table-name = "table1"
  metrics-descriptions =  """[
    {"type": "float8", "count": 1000, "comment": {"min": 3, "max": 4}},
    {"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}
    ]"""

This parameter indicates that, excluding the fixed columns (first and second columns), table table1 contains 1000 metrics of type float8 and 3 metrics of type float4, totaling 1003 metrics and 1005 columns, exceeding the maximum column limit of 1000 for a database table.
Therefore, according to YMatrix logic, 6 metrics will be placed into an extended column of type JSON. This results in 997 simple metrics, 2 fixed columns, and 1 extended column (a column containing multiple metrics), totaling exactly 1000 columns, which falls back within the supported range of the database table.

Note!
You may also omit the --metrics-descriptions parameter and instead use the Global category's --total-metrics-count and --metrics-type parameters.

2 Composite Query Statements

Directly inputting custom query statements cannot dynamically set values in SELECT or WHERE clauses based on generated data characteristics, limiting the effectiveness of query testing. Composite query statements solve this issue.

2.1 Configuration Overview

    {
    // Name of this query
    "name": "QUERY_NAME",
    // Expression after SELECT
    "projections": {"use-raw-expression": true, "expression": "*"},
    // Expression after SELECT FROM
    "from": {"use-relation-identifier": true, "relation-identifier": "sub-relation-identifier"},
    // Expression for device identifier column in WHERE clause
    "device-predicate": {"count": 2, "is-random": true},
    // Expression for timestamp column in WHERE clause
    "ts-predicate": {"start": "2022-05-05 01:04:10", "end": "2022-05-05 01:04:10"},
    // Expression for metric-related WHERE clause
    "metrics-predicate": {"use-raw-expression": true, "expression": "m1>=37.5"},
    // Expression for GROUP BY clause
    "group-by": {"use-raw-expression": true, "expression": "device_column_name,ts"},
    // Expression for ORDER BY clause
     "order-by": {"use-raw-expression": true, "expression": "s desc"},
     // Expression for LIMIT clause
     "limit": 3
    },

2.2 Detailed Explanation

  1. "name"
    Name of the query. It will be printed in the query summary report.
  2. "projections"
    Mandatory field. Specifies the expression after SELECT. It can be a query of all fields (“*”) or a projection of selected fields.
    Accepts a JSON type configuration.
    Usage example:
    {"use-raw-expression": true, "expression": "*"}
  3. "from"
    Expression after FROM.
    Default behavior uses the table-name configured in the Global category, equivalent to:
    {
    "use-relation-identifier": true, 
    "relation-identifier": "table-name-in-global-config",
    }

    Using the actual table name:

    {
    "use-relation-identifier": true, 
    "relation-identifier": "device_signal_mars2",
    }

    Using a subquery:

    { 
    "relation-statement":
     {
     "projections": {"use-raw-expression": true, "expression": "device_column_name,max(m1) as mp, min(m2), count(m1), avg(m3)"},
     "device-predicate": {"count": 17, "is-random": true},
     "ts-predicate": {"start": "2022-05-05 00:00:00", "end": "2022-05-06 00:00:00"},
     "group-by": {"use-raw-expression": true, "expression": "device_column_name"}
     }
    }
  4. "device-predicate"
    Expression for device ID in the WHERE clause. Accepts a JSON type configuration.
    Default: no device-related WHERE clause (i.e., all devices).
    Randomly select n (n is a positive integer within the device count range) devices: for example, n = 2, the following configuration generates: WHERE <device_column_name> IN (<random_device_id1>, <random_device_id2>).
    {
    "count": 2, 
    "is-random": false
    }

    If n = 1, it becomes an equality query, generating: WHERE <device_column_name>=<random_device_id>.

    {
    "count": 1, 
    "is-random": false
    }

    Using raw-expression: The string after "expression" appears directly after WHERE. Combined with other predicates using "AND".

    {
    "use-raw-expression": true, 
    "expression": "device_column_name IN (1234, 4321)"
    }
  5. "ts-predicate"
    Expression for timestamp in the WHERE clause. Accepts a JSON type configuration.
    Default: no timestamp-related WHERE clause (i.e., full time range). Randomly select one timestamp within the generated data range:
    {
    "is-random": true, 
    "duration": 3600  
    }

    Set start and end times: assuming the timestamp column is named ts, it generates the expression WHERE ts >= '2022-07-15 18:07:00' AND ts <= '2022-07-15 18:31:17'.

    {
    "start": "2022-07-15 18:07:00", 
    "end": "2022-07-15 18:31:17",
    }

    Equality query: WHERE ts = '2022-07-15 18:07:00'.

    {
    "start": "2022-07-15 18:07:00", 
    "end": "2022-07-15 18:07:00",
    }

    Open-ended interval query: WHERE ts >= '2022-07-15 18:07:00' AND ts < '2022-07-15 18:31:17'.

    {
    "start": "2022-07-15 18:07:00", 
    "end": "2022-07-15 18:07:00",
    "end-exclusive": true
    }

    Set an alias for the timestamp column: WHERE tttt >= '2022-07-15 18:07:00' AND tttt < '2022-07-15 18:31:17'.

    {
    "has-alias": true,
    "alias": "tttt",
    "start": "2022-07-15 18:07:00", 
    "end": "2022-07-15 18:07:00",
    "end-exclusive": true
    }

    Using raw-expression: The string after "expression" appears directly after WHERE. Combined with other predicates using "AND".

    {
    "use-raw-expression": true, 
    "expression": "ts='2022-07-16 10:31:17'"
    }
  6. "metrics-predicate"
    Expression for metric filtering in the WHERE clause. Accepts a JSON type configuration.
    Default: no metric filtering. Using raw-expression: The string after "expression" appears directly after WHERE. Combined with other predicates using "AND".
    {
    "use-raw-expression": true, 
    "expression": "m1>=37.5"
    }
  7. "group-by"
    GROUP BY clause. Accepts a JSON type configuration.
    Default: no GROUP BY clause.
    Using raw-expression:
    {
    "use-raw-expression": true, 
    "expression": "device_column_name,ts"
    }
  8. "order-by"
    ORDER BY clause. Accepts a JSON type configuration.
    Default: no ORDER BY clause.
    Using raw-expression:
    {
    "use-raw-expression": true, 
    "expression": "s desc"
    }
  9. "limit"
    Accepts a positive integer.

Note!
For complete command-line parameter information for MatrixBench, see MatrixBench Command-Line Parameters. For details on progress information and statistical reports, see MatrixBench Understanding Progress Information and Statistical Reports.