This document separately and in detail introduces some basic features of MatrixBench, including the following topics:
Note!
mxbench is now open source. Your review and contributions are welcome. Please click here to read the README.
Note!
Due to the specific nature of time-series scenarios, mxbench requires that the first column of a data table must be a timestamp column, and the second column must be a device identifier column. User data metrics can only be written starting from the third column. Currently, the multiple data types and characteristics feature only applies to metric columns, i.e., columns starting from the third one.
int4, int8, float4, and float8, it also supports types such as text and varchar.int4, int8, float4, and float8 can also specify data characteristics such as value ranges.varchar type provides special subtypes plate_template and vin_template to better meet the requirements of vehicle networking scenarios. The plate_template type is used to generate varchar columns with license plate number characteristics, for example: 粤BDM7709. The vin_template type is used to generate varchar columns with vehicle identification number (VIN) characteristics, for example: 1G1JC124627237595.timestamp or timestamptz.text, varchar, or int8.There are two ways to configure multiple metrics, various data types, and multiple value ranges in mxbench.
After creating a table in the DDL file, use column comments: COMMENT ON COLUMN to indicate the purpose of the column.
CREATE EXTENSION IF NOT EXISTS matrixts;
ALTER EXTENSION matrixts UPDATE;
CREATE SCHEMA IF NOT EXISTS "public";
CREATE TABLE "public"."table1" (
tsss timestamp
, vinnn bigint
, lltt varchar(32)
, c_arb float8
, exttt json
)
USING mars2 WITH ( compress_threshold='1000', chunk_size='32' )
DISTRIBUTED BY (vinnn)
PARTITION BY RANGE(tsss) (
START ('2022-04-18 09:00:00')
END ('2022-04-19 09:00:06')
EVERY ('10800 second'),
DEFAULT PARTITION default_prt
);
CREATE INDEX IF NOT EXISTS "idx_table1" ON "public"."table1"
USING mars2_btree(
vinnn
, tsss);
COMMENT ON COLUMN table1.lltt is '{"name": "plate_template"}';
COMMENT ON COLUMN table1.c_arb is '{"min": 3.5, "max": 5}';
COMMENT ON COLUMN table1.exttt is '{"is-ext": true, "columns-descriptions": [{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}]}';
The configuration example above shows that COMMENT are all in JSON format strings. Notes:
lltt column is of type varchar. In the example, we mark it with COMMENT: name=license_template, which will generate license plate number-like data for lltt.c_arb column is of type float8. According to COMMENT, it will generate random data in the range 3.5 to 5.table1.exttt column is of type JSON, and its COMMENT is: is-ext=true, meaning it is marked as an extended metric column that may contain multiple simple metrics. There are two ways to view the simple metrics:columns-descriptions is not specified, read the Global and total-metrics-count parameters from the metrics-type category. The total-metrics-count configuration must be greater than or equal to 简单指标列总数 + 3; that is, the extended column must contain at least 3 metrics; otherwise, an error will occur.columns-descriptions is specified and is a valid json array string, the Global and total-metrics-count parameters in the metrics-type category become invalid. You must use the columns-descriptions parameter to determine the number of metrics. Example:[
{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},
{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}},
]
Thus, this extended column contains 1 metric of type float8 and 3 metrics of type float4, each with value range constraints.
In this section, you can reuse the extended column columns-description syntax so that the metrics-descriptions parameter under the Global module accepts a string-type argument. Example:
[global]
table-name = "table1"
metrics-descriptions = """[
{"type": "float8", "count": 1000, "comment": {"min": 3, "max": 4}},
{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}
]"""
This parameter indicates that, excluding the fixed columns (first and second columns), table table1 contains 1000 metrics of type float8 and 3 metrics of type float4, totaling 1003 metrics and 1005 columns, exceeding the maximum column limit of 1000 for a database table.
Therefore, according to YMatrix logic, 6 metrics will be placed into an extended column of type JSON. This results in 997 simple metrics, 2 fixed columns, and 1 extended column (a column containing multiple metrics), totaling exactly 1000 columns, which falls back within the supported range of the database table.
Note!
You may also omit the--metrics-descriptionsparameter and instead use theGlobalcategory's--total-metrics-countand--metrics-typeparameters.
Directly inputting custom query statements cannot dynamically set values in SELECT or WHERE clauses based on generated data characteristics, limiting the effectiveness of query testing. Composite query statements solve this issue.
{
// Name of this query
"name": "QUERY_NAME",
// Expression after SELECT
"projections": {"use-raw-expression": true, "expression": "*"},
// Expression after SELECT FROM
"from": {"use-relation-identifier": true, "relation-identifier": "sub-relation-identifier"},
// Expression for device identifier column in WHERE clause
"device-predicate": {"count": 2, "is-random": true},
// Expression for timestamp column in WHERE clause
"ts-predicate": {"start": "2022-05-05 01:04:10", "end": "2022-05-05 01:04:10"},
// Expression for metric-related WHERE clause
"metrics-predicate": {"use-raw-expression": true, "expression": "m1>=37.5"},
// Expression for GROUP BY clause
"group-by": {"use-raw-expression": true, "expression": "device_column_name,ts"},
// Expression for ORDER BY clause
"order-by": {"use-raw-expression": true, "expression": "s desc"},
// Expression for LIMIT clause
"limit": 3
},
SELECT. It can be a query of all fields (“*”) or a projection of selected fields.JSON type configuration.{"use-raw-expression": true, "expression": "*"}
FROM.table-name configured in the Global category, equivalent to:{
"use-relation-identifier": true,
"relation-identifier": "table-name-in-global-config",
}
Using the actual table name:
{
"use-relation-identifier": true,
"relation-identifier": "device_signal_mars2",
}
Using a subquery:
{
"relation-statement":
{
"projections": {"use-raw-expression": true, "expression": "device_column_name,max(m1) as mp, min(m2), count(m1), avg(m3)"},
"device-predicate": {"count": 17, "is-random": true},
"ts-predicate": {"start": "2022-05-05 00:00:00", "end": "2022-05-06 00:00:00"},
"group-by": {"use-raw-expression": true, "expression": "device_column_name"}
}
}
JSON type configuration.WHERE <device_column_name> IN (<random_device_id1>, <random_device_id2>).{
"count": 2,
"is-random": false
}
If n = 1, it becomes an equality query, generating: WHERE <device_column_name>=<random_device_id>.
{
"count": 1,
"is-random": false
}
Using raw-expression: The string after "expression" appears directly after WHERE. Combined with other predicates using "AND".
{
"use-raw-expression": true,
"expression": "device_column_name IN (1234, 4321)"
}
JSON type configuration.{
"is-random": true,
"duration": 3600
}
Set start and end times: assuming the timestamp column is named ts, it generates the expression WHERE ts >= '2022-07-15 18:07:00' AND ts <= '2022-07-15 18:31:17'.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:31:17",
}
Equality query: WHERE ts = '2022-07-15 18:07:00'.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
}
Open-ended interval query: WHERE ts >= '2022-07-15 18:07:00' AND ts < '2022-07-15 18:31:17'.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
"end-exclusive": true
}
Set an alias for the timestamp column: WHERE tttt >= '2022-07-15 18:07:00' AND tttt < '2022-07-15 18:31:17'.
{
"has-alias": true,
"alias": "tttt",
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
"end-exclusive": true
}
Using raw-expression: The string after "expression" appears directly after WHERE. Combined with other predicates using "AND".
{
"use-raw-expression": true,
"expression": "ts='2022-07-16 10:31:17'"
}
JSON type configuration.raw-expression: The string after "expression" appears directly after WHERE. Combined with other predicates using "AND".{
"use-raw-expression": true,
"expression": "m1>=37.5"
}
GROUP BY clause. Accepts a JSON type configuration.GROUP BY clause.raw-expression:{
"use-raw-expression": true,
"expression": "device_column_name,ts"
}
ORDER BY clause. Accepts a JSON type configuration.ORDER BY clause.raw-expression:{
"use-raw-expression": true,
"expression": "s desc"
}
Note!
For complete command-line parameter information for MatrixBench, see MatrixBench Command-Line Parameters. For details on progress information and statistical reports, see MatrixBench Understanding Progress Information and Statistical Reports.