This document describes some of the basic features of MatrixBench individually and in detail, including the following:
Notes!
Currently, mxbench is open source, and your review and contribution are welcome. Please [click here] (https://github.com/ymatrix-data/mxbench/blob/master/README.md) read README.
Notes!
Due to the particularity of the timing scenario, mxbench stipulates that the first column of the data table must be a timestamp column, the second column must be a device identification column, and the indicator column that can be written to user data starting from the third column. Currently, multi-data types and feature functions only support metric columns, that is, columns starting from the third column.
int4
, int8
, float4
, and float8
, text
and varchar
.int4
, int8
, float4
, and float8
are also possible to specify data characteristics such as value range.varchar
type also provides special types plate_template
and vin_template
to better adapt to the needs of the Internet of Vehicles scenarios. The plate_template
type is used to generate the varchar
column of the license plate number type characteristics, such as: Guangdong BDM7709
. vin_template
is used to generate the varchar
column for frame number type characteristics, such as: 1G1JC124627237595
.timestamp
or timestamptz
.text
, varchar
, int8
.There are two configuration methods for implementing a large number of indicators, multiple data types, multiple data value ranges, etc. in mxbench.
In the DDL file, after creating the table, express the purpose of the column by adding a comment: COMMENT ON COLUMN
.
CREATE EXTENSION IF NOT EXISTS matrixts;
ALTER EXTENSION matrixts UPDATE;
CREATE SCHEMA IF NOT EXISTS "public";
CREATE TABLE "public"."table1" (
tsss timestamp
, vinnn bigint
, lltt varchar(32)
, c_arb float8
, exttt json
)
USING mars2 WITH ( compress_threshold='1000', chunk_size='32' )
DISTRIBUTED BY (vinnn)
PARTITION BY RANGE(tsss) (
START ('2022-04-18 09:00:00')
END ('2022-04-19 09:00:06')
EVERY ('10800 second'),
DEFAULT PARTITION default_prt
);
CREATE INDEX IF NOT EXISTS "idx_table1" ON "public"."table1"
USING mars2_btree(
vinnn
, tsss);
COMMENT ON COLUMN table1.lltt is '{"name": "plate_template"}';
COMMENT ON COLUMN table1.c_arb is '{"min": 3.5, "max": 5}';
COMMENT ON COLUMN table1.exttt is '{"is-ext": true, "columns-descriptions": [{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}]}';
The example configuration is as above. COMMENT
is all JSON
format strings, description:
lltt
column is of type varchar
. In the example, we use COMMENT
to mark: name=license_template
, and data of license plate number type will be generated for lltt
.c_arb
column is of type float8
. According to COMMENT
, random data in the range 3.5 to 5 will be generated.table1.exttt
column is of type JSON
, and its COMMENT
is: is-ext=true
, meaning it is marked as an extended metric column, which may contain multiple simple metrics. There are two ways to view simple metrics:columns-descriptions
is not specified, the total-metrics-count
and metrics-type
parameters in the Global
category are read. The configuration of total-metrics-count
must be greater than or equal to Simple metric columns + 3
, that is, the extended column contains at least 3
indicators, otherwise an error message will be generated.columns-descriptions
is specified and is a legal json array
string, the total-metrics-count
and metrics-type
parameters in the Global
category are invalid. You need to understand the number of indicators through the columns-descriptions
parameter. Examples are as follows:[
{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},
{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}},
]
Therefore, there are 1 metric of type float8
and 3 metrics of type float4
in this extension column. At the same time, there are also limits on the value range.
In this section, you can reuse the columns-description
syntax of the above extension columns to make metrics-descriptions
under the Global module accept a string-type parameter. Examples are as follows:
[global]
table-name = "table1"
metrics-descriptions = """[
{"type": "float8", "count": 1000, "comment": {"min": 3, "max": 4}},
{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}
]"""
The description of this parameter means that in addition to the fixed column (first column and second column) that are not included in the indicator calculation, the table table1
also contains 1000 metrics of type float8
, 3 metrics of type float4
, and 1003 total indicators, with a total of 1005 columns, which exceeds the upper limit of the number of columns in the data table (1000 columns).
Therefore, according to YMatrix logic, there will be 6 metrics arranged in the extended column of type JSON
. In this way, there are only 997 simple metrics and 2 fixed columns, and 1 extended column (column containing multiple metrics), which is exactly 1,000 columns in total, and it is back to the support scope of the data table.
Notes!
You can also use the--total-metrics-count
and---metrics-type
parameters in theGlobal
category without specifying the--metrics-descriptions
parameter as alternatives.
Directly inputting customized query statements cannot set certain values in the SELECT and WHERE statements based on the characteristics of the data generated this time, so the effect of query tests is limited. This problem can be solved by using combined query statements.
{
// The name of the query statement
"name": "QUERY_NAME",
// Statements after SELECT
"projections": {"use-raw-expression": true, "expression": "*"},
// Expression after SELECT FROM
"from": {"use-relation-identifier": true, "relation-identifier": "sub-relation-identifier"},
// Expressions in device identification column-related WHERE statements
"device-predicate": {"count": 2, "is-random": ture},
// Timestamp column-related expressions in WHERE statements
"ts-predicate": {"start": "2022-05-05 01:04:10", "end": "2022-05-05 01:04:10"},
// Expressions in user metrics related WHERE statements
"metrics-predicate": {"use-raw-expression": true, "expression": "m1>=37.5"},
// GROUP BY statement related expressions
"group-by": {"use-raw-expression": true, "expression": "device_column_name,ts"},
// ORDER BY statement related expressions
"order-by": {"user-raw-expression": true, "": "s desc"},
// LIMIT statement related expressions
"limit": 3
},
SELECT
, which can be a query for all fields "*"
, or a projection query for some fields.JSON
type.raw-expression
:{"use-raw-expression": true, "expression": "*"}
FROM
expression after.table-name
configured in the Global
category directly, which is equivalent to:{
"use-relation-identifier": true,
"relation-identifier": "table-name-in-global-config",
}
Use the name of the data table directly:
{
"use-relation-identifier": true,
"relation-identifier": "device_signal_mars2",
}
Use another data table to illustrate:
{
"relation-statement":
{
"projections": {"use-raw-expression": true, "expression": "device_column_name,max(m1) as mp, min(m2), count(m1), avg(m3)"},
"device-predicate": {"count": 17, "is-random": true},
"ts-predicate": {"start": "2022-05-05 00:00:00", "end": "2022-05-06 00:00:00"},
"group-by": {"use-raw-expression": true, "expression": "device_column_name"}
}
}
JSON
.WHERE <device_column_name> IN (<random_device_id1>, <random_device_id2>)
.{
"count": 2,
"is-random": false
}
If n = 1, it will be interpreted as an equal value query, and the following configuration is generated: WHERE <device_column_name>=<random_device_id>
.
{
"count": 1,
"is-random": false
}
The string after using raw-expression
: "expression" appears directly after WHERE
. and other predictates are "AND".
{
"use-raw-expression": true,
"expression": "device_column_name IN (1234, 4321)"
}
JSON
.{
"is-random": true,
"duration": 3600
}
Set start and end time: Assuming the timestamp field is named ts
, an expression of WHERE ts >= '2022-07-15 18:07:00' AND ts <= '2022-07-15 18:31:17'
will be generated.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:31:17",
}
Equivalent query: WHERE ts = '2022-07-15 18:07:00'
.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
}
Open interval query: WHERE ts >= '2022-07-15 18:07:00' AND ts < '2022-07-15 18:31:17'
.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
"end-exclusive": true
}
Set alias for timestamp fields: WHERE tttt >= '2022-07-15 18:07:00' AND tttt < '2022-07-15 18:31:17'
.
{
"has-alias": true,
"alias": "tttt",
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
"end-exclusive": true
}
The string after using raw-expression
: "expression" appears directly after WHERE
. and its predictates are "AND".
{
"use-raw-expression": true,
"expression": "ts='2022-07-16 10:31:17'"
}
JSON
.raw-expression
: "expression" appears directly after WHERE
. and other predictates are "AND".{
"use-raw-expression": true,
"expression": "m1>=37.5"
}
GROUP BY
statement, accepts a configuration of JSON
type.GROUP BY
statement.raw-expression
:{
"use-raw-expression": true,
"expression": "device_column_name,ts"
}
ORDER BY
statement, accepts a configuration of type JSON
.ORDER BY
statement.raw-expression
:{
"user-raw-expression": true,
"expression": "s desc"
}
Notes!
For complete command line parameter information for MatrixBench, please refer to MatrixBench Command Line Parameters; for detailed explanation of MatrixBench progress information and statistical reports, please refer to MatrixBench Understanding Progress Information and Statistical Report.