This document describes some of the basic features of MatrixBench individually and in detail, including the following:
Notes!
Currently, mxbench is open source, and your review and contribution are welcome. Please click here read README.
Notes!
Due to the particularity of the timing scenario, mxbench stipulates that the first column of the data table must be a timestamp column, the second column must be a device identification column, and the indicator column that can be written to user data starting from the third column. Currently, multi-data types and feature functions only support metric columns, that is, columns starting from the third column.
int4, int8, float4, and float8, text and varchar.int4, int8, float4, and float8 are also possible to specify data characteristics such as value range.varchar type also provides special types plate_template and vin_template to better adapt to the needs of the Internet of Vehicles scenarios. The plate_template type is used to generate the varchar column of the license plate number type characteristics, such as: Guangdong BDM7709. vin_template is used to generate the varchar column for frame number type characteristics, such as: 1G1JC124627237595.timestamp or timestamptz.text, varchar, int8.There are two configuration methods for implementing a large number of indicators, multiple data types, multiple data value ranges, etc. in mxbench.
In the DDL file, after creating the table, express the purpose of the column by adding a comment: COMMENT ON COLUMN.
CREATE EXTENSION IF NOT EXISTS matrixts;
ALTER EXTENSION matrixts UPDATE;
CREATE SCHEMA IF NOT EXISTS "public";
CREATE TABLE "public"."table1" (
tsss timestamp
, vinnn bigint
, lltt varchar(32)
, c_arb float8
, exttt json
)
USING mars2 WITH ( compress_threshold='1000', chunk_size='32' )
DISTRIBUTED BY (vinnn)
PARTITION BY RANGE(tsss) (
START ('2022-04-18 09:00:00')
END ('2022-04-19 09:00:06')
EVERY ('10800 second'),
DEFAULT PARTITION default_prt
);
CREATE INDEX IF NOT EXISTS "idx_table1" ON "public"."table1"
USING mars2_btree(
vinnn
, tsss);
COMMENT ON COLUMN table1.lltt is '{"name": "plate_template"}';
COMMENT ON COLUMN table1.c_arb is '{"min": 3.5, "max": 5}';
COMMENT ON COLUMN table1.exttt is '{"is-ext": true, "columns-descriptions": [{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}]}';
The example configuration is as above. COMMENT is all JSON format strings, description:
lltt column is of type varchar. In the example, we use COMMENT to mark: name=license_template, and data of license plate number type will be generated for lltt.c_arb column is of type float8. According to COMMENT, random data in the range 3.5 to 5 will be generated.table1.exttt column is of type JSON, and its COMMENT is: is-ext=true, meaning it is marked as an extended metric column, which may contain multiple simple metrics. There are two ways to view simple metrics:columns-descriptions is not specified, the total-metrics-count and metrics-type parameters in the Global category are read. The configuration of total-metrics-count must be greater than or equal to Simple metric columns + 3, that is, the extended column contains at least 3 indicators, otherwise an error message will be generated.columns-descriptions is specified and is a legal json array string, the total-metrics-count and metrics-type parameters in the Global category are invalid. You need to understand the number of indicators through the columns-descriptions parameter. Examples are as follows:[
{"type": "float8", "count": 1, "comment": {"min": 3, "max": 4}},
{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}},
]
Therefore, there are 1 metric of type float8 and 3 metrics of type float4 in this extension column. At the same time, there are also limits on the value range.
In this section, you can reuse the columns-description syntax of the above extension columns to make metrics-descriptions under the Global module accept a string-type parameter. Examples are as follows:
[global]
table-name = "table1"
metrics-descriptions = """[
{"type": "float8", "count": 1000, "comment": {"min": 3, "max": 4}},
{"type": "float4", "count": 3, "comment": {"min": 2, "max": 3}}
]"""
The description of this parameter means that in addition to the fixed column (first column and second column) that are not included in the indicator calculation, the table table1 also contains 1000 metrics of type float8, 3 metrics of type float4, and 1003 total indicators, with a total of 1005 columns, which exceeds the upper limit of the number of columns in the data table (1000 columns).
Therefore, according to YMatrix logic, there will be 6 metrics arranged in the extended column of type JSON. In this way, there are only 997 simple metrics and 2 fixed columns, and 1 extended column (column containing multiple metrics), which is exactly 1,000 columns in total, and it is back to the support scope of the data table.
Notes!
You can also use the--total-metrics-countand---metrics-typeparameters in theGlobalcategory without specifying the--metrics-descriptionsparameter as alternatives.
Directly inputting customized query statements cannot set certain values in the SELECT and WHERE statements based on the characteristics of the data generated this time, so the effect of query tests is limited. This problem can be solved by using combined query statements.
{
// The name of the query statement
"name": "QUERY_NAME",
// Statements after SELECT
"projections": {"use-raw-expression": true, "expression": "*"},
// Expression after SELECT FROM
"from": {"use-relation-identifier": true, "relation-identifier": "sub-relation-identifier"},
// Expressions in device identification column-related WHERE statements
"device-predicate": {"count": 2, "is-random": ture},
// Timestamp column-related expressions in WHERE statements
"ts-predicate": {"start": "2022-05-05 01:04:10", "end": "2022-05-05 01:04:10"},
// Expressions in user metrics related WHERE statements
"metrics-predicate": {"use-raw-expression": true, "expression": "m1>=37.5"},
// GROUP BY statement related expressions
"group-by": {"use-raw-expression": true, "expression": "device_column_name,ts"},
// ORDER BY statement related expressions
"order-by": {"user-raw-expression": true, "": "s desc"},
// LIMIT statement related expressions
"limit": 3
},
SELECT, which can be a query for all fields "*", or a projection query for some fields.JSON type.raw-expression:{"use-raw-expression": true, "expression": "*"}
FROM expression after.table-name configured in the Global category directly, which is equivalent to:{
"use-relation-identifier": true,
"relation-identifier": "table-name-in-global-config",
}
Use the name of the data table directly:
{
"use-relation-identifier": true,
"relation-identifier": "device_signal_mars2",
}
Use another data table to illustrate:
{
"relation-statement":
{
"projections": {"use-raw-expression": true, "expression": "device_column_name,max(m1) as mp, min(m2), count(m1), avg(m3)"},
"device-predicate": {"count": 17, "is-random": true},
"ts-predicate": {"start": "2022-05-05 00:00:00", "end": "2022-05-06 00:00:00"},
"group-by": {"use-raw-expression": true, "expression": "device_column_name"}
}
}
JSON.WHERE <device_column_name> IN (<random_device_id1>, <random_device_id2>).{
"count": 2,
"is-random": false
}
If n = 1, it will be interpreted as an equal value query, and the following configuration is generated: WHERE <device_column_name>=<random_device_id>.
{
"count": 1,
"is-random": false
}
The string after using raw-expression: "expression" appears directly after WHERE. and other predictates are "AND".
{
"use-raw-expression": true,
"expression": "device_column_name IN (1234, 4321)"
}
JSON.{
"is-random": true,
"duration": 3600
}
Set start and end time: Assuming the timestamp field is named ts, an expression of WHERE ts >= '2022-07-15 18:07:00' AND ts <= '2022-07-15 18:31:17' will be generated.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:31:17",
}
Equivalent query: WHERE ts = '2022-07-15 18:07:00'.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
}
Open interval query: WHERE ts >= '2022-07-15 18:07:00' AND ts < '2022-07-15 18:31:17'.
{
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
"end-exclusive": true
}
Set alias for timestamp fields: WHERE tttt >= '2022-07-15 18:07:00' AND tttt < '2022-07-15 18:31:17'.
{
"has-alias": true,
"alias": "tttt",
"start": "2022-07-15 18:07:00",
"end": "2022-07-15 18:07:00",
"end-exclusive": true
}
The string after using raw-expression: "expression" appears directly after WHERE. and its predictates are "AND".
{
"use-raw-expression": true,
"expression": "ts='2022-07-16 10:31:17'"
}
JSON.raw-expression: "expression" appears directly after WHERE. and other predictates are "AND".{
"use-raw-expression": true,
"expression": "m1>=37.5"
}
GROUP BY statement, accepts a configuration of JSON type.GROUP BY statement.raw-expression:{
"use-raw-expression": true,
"expression": "device_column_name,ts"
}
ORDER BY statement, accepts a configuration of type JSON.ORDER BY statement.raw-expression:{
"user-raw-expression": true,
"expression": "s desc"
}
Notes!
For complete command line parameter information for MatrixBench, please refer to MatrixBench Command Line Parameters; for detailed explanation of MatrixBench progress information and statistical reports, please refer to MatrixBench Understanding Progress Information and Statistical Report.