MatrixGate Main Features

This document describes the main features of MatrixGate.

Notes!
Programming Language Access MatrixGate Please see Data Write - Programming Language Access MatrixGate.


1 MatrixGate Loading Special Type

1.1 Example of MatrixGate loading CSV files

  • Create table csvtable in the demo database.
    demo=# CREATE TABLE csvtable (
         time TIMESTAMP WITH TIME ZONE,
         tagid INT,
         c1 INT,
         c2 INT,
         c3 INT
         )
         USING MARS3
         DISTRIBUTED BY (tagid)
         ORDER BY (time,tagid);
  • Edit the data load file data.csv, the content is as follows:
    1603777821|1|101|201|301
    1603777822|2|102|202|302
    1603777823|3|103|203|303
  • Start mxgate, specify that the --source parameter is stdin, the target table is the existing csvtable, and the loading parallelism is 2. In the example, the host is mdw.
    [mxadmin@mdw ~]$ mxgate \
    --source stdin \
    --db-database demo \
    --db-master-host 127.0.0.1 \
    --db-master-port 5432 \
    --db-user mxadmin \
    --time-format unix-second \
    --delimiter "|" \
    --target csvtable \
    --parallel 2 < data.csv
  • Connect to the database to query whether the data is loaded successfully.
    
    demo=# SELECT * FROM csvtable ;
            time          | tagid | c1  | c2  | c3
    -----------------------+-------+--------------------------------------------------------------------------------------------------
    2020-10-27 05:50:23+08 |     3 | 103 | 203 | 303
    2020-10-27 05:50:22+08 |     2 | 102 | 202 | 302
    2020-10-27 05:50:21+08 |     1 | 101 | 201 | 301

(3 rows)

### 1.2 Example of MatrixGate loading JSON fields

#### 1.2.1 JSON

- Create table.

demo=# CREATE TABLE json_test( id int, j json ) USING MARS3 ORDER BY (id);

- Create data files.
 `~/json.csv`

1|"{""a"":10, ""b"":""xyz""}"

- load
Here we use the stdin mode as an example, and the other modes are the same.
The key is `--format csv`.

[mxadmin@mdw ~]$ mxgate \ --source stdin \ --db-database postgres \ --db-master-host 127.0.0.1 \ --db-master-port 7000 \ --db-user mxadmin \ --time-format raw \ --format csv \ --delimiter "|" \ --target json_test < ~/json.csv

- View loading data.

demo=# SELECT * FROM json_test; id | j ----+----------------------------------------------------------------------------------------------------------------------------- 1 | {"a":10, "b":"xyz"} (1 row)

#### 1.2.2 JSON array

- Create table.

demo=# CREATE TABLE json_array_test( id int, j json ) USING MARS3 ORDER BY (id);

- Create data files
 `~/json_array.csv`.

1|"{""{\""a\"":10, \""b\"":\""xyz\""}"",""{\""c\"": 10}""}"

- Load mxgate.

[mxadmin@mdw ~]$ mxgate \ --source stdin \ --db-database postgres \ --db-master-host 127.0.0.1 \ --db-master-port 7000 \ --db-user mxadmin \ --time-format raw \ --format csv \ --delimiter "|" \ --target json_array_test < ~/json_array.csv

- verify.

demo=# SELECT * FROM json_array_test ; id | j ----+----------------------------------------------------------------------------------------------------------------------------- 1 | {"{\"a\":10, \"b\":\"xyz\"}","{\"c\": 10}"} (1 row)

> ***Notes!***  
Because the JSON column contains special characters such as quotes, the --format parameter of mxgate must be CSV.  

<a name="watch"><br/></a>
## 2 Observe mxgate operation indicators

`watch` is a subcommand of mxgate that uses a series of indicators to describe the operation of mxgate daemon.
There are two modes of `watch`:
- Real-time observation mode, print gate's metrics every 3 seconds in a format similar to sar.
- Historical Observation Mode, you can specify any time period, any time period (such as every hour yesterday, every day last month, every month last year) to statistically import speed.

### 2.1 Real-time observation

[mxadmin@mdw ~]$ mxgate watch

The running indicators of mxgate will be collected every three seconds, and the output results are as follows
             Time          WCount          ICount        WSpeed/s        ISpeed/s  WBandWidth MB/S     BlocakItems

2022-04-28 15:20:58 14478858 14527011 2598081 2627887 2395 0 2022-04-28 15:21:01 22231035 22633254 2584059 2702081 2222 0 2022-04-28 15:21:04 30494310 30500874 2754425 2622540 3551 0 2022-04-28 15:21:07 38004210 38032956 2503300 2510694 2862 0 2022-04-28 15:21:10 46188696 46298223 2728162 2755089 2227 0 ...

The description of each indicator above can be obtained through the `--info` parameter

[mxadmin@mdw ~]$ mxgate watch --info

By default, only speed indicators will be output, and the time indicator can be observed through the --watch-lateency parameter to analyze problems.

[mxadmin@mdw ~]$ mxgate watch --watch-latency

### 2.2 Historical data observation

[mxadmin@mdw ~]$ mxgate watch --history

The average speed per hour in 24 hours as of the current time will be calculated, and the output result is as follows
            TIME RANGE                | SPEED/S  | BANDWIDTH MB/S  | BLOCK ITEMS

2022-04-28 16:00:00-2022-04-28 17:00:00 | 2208010 | 1254.48 | 0 2022-04-28 17:00:00-2022-04-28 18:00:00 | 1157920 | 1327.00 | 0 2022-04-28 18:00:00-2022-04-28 19:00:00 | 2228666 | 2162.32 | 0 2022-04-28 19:00:00-2022-04-28 20:00:00 | 1371092 | 2881.30 | 0 2022-04-28 20:00:00-2022-04-28 21:00:00 | 1575320 | 2608.20 | 0

Among them, SPEED/S, BANDWIDTH MB/S represents the imported entry speed and import bandwidth (MB/s in units),
BLOCK ITEMS represents the amount of data blocking in mxgate. This value will rise when the database consumption speed cannot keep up with the production speed of data sources (http, kafka, etc.).

You can add `--watch-start`, `--watch-end`, `--watch-duration` parameters to control the time interval and period of observation historical data.
For example

[mxadmin@mdw ~]$ mxgate watch --history --watch-start '2022-03-27 00:00:00' --watch-end '2022-04-27 00:00:00' --watch-duration '168h'

Average import speeds per week (every 168h) from March 27 to April 27
Among them, `--watch-duration` supports three units: `h`` `m`` `s`

<a name="non-stop_update_parameters"><br/></a>
## 3 Unstop update parallel write

mxgate supports modifying parallel loading parameters `--stream-prepared` and `--interval` without stopping the run:
- `--stream-prepared` represents the number of active slots when mxgate writes data to each table of YMatrix. By default, the number of slots in all tables is the same. It can be manually adjusted for a single task (Job) (only one task per table).
  - Increasing the `--stream-prepared` value often means improving the concurrency of data writing, thereby improving performance, and also occupies more memory resources.
  - Reducing the `--stream-prepared` value can reduce the use of memory resources of the data node (Segment), but the performance will also be reduced accordingly.
- `--interval` represents the working interval time of each active slot, that is, after each slot waits for `--interval` (in milliseconds, ms), the data received by mxgate during that period is sent to YMatrix.
  - Increasing the value of `--interval` means that the delay of each slot will increase every time the data is written, and the amount of data written will increase (provided that data is continuously flowing into mxgate).
  - Reduce the value of `--interval`, and the delay is reduced and the amount of data is also reduced.

In summary, in production practice, the above two parameters need to be reasonably adjusted in conjunction with different storage engines and users' requirements for real-time data writing.

Specific usage examples are as follows:

- Set the number of slots per table to 3

$ mxgate set --stream-prepared-cli 3

- Get the number of active slots per table

$ mxgate get --stream-prepared-get

- Set the interval between slots of all tables to 200ms

$ mxgate set --job-interval 200

- Get the slot of all tables Current time interval

$ mxgate get --job-interval-get

> ***Notes!***  
For the above parameters, if you want to set or get the slot number or working time of a specific table, then add `--job <name>` after the above command. Each task corresponds to a database table. The job parameter structure consists of a schema name and a table name, which means that if your specific table is called test_table and the schema name is public, then you need to add `--job public.test_table` after the existing command.

<a name="non-stop_update_table"><br/></a>
## 4 Update table structure without shutting down

During data loading, as time-series data sources become more and more abundant, the table structure set previously may no longer be applicable to the current scenario, so there is a need to modify the table structure. This section will explain how mxgate pauses data writing, reloads table meta information, and restores data writing without shutting down. The specific steps are as follows:

 * First, use the command `mxgate pause -X` to interrupt the slot of all tables to prepare for modifying the database table structure. Where the `-X` parameter is necessary, it will help interrupt the slot between mxgate and the database. If slot is not interrupted, the database table cannot be modified. In addition to `-X`, the `-S` parameter can be used to make the pause mission synchronously wait for all slot interrupts to complete before returning.

$ mxgate pause -X



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


begin to pause all jobs,please wait...

* Secondly, after interrupting all slots of the corresponding table, you can perform the operation of modifying the structure of the database corresponding to the database, such as adding several columns, deleting several columns, deleting existing tables, and recreating a new table with the same name.

> ***Notes!***  
The reconstructed table structure can be different, but the "table name" must be consistent.

 * Finally, use the command `mxgate resume -R` to restore slots of all tables and overload meta information of the data table. Where the `-R` parameter is required, `resume` and `-R` will combine to complete the reload operation.

$ mxgate resume -R



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


begin to reload all jobs,please wait...

* In particular, when multiple mxgate processes run at the same time, the `-p` parameter is required to represent the process number of the corresponding mxgate process. All the above commands are the same, for example:

$ mxgate get --job-interval-get -p 70199 --job

> ***Notes!***  
 The prerequisite for executing the overloaded command is that all slots in the corresponding table of mxgate must be paused first, otherwise an error will occur: `jobs are not paused, please pause them first`

<a name="upsert"><br/></a>
## 5 UPSERT function support

 YMatrix implements the UPSERT function of MatrixGate from v4.2.0. For details, see [Data Batch Merge Scenario (UPSERT)](/doc/5.2/datainput/upsert).  

<a name="transfer"><br/></a>
## 6 Single table migration

 YMatrix implements MatrixGate's single table migration function from v4.3.0. For details, see [Single Table Migration](/doc/5.2/maintain/migrate/mxgate).

>***Notes!***  
YMatrix not only supports single table migration, but also supports full library migration (v4.7.0 and above). For details, see [Full Library Migration](/doc/5.2/maintain/migrate/mxshift).

 <a name="log_level"><br/></a>
## 7 Non-stop update log level

Sometimes we need to turn on the debug log of mxgate to observe some key information, but turning on or off the debug log requires restarting mxgate, which is not conducive to positioning problems. Therefore, YMatrix provides the ability to dynamically change the mxgate log level:

 * When gate is running, use the command `mxgate set --log-level VERBOSE` to enable the `VERBOSE` level log with relatively complete information or `mxgate set --log-level DEBUG` to enable the `DEBUG` level log with the most complete information. When you do not need to observe the debug log, you can use `mxgate set --log-level INFO` to restore the log level to `INFO`.

<a name="eventtrigger"><br/></a>
 ## 8 mxgate event triggers

 Sometimes we need to make some DDL modifications to a table during the insertion operation by mxgate. This function allows mxgate to "back" the incoming insertion operation after receiving the DDL modification instruction, so that the DDL statement can be executed as soon as possible.

 Supported DDL operations are as follows:
 1. Clear a table or a group of tables

=# TRUNCATE TABLE ;

2. Add columns, delete columns, and modify column types for a table

=# ALTER TABLE ADD COLUMN ; =# ALTER TABLE DROP COLUMN ; =# ALTER TABLE ALTER COLUMN TYPE ;

3. Change the table name for a table

=# ALTER TABLE RENAME TO ;

4. Delete the table first and then recreate it

=# DROP TABLE ; =# CREATE TABLE <tablename(column datatype)>;

5. The above four DDL operations for partition tables

<a name="autotune"><br/></a>
## 9 mxgate automatically adjusts slot number

Typically, we can adapt the load (i.e. the amount of data to be written) of different write tasks by manually adjusting [stream-prepared parameter](/doc/5.2/tools/mxgate/parameter#streamprepared) (the number of slot sessions executed in parallel). However, in some business scenarios, there are continuous dynamic changes in the load of the write task, and this change cannot be adjusted manually.

Therefore, MatrixGate designed an automatic mechanism: automatic adaptation of the write load is completed by turning on the `--auto-tune` parameter.

This parameter can be enabled in the following ways:
- [Passing --auto-tune parameters through configuration file at startup](/doc/5.2/tools/mxgate/parameter#autotune)
- [Dynamic switch via mxgate set command](/doc/5.2/tools/mxgate/parameter#autotuneset)
- [Get adjustment status through mxgate get command](/doc/5.2/tools/mxgate/parameter#autotunestatus)

>***Notes!***  
Turning on this function can ensure the most efficient write performance in real time, but the utilization of session slot resources is not necessarily optimal.   

>***Notes!***  
If you set the automatic adjustment of slot number on and also set the [High Water Threshold] (#sourcepressure), the automatic adjustment of slot number function will only take effect if the write load exceeds the threshold.

<a name="sourcepressure"><br/></a>
## 10 mxgate Percent Quantized Write Load

mxgate supports quantification of the write load of the source data (percentage 0 to 100%). During the process of mxgate writing data, the write load of the corresponding task (Job) can be obtained through the following command:

$ mxgate get --source-pressure



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 source-pressure = 27.26%

If there are multiple jobs, you can get the corresponding write load through the name of the job, or get the write load of multiple jobs at once:

$ mxgate get --source-pressure --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 source-pressure = 15.73%

$ mxgate get --source-pressure



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 source-pressure = 35.84% public.t1 source-pressure = 0%

After completing the percentage load quantization, you can use this metric to set a corresponding high water level threshold. If the data writing load at the source end exceeds the set threshold, the corresponding number will be displayed as **red**. At this time, you can manually adjust the slot number to reduce the write pressure. Of course, you can also use the [mxgate automatic slot number] (#autotune) function.

The threshold value needs to be set through the following command:

$ mxgate set --high-water-mark --job

For example:

$ mxgate set --high-water-mark 20 --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 high-water-mark = 20.00%

Get the high water level threshold of the corresponding job through the following command:

$ mxgate get --high-water-mark-get --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 high-water-mark = 20.00%

$ mxgate get --high-water-mark-get



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t1 high-water-mark = 0.00% public.t2 high-water-mark = 20.00%

>***Notes!***  
If the set threshold value is exceeded, the pressure value will turn red.

If you enable the automatic adjustment slot number function of mxgate and set the corresponding high water level threshold, then the slot number will not automatically increase until the loading pressure value of the job does not exceed the set high water level threshold, and it will be useless even if mxgate attempts to increase through the automatic adjustment function**.

For example: We set a high water threshold of `75%`. When the loading pressure is `35%`, although mxgate judged that the number of slots should be increased, because the `75%` of the high water level threshold setting is not reached, the number of slots will not increase at this time, but will display the following error:

[WARN]:-[SlotsLauncher] Sources pressure = 35.21 of job(public.t2) is lower than the high water mark(75), the slot increase is not allowed

>***Notes!***  
The example values ​​are for reference only. Please set the corresponding high water level threshold based on actual business needs and current data writing speed.

If you want to remove the limit of the high water level threshold, you can use the following command:

$ mxgate set --disable-high-water-mark=true --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.2.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 high-water-mark = 0.00%