MatrixGate Main Features

This document describes the main features of MatrixGate.

Notes!
Programming Language Access MatrixGate Please see Data Write - Programming Language Access MatrixGate.


1 MatrixGate Loading Special Type

1.1 Example of MatrixGate loading CSV files

  • Create table csvtable in the demo database.
    demo=# CREATE TABLE csvtable (
         time TIMESTAMP WITH TIME ZONE,
         tagid INT,
         c1 INT,
         c2 INT,
         c3 INT
         )
         USING MARS3
         DISTRIBUTED BY (tagid)
         ORDER BY (time,tagid);
  • Edit the data load file data.csv, the content is as follows:
    1603777821|1|101|201|301
    1603777822|2|102|202|302
    1603777823|3|103|203|303
  • Start mxgate, specify that the --source parameter is stdin, the target table is the existing csvtable, and the loading parallelism is 2. In the example, the host is mdw.
    [mxadmin@mdw ~]$ mxgate \
    --source stdin \
    --db-database demo \
    --db-master-host 127.0.0.1 \
    --db-master-port 5432 \
    --db-user mxadmin \
    --time-format unix-second \
    --delimiter "|" \
    --target csvtable \
    --parallel 2 < data.csv
  • Connect to the database to query whether the data is loaded successfully.
    
    demo=# SELECT * FROM csvtable ;
            time          | tagid | c1  | c2  | c3
    -----------------------+-------+--------------------------------------------------------------------------------------------------
    2020-10-27 05:50:23+08 |     3 | 103 | 203 | 303
    2020-10-27 05:50:22+08 |     2 | 102 | 202 | 302
    2020-10-27 05:50:21+08 |     1 | 101 | 201 | 301

(3 rows)

### 1.2 Example of MatrixGate loading JSON fields

#### 1.2.1 JSON

- Create table.

demo=# CREATE TABLE json_test( id int, j json ) USING MARS3 ORDER BY (id);

- Create data files.
 `~/json.csv`

1|"{""a"":10, ""b"":""xyz""}"

- load
Here we use the stdin mode as an example, and the other modes are the same.
The key is `--format csv`.

[mxadmin@mdw ~]$ mxgate \ --source stdin \ --db-database postgres \ --db-master-host 127.0.0.1 \ --db-master-port 7000 \ --db-user mxadmin \ --time-format raw \ --format csv \ --delimiter "|" \ --target json_test < ~/json.csv

- View loading data.

demo=# SELECT * FROM json_test; id | j ----+----------------------------------------------------------------------------------------------------------------------------- 1 | {"a":10, "b":"xyz"} (1 row)

#### 1.2.2 JSON array

- Create table.

demo=# CREATE TABLE json_array_test( id int, j json ) USING MARS3 ORDER BY (id);

- Create data files
 `~/json_array.csv`.

1|"{""{\""a\"":10, \""b\"":\""xyz\""}"",""{\""c\"": 10}""}"

- Load mxgate.

[mxadmin@mdw ~]$ mxgate \ --source stdin \ --db-database postgres \ --db-master-host 127.0.0.1 \ --db-master-port 7000 \ --db-user mxadmin \ --time-format raw \ --format csv \ --delimiter "|" \ --target json_array_test < ~/json_array.csv

- verify.

demo=# SELECT * FROM json_array_test ; id | j ----+----------------------------------------------------------------------------------------------------------------------------- 1 | {"{\"a\":10, \"b\":\"xyz\"}","{\"c\": 10}"} (1 row)

> ***Notes!***  
Because the JSON column contains special characters such as quotes, the --format parameter of mxgate must be CSV.  

<a name="watch"><br/></a>
## 2 Observe mxgate operation indicators

`watch` is a subcommand of mxgate that uses a series of indicators to describe the operation of mxgate daemon.
There are two modes of `watch`:
- Real-time observation mode, print gate's metrics every 3 seconds in a format similar to sar.
- Historical Observation Mode, you can specify any time period, any time period (such as every hour yesterday, every day last month, every month last year) to statistically import speed.

### 2.1 Real-time observation

[mxadmin@mdw ~]$ mxgate watch

The running indicators of mxgate will be collected every three seconds, and the output results are as follows
             Time          WCount          ICount        WSpeed/s        ISpeed/s  WBandWidth MB/S     BlocakItems

2022-04-28 15:20:58 14478858 14527011 2598081 2627887 2395 0 2022-04-28 15:21:01 22231035 22633254 2584059 2702081 2222 0 2022-04-28 15:21:04 30494310 30500874 2754425 2622540 3551 0 2022-04-28 15:21:07 38004210 38032956 2503300 2510694 2862 0 2022-04-28 15:21:10 46188696 46298223 2728162 2755089 2227 0 ...

The description of each indicator above can be obtained through the `--info` parameter

[mxadmin@mdw ~]$ mxgate watch --info

By default, only speed indicators will be output, and the time indicator can be observed through the --watch-lateency parameter to analyze problems.

[mxadmin@mdw ~]$ mxgate watch --watch-latency

### 2.2 Historical data observation

[mxadmin@mdw ~]$ mxgate watch --history

The average speed per hour in 24 hours as of the current time will be calculated, and the output result is as follows
            TIME RANGE                | SPEED/S  | BANDWIDTH MB/S  | BLOCK ITEMS

2022-04-28 16:00:00-2022-04-28 17:00:00 | 2208010 | 1254.48 | 0 2022-04-28 17:00:00-2022-04-28 18:00:00 | 1157920 | 1327.00 | 0 2022-04-28 18:00:00-2022-04-28 19:00:00 | 2228666 | 2162.32 | 0 2022-04-28 19:00:00-2022-04-28 20:00:00 | 1371092 | 2881.30 | 0 2022-04-28 20:00:00-2022-04-28 21:00:00 | 1575320 | 2608.20 | 0

Among them, SPEED/S, BANDWIDTH MB/S represents the imported entry speed and import bandwidth (MB/s in units),
BLOCK ITEMS represents the amount of data blocking in mxgate. This value will rise when the database consumption speed cannot keep up with the production speed of data sources (http, kafka, etc.).

You can add `--watch-start`, `--watch-end`, `--watch-duration` parameters to control the time interval and period of observation historical data.
For example

[mxadmin@mdw ~]$ mxgate watch --history --watch-start '2022-03-27 00:00:00' --watch-end '2022-04-27 00:00:00' --watch-duration '168h'

Average import speeds per week (every 168h) from March 27 to April 27
Among them, `--watch-duration` supports three units: `h`` `m`` `s`

<a name="non-stop_update_parameters"><br/></a>
## 3 Update parallel writing parameters without shutting down

mxgate supports modifying the relevant parameters of parallel loading without stopping: `--interval`, `--stream-prepared`. `--interval` represents the working time for each write connection from mxgate to the database table, and `--stream-prepared` represents the number of active write connections. In mxgate logic, there is only one write connection for the same database table to perform the task of writing data at the same time. Therefore, each database table needs to have multiple connections to continuously perform its own write tasks in different time intervals, thereby ensuring high-speed and efficient writing of data. In this process, you can adjust the working time of each write connection through the `--interval` parameter, thereby improving the data write rate and improving load performance in a targeted manner. Specific usage examples are as follows:

- Set the number of write connections per table to 3

$ mxgate set --stream-prepared-cli 3



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


- Get the number of active write connections per table

$ mxgate get --stream-prepared-get



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.vehicle_basic_data_mars3 stream-prepared = 3

- Set the time interval for write connections of all tables to 200ms

$ mxgate set --job-interval 200



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


- Get the current time interval for write connections to all tables

$ mxgate get --job-interval-get



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


> ***Notes!***  
For the above parameters, if you want to set or get the number of write connections or working hours for a specific table, then add `--job <name>` after the above command. Each task corresponds to a database table. The job parameter structure consists of a schema name and a table name, which means that if your specific table is called test_table and the schema name is public, then you need to add `--job public.test_table` after the existing command.

<a name="non-stop_update_table"><br/></a>
## 4 Update table structure without shutting down

During the data loading process, you may suddenly find that driven by changing time series data sources, the previously set table structure is no longer applicable to the current scenario, and then there is a need to modify the table structure, which can be met by mxgate. This section will explain how mxgate performs a series of operations such as pausing data writing, reloading modified database table meta information, and restoring data writing without shutting down. The specific steps are as follows:

 * First, use the command `mxgate pause -X` to interrupt the write connection of all tables, preparing to modify the database table structure. Where the `-X` parameter is necessary, it will help interrupt the connection between mxgate and the database. If the connection is not interrupted, the database table cannot be modified. In addition to `-X`, use the `-S` parameter to make the pause mission synchronously wait for all connection interrupts to complete before returning.

$ mxgate pause -X



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


begin to pause all jobs,please wait...

* Secondly, after interrupting all write connections to the corresponding table, you can perform the modification of the structure of the database table corresponding to the database, such as adding several columns, deleting several columns, deleting existing tables, and recreating a new table with the same name.

> ***Notes!***  
The reconstructed table structure can be different, but the "table name" must be consistent.

 * Finally, use the command `mxgate resume -R` to restore write connections to all tables and overload meta information of the data table. Where the `-R` parameter is required, `resume` and `-R` will combine to complete the reload operation.

$ mxgate resume -R



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


begin to reload all jobs,please wait...

* In particular, when multiple mxgate processes run at the same time, the `-p` parameter is required to represent the process number of the corresponding mxgate process. All the above commands are the same, for example:

$ mxgate get --job-interval-get -p 70199 --job

> ***Notes!***  
 The prerequisite for executing the overloaded command is that all write connections to the corresponding table of mxgate must be paused first, otherwise an error will occur: `jobs are not paused, please pause them first`

<a name="upsert"><br/></a>
## 5 UPSERT function support

 YMatrix implements the UPSERT function of MatrixGate from v4.2.0. For details, see [Data Batch Merge Scenario (UPSERT)](/doc/5.1/datainput/upsert).  

<a name="transfer"><br/></a>
## 6 Single table migration

 YMatrix implements MatrixGate's single table migration function from v4.3.0. For details, see [Single Table Migration](/doc/5.1/maintain/migrate/mxgate).

>***Notes!***  
YMatrix not only supports single table migration, but also supports full library migration (v4.7.0 and above). For details, see [Full Library Migration](/doc/5.1/maintain/migrate/mxshift).

 <a name="log_level"><br/></a>
## 7 Non-stop update log level

Sometimes we need to turn on the debug log of mxgate to observe some key information, but turning on or off the debug log requires restarting mxgate, which is not conducive to positioning problems. Therefore, YMatrix provides the ability to dynamically change the mxgate log level:

 * When gate is running, use the command `mxgate set --log-level VERBOSE` to enable the `VERBOSE` level log with relatively complete information or `mxgate set --log-level DEBUG` to enable the `DEBUG` level log with the most complete information. When you do not need to observe the debug log, you can use `mxgate set --log-level INFO` to restore the log level to `INFO`.

<a name="eventtrigger"><br/></a>
 ## 8 mxgate event triggers

 Sometimes we need to make some DDL modifications to a table during the insertion operation by mxgate. This function allows mxgate to "back" the incoming insertion operation after receiving the DDL modification instruction, so that the DDL statement can be executed as soon as possible.

 Supported DDL operations are as follows:
 1. Clear a table or a group of tables

=# TRUNCATE TABLE ;

2. Add columns, delete columns, and modify column types for a table

=# ALTER TABLE ADD COLUMN ; =# ALTER TABLE DROP COLUMN ; =# ALTER TABLE ALTER COLUMN TYPE ;

3. Change the table name for a table

=# ALTER TABLE RENAME TO ;

4. Delete the table first and then recreate it

=# DROP TABLE ; =# CREATE TABLE <tablename(column datatype)>;

5. The above four DDL operations for partition tables

<a name="autotune"><br/></a>
## 9 mxgate Automatically adjust the number of connections

Generally, we can adapt the load (i.e. the amount of data to be written) of different write tasks (Jobs) by manually adjusting [stream-prepared parameter](/doc/5.1/tools/mxgate/parameter#streamprepared) (the number of connection sessions executed in parallel). However, in some business scenarios, there are continuous dynamic changes in the load of the write task, and this change cannot be adjusted manually.

Therefore, MatrixGate designed an automatic mechanism: automatic adaptation of the write load is completed by turning on [auto-tune parameter](/doc/5.1/tools/mxgate/parameter#autotune).

This parameter can be enabled in the following ways:
- Pass the `--auto-tune` parameter when starting mxgate
- Set in mxgate configuration file
- Dynamic switches via `mxgate set` command
   - Open: `mxgate set --auto-tune`
   - Close: `mxgate set --auto-tune=false`

>***Notes!***  
Turning on this function can ensure the most efficient write performance in real time, but it is not necessarily optimal for the utilization of session connection resources.   

>***Notes!***  
If you set to enable automatic adjustment of connections and set [High Water Threshold] (#sourcepressure), the automatic adjustment of connections function will only take effect if the write load exceeds the threshold.

<a name="sourcepressure"><br/></a>
## 10 mxgate Percent Quantized Write Load

mxgate supports quantification of the write load of the source data (percentage 0 to 100%). During the process of mxgate writing data, the write load of the corresponding task (Job) can be obtained through the following command:

$ mxgate get --source-pressure



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 source-pressure = 27.26%

If there are multiple jobs, you can get the corresponding write load through the name of the job, or get the write load of multiple jobs at once:

$ mxgate get --source-pressure --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 source-pressure = 15.73%

$ mxgate get --source-pressure



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 source-pressure = 35.84% public.t1 source-pressure = 0%

After completing the percentage load quantization, you can use this metric to set a corresponding high water level threshold. If the data writing load at the source end exceeds the set threshold, the corresponding number will be displayed as **red**. At this time, you can manually adjust the number of connections to reduce the write pressure. Of course, you can also use the [mxgate automatic adjustment of connections] (#autotune) function.

The threshold value needs to be set through the following command:

$ mxgate set --high-water-mark --job

For example:

$ mxgate set --high-water-mark 20 --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 high-water-mark = 20.00%

Get the high water level threshold of the corresponding job through the following command:

$ mxgate get --high-water-mark-get --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 high-water-mark = 20.00%

$ mxgate get --high-water-mark-get



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t1 high-water-mark = 0.00% public.t2 high-water-mark = 20.00%

>***Notes!***  
If the set threshold value is exceeded, the pressure value will turn red.

If you enable the automatic adjustment number of connections function of mxgate and set the corresponding high water level threshold, then the number of connections will not automatically increase until the loading pressure value of the job does not exceed the set high water level threshold, and it will be useless even if mxgate attempts to increase through the automatic adjustment function**.

For example: We set a high water threshold of `75%`. When the loading pressure is `35%`, although mxgate judged that the number of connections should be increased, because the `75%` of the high water level threshold setting did not reach, the number of connections will not increase at this time, but will display the following error:

[WARN]:-[SlotsLauncher] Sources pressure = 35.21 of job(public.t2) is lower than the high water mark(75), the slot increase is not allowed

>***Notes!***  
The example values ​​are for reference only. Please set the corresponding high water level threshold based on actual business needs and current data writing speed.

If you want to remove the limit of the high water level threshold, you can use the following command:

$ mxgate set --disable-high-water-mark=true --job public.t2



| \/ | | | _ () / _| | | __ | |\/| |/ ` | | '| \ \/ / | / ` | / \ | | | | (| | || | | |> <| || | (_| | || / || ||_,|_|| |//_\|\,_|\_| Version: v5.1.0 Your Copy is Licensed to: yMatrix.cn; 2023-10-26; any


public.t2 high-water-mark = 0.00%