This document describes the main features of MatrixGate.
Notes!
Programming Language Access MatrixGate Please see Data Write - Programming Language Access MatrixGate.
demo=# CREATE TABLE csvtable (time TIMESTAMP WITH TIME ZONE, tagid INT, c1 INT, c2 INT, c3 INT)
DISTRIBUTED BY (tagid);
1603777821|1|101|201|301
1603777822|2|102|202|302
1603777823|3|103|203|303
[mxadmin@mdw ~]$ mxgate \
--source stdin \
--db-database demo \
--db-master-host 127.0.0.1 \
--db-master-port 5432 \
--db-user mxadmin \
--time-format unix-second \
--delimiter "|" \
--target csvtable \
--parallel 2 < data.csv
demo=# SELECT * FROM csvtable ;
time | tagid | c1 | c2 | c3
-----------------------+-------+--------------------------------------------------------------------------------------------------
2020-10-27 05:50:23+08 | 3 | 103 | 203 | 303
2020-10-27 05:50:22+08 | 2 | 102 | 202 | 302
2020-10-27 05:50:21+08 | 1 | 101 | 201 | 301
(3 rows)
### 1.2 Example of MatrixGate loading JSON fields
#### 1.2.1 JSON
- Create table.
demo=# CREATE TABLE json_test(id int, j json);
- Create data files.
`~/json.csv`
1|"{""a"":10, ""b"":""xyz""}"
- load
Here we use the stdin mode as an example, and the other modes are the same.
The key is `--format csv`.
[mxadmin@mdw ~]$ mxgate \ --source stdin \ --db-database postgres \ --db-master-host 127.0.0.1 \ --db-master-port 7000 \ --db-user mxadmin \ --time-format raw \ --format csv \ --delimiter "|" \ --target json_test < ~/json.csv
- View loading data.
demo=# SELECT * FROM json_test; id | j ----+----------------------------------------------------------------------------------------------------------------------------- 1 | {"a":10, "b":"xyz"} (1 row)
#### 1.2.2 JSON array
- Create table.
demo=# CREATE TABLE json_array_test(id int, j _json);
- Create data files
`~/json_array.csv`.
1|"{""{\""a\"":10, \""b\"":\""xyz\""}"",""{\""c\"": 10}""}"
- Load mxgate.
[mxadmin@mdw ~]$ mxgate \ --source stdin \ --db-database postgres \ --db-master-host 127.0.0.1 \ --db-master-port 7000 \ --db-user mxadmin \ --time-format raw \ --format csv \ --delimiter "|" \ --target json_array_test < ~/json_array.csv
- verify.
demo=# SELECT * FROM json_array_test ; id | j ----+----------------------------------------------------------------------------------------------------------------------------- 1 | {"{\"a\":10, \"b\":\"xyz\"}","{\"c\": 10}"} (1 row)
> ***Notes!***
Because the JSON column contains special characters such as quotes, the --format parameter of mxgate must be CSV.
<a name="watch"><br/></a>
## 2 Observe mxgate operation indicators
`watch` is a subcommand of mxgate that uses a series of indicators to describe the operation of mxgate daemon.
There are two modes of `watch`:
- Real-time observation mode, print gate's metrics every 3 seconds in a format similar to sar.
- Historical Observation Mode, you can specify any time period, any time period (such as every hour yesterday, every day last month, every month last year) to statistically import speed.
### 2.1 Real-time observation
[mxadmin@mdw ~]$ mxgate watch
The running indicators of mxgate will be collected every three seconds, and the output results are as follows
Time WCount ICount WSpeed/s ISpeed/s WBandWidth MB/S BlocakItems
2022-04-28 15:20:58 14478858 14527011 2598081 2627887 2395 0 2022-04-28 15:21:01 22231035 22633254 2584059 2702081 2222 0 2022-04-28 15:21:04 30494310 30500874 2754425 2622540 3551 0 2022-04-28 15:21:07 38004210 38032956 2503300 2510694 2862 0 2022-04-28 15:21:10 46188696 46298223 2728162 2755089 2227 0 ...
The description of each indicator above can be obtained through the `--info` parameter
[mxadmin@mdw ~]$ mxgate watch --info
By default, only speed indicators will be output, and the time indicator can be observed through the --watch-lateency parameter to analyze problems.
[mxadmin@mdw ~]$ mxgate watch --watch-latency
### 2.2 Historical data observation
[mxadmin@mdw ~]$ mxgate watch --history
The average speed per hour in 24 hours as of the current time will be calculated, and the output result is as follows
TIME RANGE | SPEED/S | BANDWIDTH MB/S | BLOCK ITEMS
2022-04-28 16:00:00-2022-04-28 17:00:00 | 2208010 | 1254.48 | 0 2022-04-28 17:00:00-2022-04-28 18:00:00 | 1157920 | 1327.00 | 0 2022-04-28 18:00:00-2022-04-28 19:00:00 | 2228666 | 2162.32 | 0 2022-04-28 19:00:00-2022-04-28 20:00:00 | 1371092 | 2881.30 | 0 2022-04-28 20:00:00-2022-04-28 21:00:00 | 1575320 | 2608.20 | 0
Among them, SPEED/S, BANDWIDTH MB/S represents the imported entry speed and import bandwidth (MB/s in units),
BLOCK ITEMS represents the amount of data blocking in mxgate. This value will rise when the database consumption speed cannot keep up with the production speed of data sources (http, kafka, etc.).
You can add `--watch-start`, `--watch-end`, `--watch-duration` parameters to control the time interval and period of observation historical data.
For example
[mxadmin@mdw ~]$ mxgate watch --history --watch-start '2022-03-27 00:00:00' --watch-end '2022-04-27 00:00:00' --watch-duration '168h'
Average import speeds per week (every 168h) from March 27 to April 27
Among them, `--watch-duration` supports three units: `h`` `m`` `s`
<a name="non-stop_update_parameters"><br/></a>
## 3 Update parallel writing parameters without shutting down
mxgate supports modifying the relevant parameters of parallel loading without stopping the run: "interval" and "stream-prepared". "interval" represents the working time for each write connection from mxgate to the database table, and "stream-prepared" represents the number of active write connections. In mxgate logic, there is only one write connection for the same database table to perform the task of writing data at the same time. Therefore, each database table needs to have multiple connections to continuously perform its own write tasks in different time intervals, thereby ensuring high-speed and efficient writing of data. In this process, you can use the "interval" parameter to adjust the working time of each write connection, thereby improving the write rate of data and improving load performance in a targeted manner. Specific usage examples are as follows:
- `mxgate set --stream-prepared-cli 3` Set the number of write connections per table to 3

- `mxgate get --stream-prepared-get` Get the number of active write connections per table

- `mxgate set --job-interval 200` Set the time interval for write connections of all tables to 200ms

- `mxgate get --job-interval-get` Gets the current time interval for write connections to all tables

> ***Notes!***
For the above parameters, if you want to set or get the number of write connections or working hours for a specific table, then add `--job <name>` after the above command. Each task (job) corresponds to a database table. The job parameter structure consists of a schema name and a table name, which means that if your specific table is called test_table and the schema name is public, then you need to add `--job public.test_table` after the existing command.
<a name="non-stop_update_table"><br/></a>
## 4 Update table structure without shutting down
During the data loading process, you may suddenly find that driven by changing time series data sources, the previously set table structure is no longer applicable to the current scenario, and then there is a need to modify the table structure, which can be met by mxgate. This section will explain how mxgate performs a series of operations such as pausing data writing, reloading modified database table meta information, and restoring data writing without shutting down. The specific steps are as follows:
* First, use the command `mxgate pause -X` to interrupt the write connection of all tables, preparing to modify the database table structure. Where the `-X` parameter is necessary, it will help interrupt the connection between mxgate and the database. If the connection is not interrupted, the database table cannot be modified. In addition to `-X`, use the `-S` parameter to make the pause mission synchronously wait for all connection interrupts to complete before returning.

* Secondly, after interrupting all write connections to the corresponding table, you can perform the modification of the structure of the database table corresponding to the database, such as adding several columns, deleting several columns, deleting existing tables, and recreating a new table with the same name.
> ***Notes!***
The reconstructed table structure can be different, but the "table name" must be consistent.
* Finally, use the command `mxgate resume -R` to restore write connections to all tables and overload meta information of the data table. Where the `-R` parameter is required, `resume` and `-R` will combine to complete the reload operation.

* In particular, when multiple mxgate processes run at the same time, the `-p` parameter is required to represent the process number of the corresponding mxgate process, and all the above commands are the same.

> ***Notes!***
The prerequisite for executing the overloaded command is that all write connections to the corresponding table of mxgate must be suspended first, otherwise the following error will occur:

<a name="upsert"><br/></a>
## 5 UPSERT function support
YMatrix implements the UPSERT function of MatrixGate from v4.2.0. For details, see [Data Batch Merge Scenario (UPSERT)](/doc/5.0/datainput/upsert).
<a name="transfer"><br/></a>
## 6 Single table migration
YMatrix implements MatrixGate's single table migration function from v4.3.0. For details, see [Single Table Migration](/doc/5.0/maintain/migrate/mxgate).
>***Notes!***
YMatrix not only supports single table migration, but also supports full library migration (v4.7.0 and above). For details, see [Full Library Migration](/doc/5.0/maintain/migrate/mxshift).
<a name="log_level"><br/></a>
## 7 Non-stop update log level
Sometimes we need to turn on the debug log of mxgate to observe some key information, but turning on or off the debug log requires restarting mxgate, which is not conducive to positioning problems. Therefore, YMatrix provides the ability to dynamically change the mxgate log level:
* When gate is running, use the command `mxgate set --log-level VERBOSE` to enable the `VERBOSE` level log with relatively complete information or `mxgate set --log-level DEBUG` to enable the `DEBUG` level log with the most complete information. When you do not need to observe the debug log, you can use `mxgate set --log-level INFO` to restore the log level to `INFO`.