High-performance data loading tool MatrixGate

Please refer to [YMatrix Data Access] for teaching videos for this course (https://www.bilibili.com/video/BV1kQ4y1C7Ar?share_source=copy_web)

MatrixGate, abbreviated as mxgate, is a high-performance data loading tool that comes with YMatrix. It is located in bin/mxgate in the YMatrix installation directory and currently supports loading data through the SDK or HTTP and STDIN API interfaces. The data format supports TEXT and CSV.
Using mxgate for data loading performance is much higher than that of the native INSERT statement. Because mxgate can communicate directly with the Segment, there is no Master single point bottleneck.

mxgate currently mainly supports the following functions:


1 MatrixGate Principle

The logic of MatrixGate loading data is shown in the following figure:

MatrixGate schematic

  1. The data acquisition system collects equipment data or receives data sent by the equipment
  2. The acquisition system continuously sends data to the MatrixGate service process mxgate in concurrent microbatch mode
  3. The mxgate process and the master node (Master) process of YMatrix communicate efficiently, communicating transactions and control information
  4. Data is sent directly to the data node (Segment) and written in parallel at high speed.

Comparison of INSERT with MatrixGate:

| Writing method | Advantages | Disadvantages | Applicable scenarios | | --- | --- | --- | | Direct INSERT | Simple interface | Low throughput performance | Low throughput performance, hundreds of thousands of data points/second | | MatrixGate | High throughput performance
Standard real-time | Additional deployment is required, with operation and maintenance costs | High throughput performance, tens of millions of data points/second |

After testing, the data import performance of MatrixGate and Influx is close to 79:1. For specific data, please refer to [Temporary Database Insertion Performance Evaluation: YMatrix is ​​78 times that of InfluxDB] (https://ymatrix.cn/article/27).

2 MatrixGate Usage

MatrixGate provides the following operating mode:

  • Service mode
  • Command line mode
  • Migration mode

The following shows how to use these two modes to pour data into the data table. The schema of the data table dest is as follows:

=# CREATE TABLE dest(
    time timestamp,
    c1 int,
    c2 text
)USING MARS3
DISTRIBUTED BY(c1)
ORDER BY(c1);

2.1 Service mode

The service mode has backend processes that are permanently resident, providing an HTTP interface to users and submitting time sequence data, which is a common way to use in production environments.

2.1.1 Generate configuration files

To use the service mode, you must first generate a configuration file and determine the database connection information, target table and other parameters.

$ mxgate config --db-database test \
            --db-master-host localhost \
            --db-master-port 5432 \
            --db-user mxadmin \
            --target public.dest \
            --time-format raw \
            --delimiter ',' \
            > mxgate.conf

As determined in the above command:

| Parameter name | Description | Value | | --- | --- | | --db-database | Database | test | | --db-master-host | Database host | localhost | | --db-master-port | Database Port | 5432 | | --db-user | Database username | mxadmin | | --target | Target Table | public.dest | | --time-format | Time format | raw (plain text) | | --delimiter | Delimiter | , |

2.1.2 Start MatrixGate

Then, start MatrixGate and specify the configuration file you just generated in the startup parameters:

$ mxgate start --config mxgate.conf
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
**********************************************************
Launching MatrixGate daemon...
MatrixGate daemon started successfully

2.1.3 Submit data

After the startup is successful, use the curl tool to send an HTTP request to submit the data.

Notes!
In production environments, data is submitted using HTTP libraries supported by the programming language.

Create a test data file rows_header.csv:

$ vi rows_header.csv

The content is as follows:

public.dest
2021-01-01 00:00:00,1,a1
2021-01-01 00:00:00,2,a2
2021-01-01 00:00:00,3,a3

When submitting data, the first row must specify the target table name, because the MatrixGate service may have multiple target tables.

Submit data:

$ curl http://localhost:8086/ -X POST -H 'Content-Type: text/plain' --data-binary "@rows_header.csv"

MatrixGate binds to port 8086 by default, which can be modified through configuration files.

Query the injected data:

=# SELECT * FROM dest;
        time         | c1 | c2
--------------------+-------------------------------------------------------------------------------------------------------------
 2021-01-01 00:00:00 | 11 | a11
 2021-01-01 00:00:00 | 12 | a12
 2021-01-01 00:00:00 | 13 | a13
(3 rows)

For more detailed API parameters, please refer to Document.

2.1.4 Operation and maintenance management

MatrixGate also provides other operation and maintenance commands for operation and maintenance management.

View Status

$ mxgate status
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
**********************************************************
PID          29429 alive
Launched At  2023-07-26 08:26:22
Up For       4 minutes 12 seconds
Binary       /opt/ymatrix/matrixdb5/bin/mxgated
Log          /home/mxadmin/gpAdminLogs/matrixgate.2023-07-26_082622-29429.log
Config       /home/mxadmin/mxgate.conf

You can see the service program running status, configuration files and log paths, which are used to track down problems.

Stop service

$ mxgate stop
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
**********************************************************
PID 29429 stopped

Observation Service

You can use the mxgate watch subcommand to observe the service in real time:

$ mxgate watch
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version:v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
**********************************************************
watch cmd will run forever until killed, you can use watch -T n to change the duration to n seconds;and you can use mxgate watch --info to get info of columns;
                 Time          WCount          ICount        WSpeed/s        ISpeed/s  WBandWidth MB/S     BlocakItems
  2022-04-28 15:20:58        14478858        14527011         2598081         2627887            2395               0
  2022-04-28 15:21:01        22231035        22633254         2584059         2702081            2222               0
  2022-04-28 15:21:04        30494310        30500874         2754425         2622540            3551               0
  2022-04-28 15:21:07        38004210        38032956         2503300         2510694            2862               0
  2022-04-28 15:21:10        46188696        46298223         2728162         2755089            2227               0
  ...

Or use mxgate watch --history to observe historical data

$ mxgate watch --history
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
**********************************************************
                TIME RANGE                | SPEED/S  | BANDWIDTH MB/S  | BLOCK ITEMS
  2023-07-28 16:00:00-2023-07-28 17:00:00 |  2208010 |         1254.48 |           0
  2023-07-28 17:00:00-2023-07-28 18:00:00 |  1157920 |         1327.00 |           0
  2023-07-28 18:00:00-2023-07-28 19:00:00 |  2228666 |         2162.32 |           0
  2023-07-28 19:00:00-2023-07-28 20:00:00 |  1371092 |         2881.30 |           0
  2023-07-28 20:00:00-2023-07-28 21:00:00 |  1575320 |         2608.20 |           0


2.2 Command Line Mode

Command line mode is used to pour data files into one go, and the process exits after the end.

It's still the data file just now. Remove the first row of the target table, only keep the data rows, and execute the following command:

$ cat rows.csv | mxgate --source stdin --db-database test --db-master-host localhost --db-master-port 5432 --db-user mxadmin --time-format raw --target public.dest --parallel 2  --delimiter ',' 

For more methods of file access, please refer to File Access


2.3 Migration Mode

The migration mode of mxgate is mainly used for high-speed data migration of single-table, and supports migration of data tables of other Greenplum 5, Greenplum 6, and YMatrix clusters to the current YMatrix cluster. Currently, there are three main usages:

  • Single table migration
  • Export to file
  • Filter migration

Notes!
For the specific usage of the migration mode, please refer to Single Table Migration Tool - mxgate, and for the full library migration method, please refer to Full Library Migration Tool - mxshift.

Notes!
For a complete introduction to the MatrixGate tool, please see MatrixGate.