MatrixGate, a high-performance data loading tool

Please refer to [MatrixDB Data Access] for the teaching video of this course (https://www.bilibili.com/video/BV1kQ4y1C7Ar?share_source=copy_web)

MatrixGate is abbreviated as mxgate, and is a high-performance data loading tool that comes with MatrixDB.

Using mxgate for data loading performance is much higher than that of native INSERT statements. Because mxgate can communicate directly with segment, there is no master single point bottleneck.

1. Comparison of INSERT and MatrixGate

| Writing method | Advantages | Disadvantages | Applicable scenarios | | --- | --- | --- | | Direct INSERT | Simple interface | Low throughput | Low throughput, hundreds of thousands of data points/second | | MatrixGate | High throughput
Standard real-time | Additional deployment is required, with operation and maintenance costs | High throughput, tens of millions of data points/second |

After testing, using MatrixGate for data import can improve performance by dozens of times compared to competitors. For specific data, please refer to [Temporary Database Insertion Performance Evaluation: MatrixDB is 78 times that of InfluxDB] (https://ymatrix.cn/article/27)

2. How to use MatrixGate

MatrixGate provides the following operating mode:

  • Service mode
  • Command line mode

The following shows how to use these two modes to pour data into the data table. The schema of the data table dest is as follows:

CREATE TABLE dest(
    time timestamp,
    c1 int,
    c2 text
)DISTRIBUTED BY(c1);

2.1 Service mode

The service mode has backend processes that are permanently resident, providing HTTP interface to users and submitting time sequence data, which is a common way to use in production environments.

2.1.1 Generate configuration files

Use the service model to first generate configuration files and determine the database connection information, target table and other parameters

mxgate config --db-database test \
            --db-master-host localhost \
            --db-master-port 5432 \
            --db-user mxadmin \
            --target public.dest \
            --time-format raw \
            --delimiter ',' \
            > mxgate.conf

As determined in the above command:

| Parameter name | Description | Value | | --- | --- | | --db-database | Database | test | | --db-master-host | Database host | localhost | | --db-master-port | Database Port | 5432 | | --db-user | Database username | mxadmin | | --target | Target Table | public.dest | | --time-format | Time format | raw (plain text) | | --delimiter | Delimiter | , |

2.1.2 Start MatrixGate

Then, start MatrixGate and specify the configuration file you just generated in the startup parameters:

mxgate start --config mxgate.conf
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: 4.0.0
  Your Copy is Licensed to: yMatrix.cn; 2021-12-18; any
**********************************************************
Launching MatrixGate daemon...
MatrixGate daemon started successfully

2.1.3 Submit data

After the startup is successful, use the curl tool to send HTTP request to submit data.

In production environment, use the HTTP library supported by the programming language to submit data

The test data file rows_header.csv has been prepared. You need to specify the target table name on the first line of the test file. The specific content is as follows:

[mxadmin@sdw2 ~]$ cat rows_header.csv
public.dest
2021-01-01 00:00:00,1,a1
2021-01-01 00:00:00,2,a2
2021-01-01 00:00:00,3,a3

Submit data:

curl http://localhost:8086/ -X POST -H 'Content-Type: text/plain' --data-binary "@rows_header.csv"

MatrixGate binds to port 8086 by default, which can be modified through configuration files.

Query the injected data:

test=# select * from dest;
        time         | c1 | c2
--------------------+-------------------------------------------------------------------------------------------------------------
 2021-01-01 00:00:00 | 11 | a11
 2021-01-01 00:00:00 | 12 | a12
 2021-01-01 00:00:00 | 13 | a13
(3 rows)

2.1.4 Operation and maintenance management

MatrixGate also provides other operation and maintenance commands for operation and maintenance management.

View Status

mxgate status
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: 4.0.0
  Your Copy is Licensed to: yMatrix.cn; 2021-12-18; any
**********************************************************
PID          13675 alive
Launched At  2021-09-01 14:16:24
Up For       8 minutes 43 seconds
Binary       /usr/local/matrixdb-4.0.3.community/bin/mxgated
Log          /home/mxadmin/gpAdminLogs/matrixgate.2021-09-01_141624.log
Config       /home/mxadmin/mxgate.conf

You can see the service program running status, configuration files and log paths, which are used to track down problems.

Stop service

mxgate stop
**********************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: 4.0.0
  Your Copy is Licensed to: yMatrix.cn; 2021-12-18; any
**********************************************************
PID 13675 stopped

mxgate stop can stop the service.


2.2 Command Line Mode

Command line mode is used to pour data files into one go, and the process exits after the end.

It's still the data file just now. Remove the first row of the target table, only keep the data rows, and execute the following command:

cat rows.csv | mxgate --source stdin --db-database test --db-master-host localhost --db-master-port 5432 --db-user mxadmin --time-format raw --target public.dest --parallel 2  --delimiter ',' 

For more detailed usage methods, please refer to MatrixGate.