High-Performance Data Loading Tool MatrixGate

For the instructional video, refer to YMatrix Data Ingestion

MatrixGate, abbreviated as mxgate, is a high-performance data loading tool included with YMatrix. It resides in the YMatrix installation directory at bin/mxgate. Currently, it supports data ingestion via SDK, HTTP, and STDIN APIs. Supported data formats include TEXT and CSV.
Data loading using mxgate significantly outperforms native INSERT statements because mxgate communicates directly with Segments, eliminating the Master node as a single point of bottleneck.

MatrixGate currently supports the following features:


1 MatrixGate Architecture

The data loading workflow of MatrixGate is illustrated below:

MatrixGate Architecture Diagram

  1. A data collection system gathers device data or receives data pushed from devices.
  2. The collection system continuously sends data to the MatrixGate service process (mxgate) in concurrent micro-batches.
  3. The mxgate process efficiently communicates with the YMatrix master node (Master) for transaction and control information.
  4. Data is sent directly to the data nodes (Segments) and written in parallel at high speed.

Comparison between INSERT and MatrixGate:

Write Method Advantages Disadvantages Use Cases
Direct INSERT Simple interface Low throughput Low-throughput scenarios: hundreds of thousands of data points per second
MatrixGate High throughput
Near real-time
Requires additional deployment and operational overhead High-throughput scenarios: millions of data points per second

Benchmark tests show that MatrixGate achieves approximately 79 times the data ingestion performance of InfluxDB.

2 MatrixGate Usage

MatrixGate provides the following operation modes:

  • Service mode
  • Command-line mode
  • Migration mode

Below are examples of using these modes to load data into a table. The schema of the target table dest is as follows:

=# CREATE TABLE dest(
    time timestamp,
    c1 int,
    c2 text
)USING MARS3
DISTRIBUTED BY(c1)
ORDER BY(c1);

2.1 Service Mode

In service mode, a background process runs persistently and exposes an HTTP interface for submitting time-series data. This is the standard method used in production environments.

2.1.1 Generate Configuration File

To use service mode, first generate a configuration file specifying database connection details and target table parameters.

$ mxgate config --db-database test \
            --db-master-host localhost \
            --db-master-port 5432 \
            --db-user mxadmin \
            --target public.dest \
            --time-format raw \
            --delimiter ',' \
            > mxgate.conf

The above command sets the following parameters:

Parameter Description Value
--db-database Database name test
--db-master-host Database host localhost
--db-master-port Database port 5432
--db-user Database user mxadmin
--target Target table public.dest
--time-format Time format raw (plain text)
--delimiter Field delimiter ,

2.1.2 Start MatrixGate

Next, start MatrixGate by specifying the generated configuration file:

$ mxgate start --config mxgate.conf
******************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
******************************************************
Launching MatrixGate daemon...
MatrixGate daemon started successfully

2.1.3 Submit Data

After successful startup, use the curl tool to send an HTTP request to submit data.

Note!
In production environments, use an HTTP library provided by your programming language to submit data.

Create a test data file rows_header.csv:

$ vi rows_header.csv

Contents:

public.dest
2021-01-01 00:00:00,1,a1
2021-01-01 00:00:00,2,a2
2021-01-01 00:00:00,3,a3

When submitting data, the first line must specify the target table name, as the MatrixGate service may handle multiple target tables.

Submit the data:

$ curl http://localhost:8086/ -X POST -H 'Content-Type: text/plain' --data-binary "@rows_header.csv"

By default, MatrixGate binds to port 8086. This can be changed via the configuration file.

Query the loaded data:

=# SELECT * FROM dest;
        time         | c1 | c2
---------------------+----+-----
 2021-01-01 00:00:00 | 11 | a11
 2021-01-01 00:00:00 | 12 | a12
 2021-01-01 00:00:00 | 13 | a13
(3 rows)

For more details on API parameters, refer to the documentation.

2.1.4 Operations Management

MatrixGate provides additional commands for operational management.

Check Status

$ mxgate status
******************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
******************************************************
PID          29429 alive
Launched At  2023-07-26 08:26:22
Up For       4 minutes 12 seconds
Binary       /opt/ymatrix/matrixdb5/bin/mxgated
Log          /home/mxadmin/gpAdminLogs/matrixgate.2023-07-26_082622-29429.log
Config       /home/mxadmin/mxgate.conf

This output shows the service status, configuration file path, and log location, which are useful for troubleshooting.

Stop Service

$ mxgate stop
******************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_| | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
******************************************************
PID 29429 stopped

Monitor Service

Use the mxgate watch subcommand to monitor the service in real time:

$ mxgate watch
******************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_| | |_| |  | |>  <| |_| | (_ic) | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version:v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
******************************************************
watch cmd will run forever until killed, you can use watch -T n to change the duration to n seconds;and you can use mxgate watch --info to get info of columns;
                 Time          WCount          ICount        WSpeed/s        ISpeed/s  WBandWidth MB/S     BlocakItems
  2022-04-28 15:20:58        14478858        14527011         2598081         2627887            2395               0
  2022-04-28 15:21:01        22231035        22633254         2584059         2702081            2222               0
  2022-04-28 15:21:04        30494310        30500874         2754425         2622540            3551               0
  2022-04-28 15:21:07        38004210        38032956         2503300         2510694            2862               0
  2022-04-28 15:21:10        46188696        46298223         2728162         2755089            2227               0
  ...

Alternatively, use mxgate watch --history to view historical statistics:

$ mxgate watch --history
******************************************************
 __  __       _        _       ____       _
|  \/  | __ _| |_ _ __(_)_  __/ ___| __ _| |_ ___
| |\/| |/ _` | __| '__| \ \/ / |  _ / _` | __/ _ \
| |  | | (_ic) | |_| |  | |>  <| |_| | (_ic) | ||  __/
|_|  |_|\__,_|\__|_|  |_/_/\_\\____|\__,_|\__\___|
  Version: v5.2.0
  Your Copy is Licensed to: yMatrix.cn; 2023-10-25; any
******************************************************
                TIME RANGE                | SPEED/S  | BANDWIDTH MB/S  | BLOCK ITEMS
  2023-07-28 16:00:00-2023-07-28 17:00:00 |  2208010 |         1254.48 |           0
  2023-07-28 17:00:00-2023-07-28 18:00:00 |  1157920 |         1327.00 |           0
  2023-07-28 18:00:00-2023-07-28 19:00:00 |  2228666 |         2162.32 |           0
  2023-07-28 19:00:00-2023-07-28 20:00:00 |  1371092 |         2881.30 |           0
  2023-07-28 20:00:00-2023-07-28 21:00:00 |  1575320 |         2608.20 |           0


2.2 Command-Line Mode

Command-line mode is used for one-time data file ingestion. The process exits after completion.

Using the same data file, remove the first line (table name), keeping only data rows, then execute:

$ cat rows.csv | mxgate --source stdin --db-database test --db-master-host localhost --db-master-port 5432 --db-user mxadmin --time-format raw --target public.dest --parallel 2  --delimiter ',' 

For more methods on file-based data ingestion, see File Ingestion.


2.3 Migration Mode

MatrixGate's migration mode enables single-table high-speed data migration. It supports migrating tables from other Greenplum 5, Greenplum 6, or YMatrix clusters into the current YMatrix cluster. The main use cases are:

  • Single-table migration
  • Export to file
  • Filtered migration

Note!
For detailed usage of migration mode, see Single-Table Migration Tool - mxgate. For full-database migration, refer to Full-Database Migration Tool - mxshift.

Note!
For a complete overview of the MatrixGate tool, see MatrixGate.