File access

File access refers to importing existing text data files into the target data table. Text files are usually in CSV format.

The test CSV file format is as follows. The first behavior is HEADER, followed by the data row. The types of the 3 columns are timestamps, integers and strings:

time,c1,c2
2021-01-01 00:00:00,1,a1
2021-01-01 00:00:00,2,a2
2021-01-01 00:00:00,3,a3

The target library is test, and the schema of the target table dest is as follows, which is consistent with the CSV file format:

CREATE TABLE dest(
    time timestamp,
    c1 int,
    c2 text
)DISTRIBUTED BY(c1);

The following are several commonly used methods to import file content into the target table:


1. COPY

COPY is a SQL command that comes with MatrixDB, which can import data files in the master node into the target table.

First, use psql to connect to the target library, and then execute the COPY command.

[mxadmin@mdw ~]$ psql test
psql (12)
Type "help" for help.

test=# COPY dest FROM '/home/mxadmin/rows.csv' DELIMITER ',' HEADER;
COPY 3

The DELIMITER parameter specifies a delimiter; HEADER means filtering file headers when importing data.

The COPY method is relatively simple, but there are the following problems:

  • The data file must be on the master node
  • Data is inserted through master, and the same effect as batch INSERT, there are performance problems when the data volume is large

2. MatrixGate

The MatrixGate method is to use the high-speed data access tool mxgate provided by MatrixDB for data import.

Compared with COPY, mxgate performs better when the data volume is large, and data files and master can also be deployed separately.

tail -n +2 rows.csv | mxgate --source stdin --db-database test --db-master-host localhost --db-master-port 5432 --db-user mxadmin --time-format raw --target public.dest --parallel 2  --delimiter ',' 

The above command redirects the data part of rows.csv to mxgate through pipeline. mxgate accesses data into the dest table of the test library in the stdin mode.

Because mxgate does not require file headers when accessing data, use the tail -n +2 command to output from the second line.

For more methods of using mxgate, please refer to Document