PXF Access S3

This document describes how PXF accesses S3 and gives examples.

Platform Extension Framework (PXF) uses the mechanism of Foreign Data Wrapper (FDW) to access data stored in an external source of the YMatrix database. For a detailed introduction to PXF, please refer to [PXF Chinese Documentation] (http://docs-cn.greenplum.org/v6/pxf/overview_pxf.html).

1 Steps

1.1 Configuring PXF Services

On Master, use the mxadmin user to edit the configuration file s3-site.xml of the S3 service with the path /usr/local/pxfconf/servers/s3server_online/s3-site.xml.

$ vim /usr/local/pxfconf/servers/s3server_online/s3-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http:xxx</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
</configuration>

Parameter description:

  • fs.s3a.endpoint: The endpoint of the object store, in the case of private deployment, http://<IP>:<port>. If it is a public cloud, you need to specify a domain name that the network can connect to.
  • fs.s3a.access.key: access_key for object storage.
  • fs.s3a.secret.key: Access the secret_key of the object store.

1.2 Synchronize files to all nodes and restart PXF

Notes!
All of these steps must be performed on the Master using the mxadmin user.

Synchronize the PXF configuration file to all nodes in the cluster.

$ pxf cluster sync

As shown below, the PXF configuration file has been synchronized successfully.

Syncing PXF configuration files from master host to 2 segment hosts...
PXF configs synced successfully on 2 out of 2 hosts

Notes!
You need to check the corresponding directories of other nodes to check whether their configuration files are synchronized successfully.

Restart the PXF service.

$ pxf cluster restart

As shown below, it is stated that PXF restart has been successful.

Restarting PXF on 2 segment hosts...
PXF restarted successfully on 2 out of 2 hosts

The PXF instance needs to be restarted separately.

$ pxf restart

As shown below, the PXF instance restart was successfully stated.

Using CATALINA_BASE:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_HOME:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_TMPDIR: /usr/local/pxf-matrixdb4/pxf-service/temp
Using JRE_HOME:        /usr/local/jdk1805/jre
Using CLASSPATH:       /usr/local/pxf-matrixdb4/pxf-service/bin/bootstrap.jar:/usr/local/pxf-matrixdb4/pxf-service/bin/tomcat-juli.jar
Using CATALINA_PID:    /usr/local/pxf-matrixdb4/run/catalina.pid
Tomcat stopped.
Using CATALINA_BASE:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_HOME:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_TMPDIR: /usr/local/pxf-matrixdb4/pxf-service/temp
Using JRE_HOME:        /usr/local/jdk1805/jre
Using CLASSPATH:       /usr/local/pxf-matrixdb4/pxf-service/bin/bootstrap.jar:/usr/local/pxf-matrixdb4/pxf-service/bin/tomcat-juli.jar
Using CATALINA_PID:    /usr/local/pxf-matrixdb4/run/catalina.pid
Tomcat started.
Checking if tomcat is up and running...
Server: PXF Server
Checking if PXF webapp is up and running...
PXF webapp is listening on port 5888

1.3 Create a test table

Use the mxadmin user to connect to the psql client to enter the database on Master.

$ psql <databasename>

Create a pxf_fdw extension.

Notes!
YMatrix creates extensions that require the user to have superuser permissions.

=# CREATE EXTENSION pxf_fdw;

Here are two ways of creation, choose one of them.

  1. Superuser creates S3 service config needs to correspond to the folder s3server_online created above.
    =# CREATE SERVER s3server_online FOREIGN DATA WRAPPER s3_pxf_fdw OPTIONS(config 's3server_online');
  2. Create S3 services by ordinary users If you want to create an S3 service using an ordinary user, you need to add the following permissions to the ordinary user.

Notes!
Executing the GRANT command requires the user to have superuser permissions.

=# GRANT usage ON FOREIGN DATA WRAPPER s3_pxf_fdw TO <role_name>;

config needs to correspond to the folder s3server_online created above.

=# CREATE SERVER s3server_online FOREIGN DATA WRAPPER s3_pxf_fdw OPTIONS(config 's3server_online');

Create a user map.

=# CREATE USER MAPPING FOR mxadmin server s3server_online;

Create an external table.

=# CREATE FOREIGN TABLE public.test (
c1 jsonb,
c2 text,
c3 text,
c4 text,
c5 text,
c6 jsonb,
c7 jsonb
)
SERVER s3server_online
OPTIONS (
format 'csv',
resource '/<bucket_name>/<prefix>/<filename.csv>',
JSONIFY_ARRAY 'TRUE',
JSONIFY_MAP 'TRUE',
JSONIFY_RECORD 'TRUE'
);

Parameter description:

  • format: The file format stored on the object storage. The supported file formats include CSV, TXT, Parquet, ORC, Avro, etc.
  • resource: The absolute path to the storage of the file on S3.