Accessing S3 with PXF

This document describes how PXF accesses data in Amazon S3 and provides usage examples.

Platform Extension Framework (PXF) uses Foreign Data Wrapper (FDW) to access data stored outside the YMatrix database. For a detailed introduction to PXF, refer to the PXF Chinese Documentation.

1 Procedure

1.1 Configure the PXF Service

On the MXMaster host, log in as the mxadmin user and edit the PXF configuration file for accessing S3: s3-site.xml, located at /usr/local/pxfconf/servers/s3server_online/s3-site.xml.

$ vim /usr/local/pxfconf/servers/s3server_online/s3-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http:xxx</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
</configuration>

Parameter description:

  • fs.s3a.endpoint: The endpoint of the object storage service. For on-premises deployments, use http://<IP>:<port>. For public cloud environments, specify a reachable domain name.
  • fs.s3a.access.key: The access key used to authenticate with the object storage.
  • fs.s3a.secret.key: The secret key used to authenticate with the object storage.

1.2 Sync Configuration Files and Restart PXF

Note!
All steps in this section must be performed on the MXMaster host as the mxadmin user.

Synchronize the PXF configuration files to all nodes in the cluster:

$ pxf cluster sync

The following output indicates successful synchronization:

Syncing PXF configuration files from master host to 2 segment hosts...
PXF configs synced successfully on 2 out of 2 hosts

Note!
Verify that the configuration files have been correctly synchronized to the corresponding directories on each node.

Restart the PXF service:

$ pxf cluster restart

The following output indicates a successful restart:

Restarting PXF on 2 segment hosts...
PXF restarted successfully on 2 out of 2 hosts

Restart the local PXF instance:

$ pxf restart

The following output confirms the PXF instance has restarted successfully:

Using CATALINA_BASE:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_HOME:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_TMPDIR: /usr/local/pxf-matrixdb4/pxf-service/temp
Using JRE_HOME:        /usr/local/jdk1805/jre
Using CLASSPATH:       /usr/local/pxf-matrixdb4/pxf-service/bin/bootstrap.jar:/usr/local/pxf-matrixdb4/pxf-service/bin/tomcat-juli.jar
Using CATALINA_PID:    /usr/local/pxf-matrixdb4/run/catalina.pid
Tomcat stopped.
Using CATALINA_BASE:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_HOME:   /usr/local/pxf-matrixdb4/pxf-service
Using CATALINA_TMPDIR: /usr/local/pxf-matrixdb4/pxf-service/temp
Using JRE_HOME:        /usr/local/jdk1805/jre
Using CLASSPATH:       /usr/local/pxf-matrixdb4/pxf-service/bin/bootstrap.jar:/usr/local/pxf-matrixdb4/pxf-service/bin/tomcat-juli.jar
Using CATALINA_PID:    /usr/local/pxf-matrixdb4/run/catalina.pid
Tomcat started.
Checking if tomcat is up and running...
Server: PXF Server
Checking if PXF webapp is up and running...
PXF webapp is listening on port 5888

1.3 Create Test Table

On the MXMaster host, connect to the database using the psql client as the mxadmin user:

$ psql <databasename>

Create the pxf_fdw extension.

Note!
Creating extensions in YMatrix requires superuser privileges.

=# CREATE EXTENSION pxf_fdw;

Two methods are provided below for creating an S3 server. Choose one.

  1. Superuser creates the S3 server
    The server name (s3server_online) must correspond to the directory name created earlier (s3server_online).
=# CREATE SERVER s3server_online FOREIGN DATA WRAPPER s3_pxf_fdw OPTIONS(config 's3server_online');
  1. Regular user creates the S3 server
    To allow a regular user to create an S3 server, grant the necessary privilege first.

Note!
Executing the GRANT command requires superuser privileges.

=# GRANT USAGE ON FOREIGN DATA WRAPPER s3_pxf_fdw TO <role_name>;

The server name (s3server_online) must match the previously created directory name (s3server_online).

=# CREATE SERVER s3server_online FOREIGN DATA WRAPPER s3_pxf_fdw OPTIONS(config 's3server_online');

Create a user mapping:

=# CREATE USER MAPPING FOR mxadmin SERVER s3server_online;

Create an external table:

=# CREATE FOREIGN TABLE public.test (
    c1 jsonb,
    c2 text,
    c3 text,
    c4 text,
    c5 text,
    c6 jsonb,
    c7 jsonb
)
SERVER s3server_online
OPTIONS (
    format 'csv',
    resource '/<bucket_name>/<prefix>/<filename.csv>',
    JSONIFY_ARRAY 'TRUE',
    JSONIFY_MAP 'TRUE',
    JSONIFY_RECORD 'TRUE'
);

Parameter description:

  • format: The file format stored in the object storage. Supported formats include csv, text, json, parquet, avro, etc.
  • resource: The absolute path to the file in S3.