PXF FAQ

This document answers frequently asked questions about PXF.


1 remote component error, Failed connect to localhost:5888; Connection refused (libchurl.c:950)


After PXF is deployed, the following error is reported when accessing HDFS:

remote component error,Failed connect to localhost:5888; Connection refused (libchurl.c:950)

Solution

  1. PXF file access requires the PXF server to be running on the Master node, while the data files themselves must be accessible to the PXF instances on the Segments.

  2. The core-site.xml and hdfs-site.xml files under pxf/servers must match the corresponding Hadoop configuration files.

  3. Configure user access rights in pxf/servers/core-site.xml.

  4. The owner and group of the file on Hadoop must be consistent with those specified in pxf/core-site.xml.
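"Connection refused" means that nothing was listening at the address in the error message when the request was made. Before reviewing the configuration files, it can help to confirm whether the PXF port accepts connections on each host. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and the host and port are taken from the error message above.

```python
import socket


def port_is_open(host="localhost", port=5888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port come from the error message; adjust them per node.
    print("PXF port reachable:", port_is_open("localhost", 5888))
```

If the check returns False on a host, the PXF service is most likely not running there, or it is listening on a different port.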


2 When a file is loaded into the database, a field contains a newline character, so one row of data is split into two lines. Splitting those lines by the delimiter then yields a field count that does not match the table. In other words, one row of data contains two \n characters, one in the middle and one at the end, and the one in the middle must not be treated as a line break.


Solution

  1. Add escape 'off' to the table options.

  2. Use format 'text:multi'.
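The problem itself can be reproduced outside PXF. The sketch below is an illustration in Python, not PXF's implementation: it assumes the field containing the newline is enclosed in double quotes, and the sample data is hypothetical. Splitting on every \n produces rows with the wrong number of fields, while a quote-aware reader keeps the embedded newline inside its field.

```python
import csv
import io

# Hypothetical sample: two records with three fields each.
# The second field of the first record contains an embedded newline.
raw = 'a,"line one\nline two",c\nd,e,f\n'

# Naive approach: treat every \n as a record boundary, then split on commas.
naive_rows = [line.split(",") for line in raw.splitlines()]
naive_counts = [len(row) for row in naive_rows]

# Quote-aware approach: the csv module keeps the quoted newline in the field.
quoted_rows = list(csv.reader(io.StringIO(raw, newline="")))
quoted_counts = [len(row) for row in quoted_rows]

print(naive_counts)   # field counts per "row" with naive splitting
print(quoted_counts)  # field counts per record with quote-aware parsing
```

With naive splitting the first record is broken into two partial rows, which is the field-count mismatch described in the question.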


3 When accessing S3, can PXF recurse through all files under the current directory, or through all subdirectories?


Yes.

Prerequisites

PXF can access S3 normally.

Solution

Copy the aa.csv file, name the copy ab.csv, and upload it to the bucket under the path ymatrix/test. The ymatrix/test path now contains aa.csv and ab.csv, and each file holds 1,000 rows.

  1. Create an external table.

    -- Drop any earlier definition so the statements can be rerun.
    DROP FOREIGN TABLE IF EXISTS public.chen_test;
    CREATE FOREIGN TABLE public.chen_test (
      c1 text,
      c2 text,
      c3 text
    )
    SERVER s3server_online
    OPTIONS (
      format 'csv',
      -- The wildcard matches both aa.csv and ab.csv.
      resource 'ymatrix/test/a*.csv',
      JSONIFY_ARRAY 'TRUE',
      JSONIFY_MAP 'TRUE',
      JSONIFY_RECORD 'TRUE'
    );

  2. Check the number of rows.

    SELECT count(*) FROM public.chen_test;
     count
    -------
      2000
    (1 row)

    The wildcard * can likewise be used to recurse through all files in all subdirectories under a directory.

Note! All files matched by the wildcard must have the same format; otherwise an error is reported.
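Before creating the table, it can be useful to preview which objects a pattern would select and whether they share one format. The sketch below uses Python's `fnmatch` as a stand-in for the matching PXF performs, so it is only an approximation (for instance, `fnmatch` lets * match the / character). The key list is hypothetical, and `notes.json` was added purely for illustration.

```python
from fnmatch import fnmatchcase
import posixpath

# Hypothetical object keys in the bucket; notes.json is illustrative only.
keys = [
    "ymatrix/test/aa.csv",
    "ymatrix/test/ab.csv",
    "ymatrix/test/notes.json",
]
pattern = "ymatrix/test/a*.csv"

# Keep only the keys that the pattern selects (case-sensitive match).
matched = [key for key in keys if fnmatchcase(key, pattern)]

# Collect file extensions as a rough proxy for "same file format".
extensions = {posixpath.splitext(key)[1] for key in matched}

print(matched)
print("single format:", len(extensions) == 1)
```

Here the pattern selects aa.csv and ab.csv only, which is consistent with the count of 2000 rows returned for two files of 1,000 rows each.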


4 Can PXF read files on S3 storage? How are they read?


Yes.

Files in TEXT, CSV, PARQUET, and JSON formats are automatically divided into splits of 128 MB, and each split is read in parallel by its corresponding segment.
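The split arithmetic can be sketched as follows. This is an illustration only: the helper `split_ranges` is hypothetical, 128 MB is assumed to mean 128 × 1024 × 1024 bytes, and the way splits are assigned to segments is not modeled.

```python
MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def split_ranges(file_size, split_size=128 * MB):
    """Return (offset, length) pairs that cover file_size bytes."""
    ranges = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


# Example: a 300 MB file yields two full splits and one 44 MB remainder.
for offset, length in split_ranges(300 * MB):
    print(f"offset={offset // MB} MB, length={length // MB} MB")
```

A file smaller than one split produces a single range, so small files gain nothing from parallel reading in this model.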