COPY

Copy data between files and tables.

Summary

COPY table_name [(column_name [, ...])]
FROM {'filename' | PROGRAM 'command' | STDIN}
[ [ WITH ] ( option [, ...] ) ]
[ ON SEGMENT ]

COPY { table_name [(column_name [, ...])] | (query)}
     TO {'filename' | PROGRAM 'command' | STDOUT}
     [ [ WITH ] ( option [, ...] ) ]
     [ ON SEGMENT ]

where option can be one of the following:

FORMAT format_name
OIDS [ boolean ]
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] ) 
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'       
FILL MISSING FIELDS
LOG ERRORS [ SEGMENT REJECT LIMIT count [ ROWS | PERCENT ] ]
IGNORE EXTERNAL PARTITIONS

Description

COPY moves data between YMatrix database tables and standard file system files. COPY TO copies the contents of the table to a file (if copied on ON SEGMENT, it will be copied to multiple files based on the Segment ID), while COPY FROM copies the data from the file to the table (appends the data to anything already in the table). COPY TO can also copy the results of SELECT queries.

If a column list is specified, COPY copies only the data in the specified column to or from the file. If any columns in the table are not in the column list, COPY FROM inserts default values for those columns.

With the file name specified, COPY instructs the YMatrix database Master host to read or write files directly from or to files. The file must be accessible to the Master host and a name must be specified from the perspective of the Master host.

When COPY is used with the ON SEGMENT clause, ON SEGMENT causes the Segment to create separate Segment-oriented files that are retained on the Segment host. The filename parameter of ON SEGMENT takes the string literal <SEGID> (required) and uses an absolute path or <SEG_DATA_DIR> string literal. When the COPY operation is run, the paths to the Segment ID and Segment data directory are replaced with string literal values.

COPY TO using a copy table (DISTRIBUTED REPLICATED) as the source creates a file with lines from a single segment so that the target file does not contain duplicate lines. Using COPY TO with the ON SEGMENT clause and using the copy table as the source, create the target file on the Segment host that contains all table rows.

The ON SEGMENT clause allows you to copy table data into files on the Segment host for operations such as migrating data between clusters or performing backups. Tools such as gpfdist can be used to recover segmented data created by the ON SEGMENT clause, which is very useful for high-speed data loading.

Note: It is recommended to use the ON SEGMENT clause only for professional users. When PROGRAM is specified, the server executes the given command and reads or writes the program's standard input from the program's standard output. This command must be specified from the server's perspective and can be executed by the gpadmin user.

When STDIN or STDOUT is specified, data is transferred through the connection between the client and the Master. STDIN and STDOUT cannot be used with the ON SEGMENT clause.

If SEGMENT REJECT LIMIT is used, the COPY FROM operation will run in single-line error isolation mode. In this version, single-line error isolation mode is only applicable to malformed lines in the input file - for example, redundant or missing attributes, wrong data type attributes, or invalid client-encoded sequences. Constraint errors (such as a violation of NOT NULL, CHECK, or UNIQUE constraints) will still be processed in "All or None" input mode. The user can specify the acceptable number of error rows (based on each segment), after which the entire COPY FROM operation is terminated and no rows are loaded. The count of the error row is counted by Segment instead of the entire load operation. If the limit rejected by each Segment is not reached, all rows that do not contain errors are loaded and All rows of errors are discarded. To keep the error rows for further inspection, specify the LOG ERRORS clause to capture the error log information. The error message and the row are stored in the YMatrix internal database.

Output

After successful completion, the COPY command returns the following command label, where count is the number of lines copied:

COPY count

If the COPY FROM command is run in single-line error isolation mode, if no rows are loaded due to a malformation, the following notification message is returned, where count is the number of rows rejected:

NOTICE: Rejected count badly formatted rows.

Parameters

table_name

  • The name of an existing table (can be schema-qualified).

column_name

  • An optional list of columns to copy. If no column list is specified, all columns of the table will be copied.
    When copying in text format, a row of data in a column of type bytea can be up to 256MB by default.

query

  • A SELECT or VALUES command whose results will be copied. Note that the query must be enclosed in parentheses.

filename

  • The pathname of the input or output file. Input filenames can be absolute or relative paths, but output filenames must be absolute paths. Windows users may need to use E'' strings and double all backslashes in the pathname.

PROGRAM 'command'

  • Specifies the command to execute. In COPY FROM, input is read from the command's standard output, and in COPY TO, output is written to the command's standard input. The command must be specified from the perspective of the YMatrix database Master host system and must be executable by the YMatrix database administrator user (gpadmin).
    The command is invoked by the shell program. When passing parameters to the shell, remove or escape any special characters that have special meaning to the shell. For security reasons, it is best to use fixed command strings, or at least avoid passing any user input in the string.
    When ON SEGMENT is specified, the command must be executable by the YMatrix database administrator user (gpadmin) on all YMatrix database Primary Segment hosts. The command is executed by each YMatrix Segment instance. <SEGID> is required in the command.
    For command syntax requirements and information on the data to be copied when specifying this clause, see the ON SEGMENT clause.

STDIN

  • Specifies that input comes from the client application. STDIN does not support the ON SEGMENT clause.

STDOUT

  • Specifies that output is sent to the client application. STDOUT does not support the ON SEGMENT clause.

boolean

  • Specifies whether to enable or disable the selected option. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to disable it. Booleans can also be omitted, in which case it is assumed to be TRUE.

FORMAT

  • Selects the data format to read or write: text, csv (comma-separated values), or binary. Default is text.

OIDS

  • Specifies that OIDs should be copied for each row. (If OIDS is specified for a table without OIDs, or in the case of copying a query, an error will be raised.)

FREEZE

  • Requests that the copied data consists of frozen rows, as if a VACUUM FREEZE command had been run. This is intended for use as a performance option for initial data loading. Rows will only be frozen if the table to be loaded has been created or truncated in the current subtransaction, no cursors are open, and the transaction has no older snapshots.
    Note that once the data is loaded, all other sessions will immediately be able to see it. This violates the usual MVCC visibility rules, and users specifying this option should be aware of the potential issues it may cause.

DELIMITER

  • Specifies the character used to separate columns within each row (line) of the file. Default is the tab character for text format, comma for CSV format. This must be a single-byte character. This option is not allowed when using binary format.

NULL

  • Specifies the string that represents a null value. Default is \N for text format (backslash -N), and an unquoted empty string for CSV format. For cases where you do not wish to distinguish between null values and empty strings, you can even use an empty string in text format. This option is not allowed when using binary format.
    Note: When using COPY FROM, any data item that matches this string will be stored as a null value, so you should ensure that the same string is used as with COPY TO.

HEADER

  • Specifies that the file contains a header row with the names of each column. When outputting, the first row contains the column names of the table. When inputting, the first row is ignored. This option is only allowed when using CSV format.

QUOTE

  • Specifies the quoting character to be used when quoting data values. Default is the double-quote. This must be a single-byte character. This option is only allowed when using CSV format.

ESCAPE

  • Specifies the character that should appear before a data character that matches the QUOTE value. Default is the same as the QUOTE value (so that if the quote character appears in the data, it is doubled). This must be a single-byte character. This option is only allowed when using CSV format.

FORCE_QUOTE

  • Forces quotes to be used for all non-NULL values in the specified columns. NULL outputs are never quoted. If * is specified, non-NULL values in all columns are quoted. This option is only allowed in COPY TO and when using CSV format.

FORCE_NOT_NULL

  • Do not match the values of the specified columns with empty strings. By default, empty strings are considered as empty, which means that empty values are read as zero-length strings rather than null values, even if not quoted. This option is only allowed in COPY FROM and when using CSV format.

FORCE_NULL

  • Matches the values of the specified columns with empty strings, even if they are quoted. If a match is found, the value is set to NULL. This option is only allowed in COPY FROM and when using CSV format.

ENCODING

  • Specifies that the file is encoded in encoding_name. If omitted, the current client encoding is used.

ON SEGMENT

  • Specifies individual Segment data files on the Segment host. Each file contains the table data managed by the Primary Segment instance. For example, when using the COPY TO...ON SEGMENT command to copy data from a table to a file, the command creates a file for each Segment instance on the Segment host. Each file contains the table data managed by the Segment instance.
    The COPY command does not copy data between mirror instances and Segment data files.
    ON SEGMENT does not support the keywords STDIN and STDOUT.
    The <SEG_DATA_DIR> and <SEGID> string literals are used to specify absolute paths and filenames with the following syntax:
    COPY table [TO|FROM] '<SEG_DATA_DIR>/gpdumpname<SEGID>_suffix' ON SEGMENT;
    • <SEG_DATA_DIR>
      A string literal representing the absolute path of the Segment instance data directory used for ON SEGMENT replication. The angle brackets (< and >) are part of the string literal used to specify the path. When the COPY command is executed, the string literal is replaced with the Segment path. An absolute path can be used instead of the <SEG_DATA_DIR> string literal.
    • <SEGID>
      A string literal representing the content ID number of the Segment instance to be replicated during ON SEGMENT replication. When ON SEGMENT is specified, <SEGID> is a required part of the filename. The angle brackets are part of the string literal used to specify the filename.
      When using COPY TO, the string literal is replaced with the content ID of the Segment instance when the COPY command is executed.
      When using COPY FROM, specify the content ID of the Segment instance in the filename and place the file on the Segment instance host. Each Primary Segment instance on the host must have a file. When the COPY FROM command is executed, data is copied from the file to the Segment instance.

NEWLINE

  • Specifies the newline character used in the data file - LF (line feed, 0x0A), CR (carriage return, 0x0D), or CRLF (carriage return plus line feed, 0x0D 0x0A). If not specified, the YMatrix database Segment will detect the newline type by looking at the first line of data received and using the first encountered newline type.

CSV

  • Selects comma-separated value (CSV) mode.

FILL MISSING FIELDS

  • In COPY FROM for TEXT and CSV modes, specifying FILL MISSING FIELDS will set data rows missing data fields at the end of the row to NULL (instead of reporting an error). Blank lines, fields with NOT NULL constraints, and delimiters at the end of the row will still report errors.

LOG ERRORS

  • This is an optional clause that can be preceded by the SEGMENT REJECT LIMIT clause to capture error log information about rows with malformed formats.
    Error log information is stored internally and can be accessed through the built-in SQL function gp_read_error_log() of the YMatrix database.

SEGMENT REJECT LIMIT count [ROWS | PERCENT]

  • Run the COPY FROM operation in single-line error isolation mode. If the input lines are incorrectly formatted, they will be discarded as long as the rejection limit count is not reached on any YMatrix database Segment instance during the load operation. The rejection limit count can be specified as a number of rows (default) or a percentage of the total number of rows (1-100). If PERCENT is used, each Segment starts to calculate the percentage of wrong rows only after the number of rows specified by the parameter gp_reject_percent_threshold is processed. The default value of gp_reject_percent_threshold is 300 rows. Constraint errors (such as a violation of NOT NULL, CHECK, or UNIQUE constraints) will still be processed in "all or nothing" input mode. If the limit is not reached, all correct rows are loaded and all wrong rows are disclosed.
    Note: If SEGMENT REJECT LIMIT is not triggered or specified, the YMatrix database limits the number of initial rows that may contain malformed data. If the first 1000 rows are rejected, the COPY operation will stop and roll back.
    The limit on the initial rejected row count can be changed using the YMatrix database server configuration parameter gp_initial_bad_row_limit.

IGNORE EXTERNAL PARTITIONS

  • When copying data from a partitioned table, data is not copied from leaf partitions of the external table. When data is not copied, a message is added to the log file.
    If this clause is not specified, and the YMatrix database tries to copy data from a leaf partition that is an external table, an error is returned.
    For information on specifying SQL queries to copy data from a leaf partition that is an external table, see the next section “Comments”.

Notes

COPY can only be used with tables, not with external tables or views. However, you can do COPY (SELECT * FROM viewname) TO ...

When specifying an ON SEGMENT clause, the COPY command does not support specifying a SELECT statement in the COPY TO command. For example, this command is not supported.

COPY (SELECT * FROM testtbl) TO '/tmp/mytst<SEGID>' ON SEGMENT

COPY only handles specific named tables; it does not copy data between child tables. Thus, for example, COPY table TO shows data the same as SELECT * FROM ONLY table. However, COPY (SELECT * FROM table) TO ... can be used to dump all data in an inheritance hierarchy.

Similarly, to copy data from a partitioned table that has leaves as external tables, use an SQL query to copy the data. For example, if table my_sales contains leaves as external tables, then this command COPY my_sales TO stdout returns an error. This command sends data to stdout:

The BINARY keyword causes all data to be stored/read in binary format rather than text. It is somewhat faster than the ordinary text mode, but binary-format files are less portable across computer architectures and YMatrix database versions. Also, COPY FROM cannot run in single-row error isolation mode if the data is in binary format.

You must have SELECT privilege on the table you read via COPY TO, and INSERT privilege on the table you insert into via COPY FROM. Having column privileges on the columns listed in the command is sufficient.

The files named in the COPY command are read or written directly by the database server, not by the client application. Therefore, they must reside on or be accessible to the YMatrix database Master host (not the client). The YMatrix database system user (the user ID the server runs as), not the client, must have access to them and the rights to read or write them. The COPY command is only allowed to database superusers, as it allows reading or writing any file that the server has access to.

COPY FROM will invoke any triggers and check constraints on the target table. However, it does not invoke rewrite rules. Note that in this version, constraint violations are not evaluated for single-row error isolation mode.

COPY input and output are affected by DateStyle. To ensure portability to other YMatrix database installations that may use non-default DateStyle settings, set DateStyle to ISO before using COPY TO. It is also advisable to avoid dumping data with IntervalStyle set to sql_standard, as servers with different IntervalStyle settings may misinterpret negative interval values.

Input data is interpreted according to the ENCODING option or the current client encoding, while output data is encoded in the ENCODING or current client encoding, even if the data does not pass through the client but is read from or written to files directly by the server.

When copying XML data from a file in text mode, the server configuration parameter xmloption affects the validation of the copied XML data. If the value is content (the default), XML data is validated as an XML content fragment. If the parameter value is document, XML data is validated as an XML document. If the XML data is invalid, COPY returns an error.

By default, COPY stops operation on the first error. If it is COPY TO, this should not cause a problem, but the target table in COPY FROM has already received earlier rows. These rows will be invisible or inaccessible, but they still occupy disk space. If a failure occurs in a large COPY FROM operation, a significant amount of disk space may be wasted. You may want to call VACUUM to recover the wasted space. Another option is to use single-row error isolation mode to filter out error rows while still loading correct rows.

When running the COPY FROM...ON SEGMENT command, the server configuration parameter gp_enable_segment_copy_checking controls whether the table distribution policy (from the table DISTRIBUTED clause) is checked when data is copied into the table. The default setting is to check the distribution policy. If a data row violates the distribution policy of the Segment instance, an error is returned.

COPY TO...ON SEGMENT commands generate table data that can be used to recover table data via COPY FROM...ON SEGMENT. However, when generating files with the COPY TO command, the data restored to the Segment is distributed according to the table distribution policy. If you attempt to restore table data and the table distribution policy has been changed after running COPY FROM...ON SEGMENT, the COPY command may return a table distribution policy error.

Note: If you run COPY FROM...ON SEGMENT and the server configuration parameter gp_enable_segment_copy_checking is false, you may need to manually redistribute the table data. See the ALTER TABLE clause WITH REORGANIZE.

When you specify the LOG ERRORS clause, the YMatrix database captures errors that occur when reading external table data. You can view and manage the captured error log data.

  • Use the built-in SQL function gp_read_error_log('table_name'). It requires SELECT privilege on table_name. This example uses the COPY command to display error log information for loading data into table ext_expenses:
    SELECT * from gp_read_error_log('ext_expenses');
  • If the specified table has error log data, new error log data will be appended to the existing error log data. Error log information is not copied to Mirror.
  • Use the built-in SQL function gp_truncate_error_log('table_name') to delete error log data for table_name. It requires table owner privileges. This example deletes the error log information captured when moving data into table ext_expenses:
    SELECT gp_truncate_error_log('ext_expenses');
    If table_name does not exist, the function returns FALSE.
    Specify the wildcard * to delete error log data for all tables in the current database. Specify the string *.* to delete all database error log data, including error log data that was not deleted due to previous database issues. If you specify *, you need database owner privileges. If you specify *.*, you need operating system superuser privileges.

When a YMatrix database user who is not a superuser runs the COPY command, the command can be controlled by a resource queue. The resource queue must be configured with the ACTIVE_STATEMENTS parameter, which specifies the maximum number of queries that roles assigned to the queue can execute. YMatrix database does not apply cost values or memory values to COPY commands, so resource queues with only cost or memory limits do not affect the operation of COPY commands.

Non-superusers can only run the following types of COPY commands:

  • COPY FROM commands with STDIN as the source
  • COPY TO commands with STDOUT as the target

File Formats

File formats supported by COPY.

Text Format

When using text format, the data read or written is a text file, one line per row. Columns within a row are separated by the delimiter_character (default is tab). Column values are strings generated by the output function of each attribute's data type, or strings acceptable to the input function. Use an empty string to represent a null column. If any row in the input file has more or fewer columns than expected, COPY FROM will raise an error. If OIDS is specified, the OID is read or written as the first column before user data columns.

Data files have two reserved characters that have special meaning to COPY:

  • The specified delimiter (default is tab), used to separate fields in the data file.
  • The UNIX-style newline (\n or 0x0a), used to specify new lines in the data file. It is strongly recommended that applications generating COPY data convert data newlines to UNIX-style newlines, rather than Microsoft Windows-style carriage return newlines (\r\n or 0x0a 0x0d).

If either of these characters appears in the data, it must be escaped so that COPY treats it as data rather than a field separator or newline.

By default, the escape character for text format files is \ (backslash), and for CSV format files, it is " (double quote). If you want to use a different escape character, you can do so with the ESCAPE AS clause. Ensure that the escape character you choose is not used anywhere in the data file as actual data value. You can also disable escaping in text format files by using ESCAPE 'OFF'.

For example, suppose you have a table with three columns and you want to use COPY to load the following three fields:

  • Percent sign = %
  • Pipe = |
  • Backslash = \

You specify the delimiter_character as | (pipe character) and the escape character as * (asterisk). The formatted row in the data file would look like this:

percentage sign = % | vertical bar = *| | backslash = \

Note how the pipe character as part of the data is escaped with the asterisk character. Also note that since we are using an alternative escape character, we do not need to escape the backslash.

If any of the following characters appear as part of a column value, they must be followed by the escape character: the escape character itself, newline, carriage return, and the current delimiter. You can specify another escape character using the ESCAPE AS clause.

CSV Format

This format option is used to import and export comma-separated values (CSV) files, a format used by many other programs (such as spreadsheets). It generates and recognizes common CSV escape mechanisms rather than the escape rules used by the YMatrix database standard text format.

Values in each record are separated by the DELIMITER character. If a value contains the delimiter, the QUOTE character, the ESCAPE character (default is double quote), the NULL string, a carriage return, or a newline, the entire value is prefixed and suffixed with the QUOTE character. When outputting non-NULL values in a specific column, FORCE_QUOTE can also be used to enforce quoting.

CSV format has no standard way to distinguish between NULL values and empty strings. YMatrix database COPY handles this by quoting. Output NULLs as the NULL parameter string, unquoted, while non-NULL values that match the NULL string are quoted. For example, using the default settings, NULL is written as an unquoted empty string, and empty string data values are written as double quotes (""). Reading values follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparisons for specific columns.

Since backslash is not a special character in CSV format, \. (data end marker) may also appear as a data value. To avoid any ambiguity, data values that appear as \. on a line are automatically quoted in output and not interpreted as the data end marker in input if quoted. If you need to load a file created by another application that has unquoted single-column and may have \. values, you may need to use that value in the input file.

Note: In CSV format, all characters are valid. Quoted spaces or any characters except the DELIMITER are included. If importing data from a system that fills CSV rows to fixed width with trailing spaces, this may result in errors. If this occurs, you may need to preprocess the CSV file to remove trailing spaces before importing the data into the YMatrix database.

CSV format will recognize and generate CSV files with quoted values containing embedded carriage returns and newlines. Therefore, compared to text format files, files are not strictly limited to one line per row.

Note: Many programs generate odd (sometimes incorrect) CSV files, so the file format is more of a de facto standard file. Therefore, you may encounter files that cannot be imported using this mechanism, and COPY may generate files that other programs cannot process.

Binary Format

The binary format option causes all data to be stored/read in binary format rather than text. It is somewhat faster than text and CSV formats, but binary-format files are less portable across computer architectures and YMatrix database versions. Similarly, binary format is very data-type specific. For example, even though it would work in text format, outputting binary data from a smallint column and reading it into an integer column will not work.

The binary file format consists of a file header, zero or more tuples containing the row data, and a file tail. The file header and data are in network byte order.

  • File Header — The file header consists of a 15-byte fixed field, followed by a variable-length header extension area. The fixed fields are:
    • Signature — The 11-byte sequence PGCOPY\n\377\r\n\0 - Note that the zero byte is a required part of the signature. (The signature is designed to facilitate the identification of files corrupted by non-8-bit-clean transmission. This signature will pass through line-termination translation filters, have zero bytes lost, have high bits lost or parity changed.)
    • Identifier Field — A 32-bit integer bitmask indicating important aspects of the file format. Bits are numbered from 0 (LSB) to 31 (MSB). Note that this field is stored in network byte order (most significant byte first), as are all integer fields used in the file format. Reserved bits 16-31 flag critical file format issues; if a reader finds any unexpected bits set in this range, it should abort. Reserved bits 0-15 flag backward-compatible format issues; readers should simply ignore any unexpected bits set in this range. Currently, only one flag is defined; the rest must be zero (if the data has OIDs, bit 16 is 1, else 0).
    • File Header Extension Area Length — A 32-bit integer giving the byte length of the remainder of the file header, not including itself. Currently, it is zero, and the first tuple follows immediately. Future changes to the format may allow other data to appear in the file header. Readers should silently skip over any file header extension data they do not know how to handle. The file header extension area is envisioned to contain a series of self-identifying blocks. The flags field is not intended to tell the reader what is in the extension area. The specific design of file header extension contents is left to future versions.
  • Tuple — Each tuple starts with a 16-bit integer count of the number of fields in the tuple. (Currently, all tuples in a table will have the same count, but this may not always be the case.) Then, for each field in the tuple, there is a 32-bit long word followed by that many bytes of field data. (The length word does not include itself, and can be zero.) In the special case of -1, it indicates a NULL field value. In the NULL case, there are no value bytes.
    There is no alignment padding or any other extra data between fields.
    Currently, all data values in binary-format files are assumed to be in binary format (format code 1). It can be anticipated that future extensions may add a file header field to allow specification of per-column format codes.
    If OIDs are present in the file, the OID field follows immediately after the field count. This is an ordinary field, except that it is not counted in the field count. In particular, it has a length word - this allows handling of 4-byte and 8-byte OIDs with little difficulty, and if necessary, OID can be shown as null.
  • File Tail — The file tail consists of a 16-bit integer containing -1. This easily distinguishes it from a tuple's field count. If the field count is neither -1 nor the expected number of columns, the reader should report an error. This provides an extra check against getting out of sync with the data in some way.

Examples

Copy the table to the client using a vertical bar (|) as the field delimiter:

COPY country TO STDOUT (DELIMITER '|');

Copy data from the file to the country table:

COPY country FROM '/home/usr1/sql/country_data';

Copy only countries whose name starts with 'A' to the file:

COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO
  '/home/usr1/sql/a_list_countries.copy';

Copy data from a file into the sales table and log errors using single-line error isolation mode:

COPY sales FROM '/home/usr1/sql/sales_data' LOG ERRORS
   SEGMENT REJECT LIMIT 10 ROWS;

To copy Segment data for later use, use the ON SEGMENT clause. The COPY TO ON SEGMENT command is used as follows:

COPY table TO '<SEG_DATA_DIR>/gpdumpname<SEGID>_suffix' ON SEGMENT;

<SEGID> is required. However, you can replace the <SEG_DATA_DIR> string literal in the path with an absolute path.

When passing string literals <SEG_DATA_DIR> and <SEGID> to COPY, COPY will be filled with the appropriate value when running the operation.

For example, if your mytable contains the following Segment and Mirror:

contentid | dbid | file segment location
    0 | 1 | /home/usr1/data1/gpsegdir0
    0 | 3 | /home/usr1/data_mirror1/gpsegdir0
    1 | 4 | /home/usr1/data2/gpsegdir1
    1 | 2 | /home/usr1/data_mirror2/gpsegdir1

Run the command:

COPY mytable TO '<SEG_DATA_DIR>/gpbackup<SEGID>.txt' ON SEGMENT;

The following line will be generated:

/home/usr1/data1/gpsegdir0/gpbackup0.txt
/home/usr1/data2/gpsegdir1/gpbackup1.txt

The content ID in the first column is the identifier inserted into the file path (for example, gpsegdir0/gpbackup0.txt above). Files are created on the Segment instead of the Master, just like in standard COPY operations. When copying with ON SEGMENT, no data files are created for Mirror.

If an absolute path is specified, instead of <SEG_DATA_DIR>, for example in a statement

COPY mytable TO '/tmp/gpdir/gpbackup_<SEGID>.txt' ON SEGMENT;

The file will be placed in /tmp/gpdir of each segment. If redistribution is required, the gpfdist tool can also be used to restore data files generated by COPY TO using the ON SEGMENT option. Note: You can use tools such as gpfdist to restore data. The backup/restore tool is not suitable for files generated manually using COPY TO ON SEGMENT. This example copies data from the lineitem table and uses the PROGRAM clause to add data to the /tmp/lineitem_program.csv file using the cat utility. This file is placed on the YMatrix database Master.

COPY LINEITEM TO PROGRAM 'cat > /tmp/lineitem.csv' CSV;

This example uses the PROGRAM and ON SEGMENT clauses to copy data into a file on the Segment host. On the Segment host, the COPY command replaces <SEGID> with the Segment content ID to create a file for each Segment instance on the Segment host.

COPY LINEITEM TO PROGRAM 'cat > /tmp/lineitem_program<SEGID>.csv' ON SEGMENT CSV;

This example uses the PROGRAM and ON SEGMENT clauses to copy data from files on the Segment host. When copying data from a file, the COPY command replaces <SEGID> with the Segment content ID. On the Segment host, each Segment instance must have a file with the file name containing the Segment content ID on the Segment host.

COPY LINEITEM_4 FROM PROGRAM 'cat /tmp/lineitem_program<SEGID>.csv' ON SEGMENT CSV;

Compatibility

There is no COPY statement in the SQL standard.

The following syntax was used in earlier versions of YMatrix databases and is still supported:

COPY table_name [(column_name [, ...])] FROM
                    {'filename' | PROGRAM 'command' | STDIN}
    [ [WITH]
      [ON SEGMENT]
      [BINARY]
      [OIDS]
      [HEADER]
      [DELIMITER [ AS ] 'delimiter_character']
      [NULL [ AS ] 'null string']
      [ESCAPE [ AS ] 'escape' | 'OFF']
      [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
      [CSV [QUOTE [ AS ] 'quote']
           [FORCE NOT NULL column_name [, ...]]
      [FILL MISSING FIELDS]
      [[LOG ERRORS]
      SEGMENT REJECT LIMIT count [ROWS | PERCENT] ]

COPY { table_name [(column_name [, ...])] | (query)} TO {'filename' | PROGRAM 'command' | STDOUT}
      [ [WITH]
        [ON SEGMENT]
        [BINARY]
        [OIDS]
        [HEADER]
        [DELIMITER [ AS ] 'delimiter_character']
        [NULL [ AS ] 'null string']
        [ESCAPE [ AS ] 'escape' | 'OFF']
        [CSV [QUOTE [ AS ] 'quote']
             [FORCE QUOTE column_name [, ...]] | * ]
      [IGNORE EXTERNAL PARTITIONS ]

Note that in this syntax, BINARY and CSV are considered independent keywords rather than parameters of the FORMAT option.