Copy data between files and tables.
COPY table_name [(column_name [, ...])]
FROM {'filename' | PROGRAM 'command' | STDIN}
[ [ WITH ] ( option [, ...] ) ]
[ ON SEGMENT ]
COPY { table_name [(column_name [, ...])] | (query)}
TO {'filename' | PROGRAM 'command' | STDOUT}
[ [ WITH ] ( option [, ...] ) ]
[ ON SEGMENT ]
where option can be one of the following:
FORMAT format_name
OIDS [ boolean ]
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
FILL MISSING FIELDS
LOG ERRORS [ SEGMENT REJECT LIMIT count [ ROWS | PERCENT ] ]
IGNORE EXTERNAL PARTITIONS
COPY
moves data between YMatrix database tables and standard file system files. COPY TO
copies the contents of the table to a file (if copied on ON SEGMENT
, it will be copied to multiple files based on the Segment ID), while COPY FROM
copies the data from the file to the table (appends the data to anything already in the table). COPY TO
can also copy the results of SELECT
queries.
If a column list is specified, COPY
copies only the data in the specified column to or from the file. If any columns in the table are not in the column list, COPY FROM
inserts default values for those columns.
With the file name specified, COPY
instructs the YMatrix database Master host to read or write files directly from or to files. The file must be accessible to the Master host and a name must be specified from the perspective of the Master host.
When COPY
is used with the ON SEGMENT
clause, ON SEGMENT
causes the Segment to create separate Segment-oriented files that are retained on the Segment host. The filename parameter of ON SEGMENT
takes the string literal <SEGID>
(required) and uses an absolute path or <SEG_DATA_DIR>
string literal. When the COPY
operation is run, the paths to the Segment ID and Segment data directory are replaced with string literal values.
COPY TO
using a copy table (DISTRIBUTED REPLICATED
) as the source creates a file with lines from a single segment so that the target file does not contain duplicate lines. Using COPY TO
with the ON SEGMENT
clause and using the copy table as the source, create the target file on the Segment host that contains all table rows.
The ON SEGMENT
clause allows you to copy table data into files on the Segment host for operations such as migrating data between clusters or performing backups. Tools such as gpfdist
can be used to recover segmented data created by the ON SEGMENT
clause, which is very useful for high-speed data loading.
Note: It is recommended to use the ON SEGMENT
clause only for professional users.
When PROGRAM
is specified, the server executes the given command and reads or writes the program's standard input from the program's standard output. This command must be specified from the server's perspective and can be executed by the gpadmin
user.
When STDIN
or STDOUT
is specified, data is transferred through the connection between the client and the Master. STDIN
and STDOUT
cannot be used with the ON SEGMENT
clause.
If SEGMENT REJECT LIMIT
is used, the COPY FROM
operation will run in single-line error isolation mode. In this version, single-line error isolation mode is only applicable to malformed lines in the input file - for example, redundant or missing attributes, wrong data type attributes, or invalid client-encoded sequences. Constraint errors (such as a violation of NOT NULL
, CHECK
, or UNIQUE
constraints) will still be processed in "All or None" input mode. The user can specify the acceptable number of error rows (based on each segment), after which the entire COPY FROM
operation is terminated and no rows are loaded. The count of the error row is counted by Segment instead of the entire load operation. If the limit rejected by each Segment is not reached, all rows that do not contain errors are loaded and All rows of errors are discarded. To keep the error rows for further inspection, specify the LOG ERRORS
clause to capture the error log information. The error message and the row are stored in the YMatrix internal database.
After successful completion, the COPY
command returns the following command label, where count
is the number of lines copied:
COPY count
If the COPY FROM
command is run in single-line error isolation mode, if no rows are loaded due to a malformation, the following notification message is returned, where count
is the number of rows rejected:
NOTICE: Rejected count badly formatted rows.
table_name
column_name
query
SELECT
or VALUES
command whose results will be copied. Note that the query must be enclosed in parentheses.filename
PROGRAM 'command'
COPY FROM
, input is read from the command's standard output, and in COPY TO
, output is written to the command's standard input. The command must be specified from the perspective of the YMatrix database Master host system and must be executable by the YMatrix database administrator user (gpadmin
).ON SEGMENT
is specified, the command must be executable by the YMatrix database administrator user (gpadmin
) on all YMatrix database Primary Segment hosts. The command is executed by each YMatrix Segment instance. <SEGID>
is required in the command.ON SEGMENT
clause.STDIN
STDIN
does not support the ON SEGMENT
clause.STDOUT
STDOUT
does not support the ON SEGMENT
clause.boolean
TRUE
, ON
, or 1
to enable the option, and FALSE
, OFF
, or 0
to disable it. Booleans can also be omitted, in which case it is assumed to be TRUE
.FORMAT
text
, csv
(comma-separated values), or binary
. Default is text
.OIDS
FREEZE
VACUUM FREEZE
command had been run. This is intended for use as a performance option for initial data loading. Rows will only be frozen if the table to be loaded has been created or truncated in the current subtransaction, no cursors are open, and the transaction has no older snapshots.DELIMITER
NULL
\N
for text format (backslash -N), and an unquoted empty string for CSV format. For cases where you do not wish to distinguish between null values and empty strings, you can even use an empty string in text format. This option is not allowed when using binary format.COPY FROM
, any data item that matches this string will be stored as a null value, so you should ensure that the same string is used as with COPY TO
.HEADER
QUOTE
ESCAPE
FORCE_QUOTE
*
is specified, non-NULL values in all columns are quoted. This option is only allowed in COPY TO
and when using CSV format.FORCE_NOT_NULL
COPY FROM
and when using CSV format.FORCE_NULL
COPY FROM
and when using CSV format.ENCODING
encoding_name
. If omitted, the current client encoding is used.ON SEGMENT
COPY TO...ON SEGMENT
command to copy data from a table to a file, the command creates a file for each Segment instance on the Segment host. Each file contains the table data managed by the Segment instance.COPY
command does not copy data between mirror instances and Segment data files.ON SEGMENT
does not support the keywords STDIN
and STDOUT
.<SEG_DATA_DIR>
and <SEGID>
string literals are used to specify absolute paths and filenames with the following syntax:COPY table [TO|FROM] '<SEG_DATA_DIR>/gpdumpname<SEGID>_suffix' ON SEGMENT;
<SEG_DATA_DIR>
ON SEGMENT
replication. The angle brackets (<
and >
) are part of the string literal used to specify the path. When the COPY
command is executed, the string literal is replaced with the Segment path. An absolute path can be used instead of the <SEG_DATA_DIR>
string literal.<SEGID>
ON SEGMENT
replication. When ON SEGMENT
is specified, <SEGID>
is a required part of the filename. The angle brackets are part of the string literal used to specify the filename.COPY TO
, the string literal is replaced with the content ID of the Segment instance when the COPY
command is executed.COPY FROM
, specify the content ID of the Segment instance in the filename and place the file on the Segment instance host. Each Primary Segment instance on the host must have a file. When the COPY FROM
command is executed, data is copied from the file to the Segment instance.NEWLINE
LF
(line feed, 0x0A), CR
(carriage return, 0x0D), or CRLF
(carriage return plus line feed, 0x0D 0x0A). If not specified, the YMatrix database Segment will detect the newline type by looking at the first line of data received and using the first encountered newline type.CSV
FILL MISSING FIELDS
COPY FROM
for TEXT and CSV modes, specifying FILL MISSING FIELDS
will set data rows missing data fields at the end of the row to NULL (instead of reporting an error). Blank lines, fields with NOT NULL constraints, and delimiters at the end of the row will still report errors.LOG ERRORS
gp_read_error_log()
of the YMatrix database.SEGMENT REJECT LIMIT count [ROWS | PERCENT]
COPY FROM
operation in single-line error isolation mode. If the input lines are incorrectly formatted, they will be discarded as long as the rejection limit count is not reached on any YMatrix database Segment instance during the load operation. The rejection limit count can be specified as a number of rows (default) or a percentage of the total number of rows (1-100). If PERCENT
is used, each Segment starts to calculate the percentage of wrong rows only after the number of rows specified by the parameter gp_reject_percent_threshold
is processed. The default value of gp_reject_percent_threshold
is 300 rows. Constraint errors (such as a violation of NOT NULL
, CHECK
, or UNIQUE
constraints) will still be processed in "all or nothing" input mode. If the limit is not reached, all correct rows are loaded and all wrong rows are disclosed. SEGMENT REJECT LIMIT
is not triggered or specified, the YMatrix database limits the number of initial rows that may contain malformed data. If the first 1000 rows are rejected, the COPY
operation will stop and roll back. gp_initial_bad_row_limit
.IGNORE EXTERNAL PARTITIONS
COPY
can only be used with tables, not with external tables or views. However, you can do COPY (SELECT * FROM viewname) TO ...
When specifying an ON SEGMENT
clause, the COPY
command does not support specifying a SELECT
statement in the COPY TO
command. For example, this command is not supported.
COPY (SELECT * FROM testtbl) TO '/tmp/mytst<SEGID>' ON SEGMENT
COPY
only handles specific named tables; it does not copy data between child tables. Thus, for example, COPY table TO
shows data the same as SELECT * FROM ONLY table
. However, COPY (SELECT * FROM table) TO ...
can be used to dump all data in an inheritance hierarchy.
Similarly, to copy data from a partitioned table that has leaves as external tables, use an SQL query to copy the data. For example, if table my_sales
contains leaves as external tables, then this command COPY my_sales TO stdout
returns an error. This command sends data to stdout:
The BINARY
keyword causes all data to be stored/read in binary format rather than text. It is somewhat faster than the ordinary text mode, but binary-format files are less portable across computer architectures and YMatrix database versions. Also, COPY FROM
cannot run in single-row error isolation mode if the data is in binary format.
You must have SELECT
privilege on the table you read via COPY TO
, and INSERT
privilege on the table you insert into via COPY FROM
. Having column privileges on the columns listed in the command is sufficient.
The files named in the COPY
command are read or written directly by the database server, not by the client application. Therefore, they must reside on or be accessible to the YMatrix database Master host (not the client). The YMatrix database system user (the user ID the server runs as), not the client, must have access to them and the rights to read or write them. The COPY
command is only allowed to database superusers, as it allows reading or writing any file that the server has access to.
COPY FROM
will invoke any triggers and check constraints on the target table. However, it does not invoke rewrite rules. Note that in this version, constraint violations are not evaluated for single-row error isolation mode.
COPY
input and output are affected by DateStyle
. To ensure portability to other YMatrix database installations that may use non-default DateStyle
settings, set DateStyle
to ISO before using COPY TO
. It is also advisable to avoid dumping data with IntervalStyle
set to sql_standard
, as servers with different IntervalStyle
settings may misinterpret negative interval values.
Input data is interpreted according to the ENCODING
option or the current client encoding, while output data is encoded in the ENCODING
or current client encoding, even if the data does not pass through the client but is read from or written to files directly by the server.
When copying XML data from a file in text mode, the server configuration parameter xmloption
affects the validation of the copied XML data. If the value is content
(the default), XML data is validated as an XML content fragment. If the parameter value is document
, XML data is validated as an XML document. If the XML data is invalid, COPY
returns an error.
By default, COPY
stops operation on the first error. If it is COPY TO
, this should not cause a problem, but the target table in COPY FROM
has already received earlier rows. These rows will be invisible or inaccessible, but they still occupy disk space. If a failure occurs in a large COPY FROM
operation, a significant amount of disk space may be wasted. You may want to call VACUUM
to recover the wasted space. Another option is to use single-row error isolation mode to filter out error rows while still loading correct rows.
When running the COPY FROM...ON SEGMENT
command, the server configuration parameter gp_enable_segment_copy_checking
controls whether the table distribution policy (from the table DISTRIBUTED
clause) is checked when data is copied into the table. The default setting is to check the distribution policy. If a data row violates the distribution policy of the Segment instance, an error is returned.
COPY TO...ON SEGMENT
commands generate table data that can be used to recover table data via COPY FROM...ON SEGMENT
. However, when generating files with the COPY TO
command, the data restored to the Segment is distributed according to the table distribution policy. If you attempt to restore table data and the table distribution policy has been changed after running COPY FROM...ON SEGMENT
, the COPY
command may return a table distribution policy error.
Note: If you run COPY FROM...ON SEGMENT
and the server configuration parameter gp_enable_segment_copy_checking
is false
, you may need to manually redistribute the table data. See the ALTER TABLE
clause WITH REORGANIZE
.
When you specify the LOG ERRORS
clause, the YMatrix database captures errors that occur when reading external table data. You can view and manage the captured error log data.
gp_read_error_log('table_name')
. It requires SELECT privilege on table_name
. This example uses the COPY
command to display error log information for loading data into table ext_expenses
:SELECT * from gp_read_error_log('ext_expenses');
gp_truncate_error_log('table_name')
to delete error log data for table_name
. It requires table owner privileges. This example deletes the error log information captured when moving data into table ext_expenses
:SELECT gp_truncate_error_log('ext_expenses');
table_name
does not exist, the function returns FALSE.*
to delete error log data for all tables in the current database. Specify the string *.*
to delete all database error log data, including error log data that was not deleted due to previous database issues. If you specify *
, you need database owner privileges. If you specify *.*
, you need operating system superuser privileges.When a YMatrix database user who is not a superuser runs the COPY
command, the command can be controlled by a resource queue. The resource queue must be configured with the ACTIVE_STATEMENTS
parameter, which specifies the maximum number of queries that roles assigned to the queue can execute. YMatrix database does not apply cost values or memory values to COPY
commands, so resource queues with only cost or memory limits do not affect the operation of COPY
commands.
Non-superusers can only run the following types of COPY
commands:
COPY FROM
commands with STDIN
as the sourceCOPY TO
commands with STDOUT
as the targetFile formats supported by COPY
.
When using text format, the data read or written is a text file, one line per row. Columns within a row are separated by the delimiter_character
(default is tab). Column values are strings generated by the output function of each attribute's data type, or strings acceptable to the input function. Use an empty string to represent a null column. If any row in the input file has more or fewer columns than expected, COPY FROM
will raise an error. If OIDS
is specified, the OID is read or written as the first column before user data columns.
Data files have two reserved characters that have special meaning to COPY
:
\n
or 0x0a
), used to specify new lines in the data file. It is strongly recommended that applications generating COPY
data convert data newlines to UNIX-style newlines, rather than Microsoft Windows-style carriage return newlines (\r\n
or 0x0a 0x0d
).If either of these characters appears in the data, it must be escaped so that COPY
treats it as data rather than a field separator or newline.
By default, the escape character for text format files is \
(backslash), and for CSV format files, it is "
(double quote). If you want to use a different escape character, you can do so with the ESCAPE AS
clause. Ensure that the escape character you choose is not used anywhere in the data file as actual data value. You can also disable escaping in text format files by using ESCAPE 'OFF'
.
For example, suppose you have a table with three columns and you want to use COPY
to load the following three fields:
You specify the delimiter_character
as |
(pipe character) and the escape character as *
(asterisk). The formatted row in the data file would look like this:
percentage sign = % | vertical bar = *| | backslash = \
Note how the pipe character as part of the data is escaped with the asterisk character. Also note that since we are using an alternative escape character, we do not need to escape the backslash.
If any of the following characters appear as part of a column value, they must be followed by the escape character: the escape character itself, newline, carriage return, and the current delimiter. You can specify another escape character using the ESCAPE AS
clause.
This format option is used to import and export comma-separated values (CSV) files, a format used by many other programs (such as spreadsheets). It generates and recognizes common CSV escape mechanisms rather than the escape rules used by the YMatrix database standard text format.
Values in each record are separated by the DELIMITER
character. If a value contains the delimiter, the QUOTE
character, the ESCAPE
character (default is double quote), the NULL
string, a carriage return, or a newline, the entire value is prefixed and suffixed with the QUOTE
character. When outputting non-NULL values in a specific column, FORCE_QUOTE
can also be used to enforce quoting.
CSV format has no standard way to distinguish between NULL values and empty strings. YMatrix database COPY
handles this by quoting. Output NULLs as the NULL
parameter string, unquoted, while non-NULL values that match the NULL string are quoted. For example, using the default settings, NULL is written as an unquoted empty string, and empty string data values are written as double quotes (""
). Reading values follows similar rules. You can use FORCE_NOT_NULL
to prevent NULL input comparisons for specific columns.
Since backslash is not a special character in CSV format, \.
(data end marker) may also appear as a data value. To avoid any ambiguity, data values that appear as \.
on a line are automatically quoted in output and not interpreted as the data end marker in input if quoted. If you need to load a file created by another application that has unquoted single-column and may have \.
values, you may need to use that value in the input file.
Note: In CSV format, all characters are valid. Quoted spaces or any characters except the DELIMITER
are included. If importing data from a system that fills CSV rows to fixed width with trailing spaces, this may result in errors. If this occurs, you may need to preprocess the CSV file to remove trailing spaces before importing the data into the YMatrix database.
CSV format will recognize and generate CSV files with quoted values containing embedded carriage returns and newlines. Therefore, compared to text format files, files are not strictly limited to one line per row.
Note: Many programs generate odd (sometimes incorrect) CSV files, so the file format is more of a de facto standard file. Therefore, you may encounter files that cannot be imported using this mechanism, and COPY
may generate files that other programs cannot process.
The binary format option causes all data to be stored/read in binary format rather than text. It is somewhat faster than text and CSV formats, but binary-format files are less portable across computer architectures and YMatrix database versions. Similarly, binary format is very data-type specific. For example, even though it would work in text format, outputting binary data from a smallint column and reading it into an integer column will not work.
The binary file format consists of a file header, zero or more tuples containing the row data, and a file tail. The file header and data are in network byte order.
PGCOPY\n\377\r\n\0
- Note that the zero byte is a required part of the signature. (The signature is designed to facilitate the identification of files corrupted by non-8-bit-clean transmission. This signature will pass through line-termination translation filters, have zero bytes lost, have high bits lost or parity changed.)Copy the table to the client using a vertical bar (|) as the field delimiter:
COPY country TO STDOUT (DELIMITER '|');
Copy data from the file to the country table:
COPY country FROM '/home/usr1/sql/country_data';
Copy only countries whose name starts with 'A' to the file:
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO
'/home/usr1/sql/a_list_countries.copy';
Copy data from a file into the sales table and log errors using single-line error isolation mode:
COPY sales FROM '/home/usr1/sql/sales_data' LOG ERRORS
SEGMENT REJECT LIMIT 10 ROWS;
To copy Segment data for later use, use the ON SEGMENT
clause. The COPY TO ON SEGMENT
command is used as follows:
COPY table TO '<SEG_DATA_DIR>/gpdumpname<SEGID>_suffix' ON SEGMENT;
<SEGID>
is required. However, you can replace the <SEG_DATA_DIR>
string literal in the path with an absolute path.
When passing string literals <SEG_DATA_DIR>
and <SEGID>
to COPY
, COPY
will be filled with the appropriate value when running the operation.
For example, if your mytable
contains the following Segment and Mirror:
contentid | dbid | file segment location
0 | 1 | /home/usr1/data1/gpsegdir0
0 | 3 | /home/usr1/data_mirror1/gpsegdir0
1 | 4 | /home/usr1/data2/gpsegdir1
1 | 2 | /home/usr1/data_mirror2/gpsegdir1
Run the command:
COPY mytable TO '<SEG_DATA_DIR>/gpbackup<SEGID>.txt' ON SEGMENT;
The following line will be generated:
/home/usr1/data1/gpsegdir0/gpbackup0.txt
/home/usr1/data2/gpsegdir1/gpbackup1.txt
The content ID in the first column is the identifier inserted into the file path (for example, gpsegdir0/gpbackup0.txt
above). Files are created on the Segment instead of the Master, just like in standard COPY
operations. When copying with ON SEGMENT
, no data files are created for Mirror.
If an absolute path is specified, instead of <SEG_DATA_DIR>
, for example in a statement
COPY mytable TO '/tmp/gpdir/gpbackup_<SEGID>.txt' ON SEGMENT;
The file will be placed in /tmp/gpdir
of each segment. If redistribution is required, the gpfdist
tool can also be used to restore data files generated by COPY TO
using the ON SEGMENT
option.
Note: You can use tools such as gpfdist
to restore data. The backup/restore tool is not suitable for files generated manually using COPY TO ON SEGMENT
.
This example copies data from the lineitem
table and uses the PROGRAM
clause to add data to the /tmp/lineitem_program.csv
file using the cat
utility. This file is placed on the YMatrix database Master.
COPY LINEITEM TO PROGRAM 'cat > /tmp/lineitem.csv' CSV;
This example uses the PROGRAM
and ON SEGMENT
clauses to copy data into a file on the Segment host. On the Segment host, the COPY
command replaces <SEGID>
with the Segment content ID to create a file for each Segment instance on the Segment host.
COPY LINEITEM TO PROGRAM 'cat > /tmp/lineitem_program<SEGID>.csv' ON SEGMENT CSV;
This example uses the PROGRAM
and ON SEGMENT
clauses to copy data from files on the Segment host. When copying data from a file, the COPY
command replaces <SEGID>
with the Segment content ID. On the Segment host, each Segment instance must have a file with the file name containing the Segment content ID on the Segment host.
COPY LINEITEM_4 FROM PROGRAM 'cat /tmp/lineitem_program<SEGID>.csv' ON SEGMENT CSV;
There is no COPY
statement in the SQL standard.
The following syntax was used in earlier versions of YMatrix databases and is still supported:
COPY table_name [(column_name [, ...])] FROM
{'filename' | PROGRAM 'command' | STDIN}
[ [WITH]
[ON SEGMENT]
[BINARY]
[OIDS]
[HEADER]
[DELIMITER [ AS ] 'delimiter_character']
[NULL [ AS ] 'null string']
[ESCAPE [ AS ] 'escape' | 'OFF']
[NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
[CSV [QUOTE [ AS ] 'quote']
[FORCE NOT NULL column_name [, ...]]
[FILL MISSING FIELDS]
[[LOG ERRORS]
SEGMENT REJECT LIMIT count [ROWS | PERCENT] ]
COPY { table_name [(column_name [, ...])] | (query)} TO {'filename' | PROGRAM 'command' | STDOUT}
[ [WITH]
[ON SEGMENT]
[BINARY]
[OIDS]
[HEADER]
[DELIMITER [ AS ] 'delimiter_character']
[NULL [ AS ] 'null string']
[ESCAPE [ AS ] 'escape' | 'OFF']
[CSV [QUOTE [ AS ] 'quote']
[FORCE QUOTE column_name [, ...]] | * ]
[IGNORE EXTERNAL PARTITIONS ]
Note that in this syntax, BINARY
and CSV
are considered independent keywords rather than parameters of the FORMAT
option.