CREATE TABLE

Define a new table.

Summary

CREATE [ [GLOBAL | LOCAL] {TEMPORARY | TEMP } | UNLOGGED] TABLE [IF NOT EXISTS]
    table_name (
    { column_name data_type [ COLLATE collation ] [column_constraint [ ... ] ]
[ ENCODING ( storage_directive [, ...] ) ]
    | table_constraint
    | LIKE source_table [ like_option ... ] }
    | [ column_reference_storage_directive [, ...]
    [, ... ]
] )
[ INHERITS ( parent_table [, ... ] ) ]
[ USING { MARS2 | MARS3} ]
[ WITH ( storage_parameter [=value] [, ... ] ) ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace_name ]
[ DISTRIBUTED BY (column [opclass], [ ... ] ) 
       | DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]

{ --partitioned table using SUBPARTITION TEMPLATE
[ PARTITION BY partition_type (column) 
  {  [ SUBPARTITION BY partition_type (column1) 
       SUBPARTITION TEMPLATE ( template_spec ) ]
          [ SUBPARTITION BY partition_type (column2) 
            SUBPARTITION TEMPLATE ( template_spec ) ]
              [...]  }
  ( partition_spec ) ]
} |

{ -- partitioned table without SUBPARTITION TEMPLATE
[ PARTITION BY partition_type (column)
   [ SUBPARTITION BY partition_type (column1) ]
      [ SUBPARTITION BY partition_type (column2) ]
         [...]
  ( partition_spec
     [ ( subpartition_spec_column1
          [ ( subpartition_spec_column2
               [...] ) ] ) ],
  [ partition_spec
     [ ( subpartition_spec_column1
        [ ( subpartition_spec_column2
             [...] ) ] ) ], ]
    [...]
  ) ]
}

CREATE [ [GLOBAL | LOCAL] {TEMPORARY | TEMP} | UNLOGGED ] TABLE [IF NOT EXISTS] 
   table_name
    OF type_name [ (
  { column_name WITH OPTIONS [ column_constraint [ ... ] ]
    | table_constraint } 
    [, ... ]
) ]
[ WITH ( storage_parameter [=value] [, ... ] ) ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace_name ]

where column_constraint is:

[ CONSTRAINT constraint_name]
{ NOT NULL
  | NULL
  | CHECK ( expression ) [ NO INHERIT ]
  | DEFAULT default_expr
  | UNIQUE index_parameters
  | PRIMARY KEY index_parameters
  | REFERENCES reftable [ ( refcolumn ) ] 
      [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]  
      [ ON DELETE key_action ] [ ON UPDATE key_action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

table_constraint is:

[ CONSTRAINT constraint_name ]
{ CHECK ( expression ) [ NO INHERIT ]
  | UNIQUE ( column_name [, ... ] ) index_parameters
  | PRIMARY KEY ( column_name [, ... ] ) index_parameters
  | FOREIGN KEY ( column_name [, ... ] ) 
      REFERENCES reftable [ ( refcolumn [, ... ] ) ]
      [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] 
      [ ON DELETE key_action ] [ ON UPDATE key_action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

like_option is:

{INCLUDING|EXCLUDING} {DEFAULTS|CONSTRAINTS|INDEXES|STORAGE|COMMENTS|ALL}

The index_parameters constraints in UNIQUE and PRIMARY KEY are:

[ WITH ( storage_parameter [=value] [, ... ] ) ]
[ USING INDEX TABLESPACE tablespace_name ]

The storage_directive of the column is:

compresstype={ZLIB|ZSTD|QUICKLZ|RLE_TYPE|NONE|MXCUSTOM}
    [compresslevel={0-9}]
    [blocksize={8192-2097152} ]
    // When compressiontype=MXCUSTOM, you need to specify the specific encodechain
    [encodechain={LZ4|ZSTD|DELTADELTA(缩放位数)|DELTAZIGZAG|GORILLA|GORILLA2|FLOATINT|FDS|AUTO}]

The storage_parameter of the table is:

   appendoptimized={TRUE|FALSE}
   blocksize={8192-2097152}
   orientation={COLUMN|ROW}
   checksum={TRUE|FALSE}

   // The following are the relevant parameters of the encoding chain
   compresstype={ZLIB|ZSTD|QUICKLZ|RLE_TYPE|NONE|MXCUSTOM}
   // When compressiontype=MXCUSTOM, you need to specify the specific encodechain
   encodechain={LZ4|ZSTD|DELTADELTA(缩放位数)|DELTAZIGZAG|GORILLA|GORILLA2|FLOATINT|FDS|AUTO}
   /* When compresstype=MXCUSTOM and encodechain=AUTO, adaptive encoding mode can be specified at the table level:     
    * automode=1 means compression rate is preferred, automode=2 means speed is preferred
    */
   automode={1|2}
   compresslevel={0-9}

   fillfactor={10-100}
   [oids=FALSE]

   // The following are the parameters related to the MARS2 table
   // Used to control all MARS2 tables, how many Runs can be triggered by L0 (v4.5.0 starts to support)
   sortheap_automerge_threshold={10-2048}
   // Used to control how many Runs are reached in MARS2 single table L0 to trigger merges (v4.5.0 is supported)
   level0_merge_threshold={1-2048}
   // Control the size of the MARS2 single table L0 -> L1 upgrade. When L0 merges, the result will be upgraded to L1 if it exceeds this size.
    *(v4.5.0 is supported)
    */
   level0_upgrade_size={1-10000}
   /* Control the size of the MARS2 single table L1 -> L2 upgrade. When the result of the L1 merge exceeds this size, it will be upgraded to L2.
    *(v4.5.0 is supported)
    */
   level1_upgrade_size={1-10000}
   // Controls the sorted memory size of MARS2 forms individual inserts, and if the insert target table is a partitioned table, they will share this size (v4.5.0 starts to support)
   sortheap_sort_mem={128-2147483647}
   // Controls at least how much sorted memory is allocated for a single MARS2 partition table (v4.5.0 is supported)
   sortheap_sort_mem_core={128-2147483647}
   /* Compression threshold. Used to control how many tuples in each column of the MARS2 single table are compressed at one time.
    * It is the upper limit of the number of compressed Tuples in the same unit (v5.0.0 starts to support)
    */
   compress_threshold={1-100000}

   // The following are the parameters related to the MARS3 table (all supported starting from v5.1.0)
   /* rowstore_size is used to control when L0 Run switches in MARS3 tables. When the data size exceeds this value (in MB), it will switch to the next Run;
    * compress_threshold is same as MARS2; prefer_load_mode is used to specify the loading mode of data in MARS3. normal means normal mode,
    * The data will be stored in the disk according to the row, and then gradually converted into column storage as more data is written. bluk represents batch loading mode, and the data will be converted into column storage and then dropped into disk.
    * When the amount of data inserted in a batch exceeds rowstore_size, it will be converted into multiple columns in multiple times;
    * level_size_amplifier The enlargement coefficient used to specify the Level size
    */
   compress_threshold={1-100000},mars3options='rowstore_size={8-1024},prefer_load_mode={NORMAL|BLUK},level_size_amplifier={1-1000}'
   // The following are the parameters related to the MARS3 table downgrade storage function (all supported starting from v5.2.0)
   /* ttl_interval Used to specify the threshold for data transition from hot to cold. The unit must be specified manually, such as “2 H,” which means that data from two hours ago is defined as cold data and will be automatically downgraded;
    * ttl_interval currently supports d (days)/H (hours) units;
    * ttl_space referers to the corresponding tablespace, and cannot be created if it does not exist;
    * ttl_interval and ttl_space both need to be declared when creating tables or set with ALTER statements on existing tables.
    */
   ttl_interval={1-INT_MAX},ttl_space={}

key_action is:

    ON DELETE
   | ON UPDATE
   | NO ACTION
   | RESTRICT
   | CASCADE
   | SET NULL
   | SET DEFAULT

partition_type is:

    LIST | RANGE

partition_specification is:

partition_element [, ...]

partition_element is:

   DEFAULT PARTITION
                name
   | [PARTITION name] VALUES (list_value [,...] )
   | [PARTITION name]
      START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
      [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
      [ EVERY ([datatype] [number |INTERVAL] 'interval_value') ]
   | [PARTITION name]
      END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
      [ EVERY ([datatype] [number |INTERVAL] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[ column_reference_storage_directive [, ...] ]
[ TABLESPACE tablespace ]

Where subpartition_spec or template_spec is:

subpartition_element [, ...]

Subpartition_element is:

   DEFAULT SUBPARTITION name
    | [SUBPARTITION name] VALUES (list_value [,...] )
    | [SUBPARTITION name]
       START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
       [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
       [ EVERY ([datatype] [number |INTERVAL] 'interval_value') ]
    | [SUBPARTITION name]
       END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
       [ EVERY ([datatype] [number |INTERVAL] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[ column_reference_storage_directive [, ...] ]
[ TABLESPACE tablespace ]

The partition_storage_parameter of the partition is:

   appendoptimized={TRUE|FALSE}
   blocksize={8192-2097152}
   orientation={COLUMN|ROW}
   checksum={TRUE|FALSE}

   // The following are the relevant parameters of the encoding chain
   compresstype={ZLIB|ZSTD|QUICKLZ|RLE_TYPE|NONE|MXCUSTOM}
   // When compressiontype=MXCUSTOM, you need to specify the specific encodechain
   encodechain={LZ4|ZSTD|DELTADELTA(缩放位数)|DELTAZIGZAG|GORILLA|GORILLA2|FLOATINT|FDS|AUTO}
   /* 当 compresstype=MXCUSTOM,且 encodechain=AUTO 时,支持在表级别指定自适应编码模式:     
    * automode=1 means compression rate is preferred, automode=2 means speed is preferred
    */
   automode={1|2}
   compresslevel={0-9}

   fillfactor={10-100}
   [oids=FALSE]

   // The following are the parameters related to the MARS2 table
   // Used to control all MARS2 tables, how many Runs can be triggered by L0 (v4.5.0 starts to support)
   sortheap_automerge_threshold={10-2048}
   // Used to control how many Runs are reached in MARS2 single table L0 to trigger merges (v4.5.0 is supported)
   level0_merge_threshold={1-2048}
   // Control the size of the MARS2 single table L0 -> L1 upgrade. When L0 merges, the result will be upgraded to L1 if it exceeds this size.
    *(v4.5.0 is supported)
    */
   level0_upgrade_size={1-10000}
   /* Control the size of the MARS2 single table L1 -> L2 upgrade. When the result of the L1 merge exceeds this size, it will be upgraded to L2.
    *(v4.5.0 is supported)
    */
   level1_upgrade_size={1-10000}
   // Controls the sorted memory size of MARS2 forms individual inserts, and if the insert target table is a partitioned table, they will share this size (v4.5.0 starts to support)
   sortheap_sort_mem={128-2147483647}
   // Controls at least how much sorted memory is allocated for a single MARS2 partition table (v4.5.0 is supported)
   sortheap_sort_mem_core={128-2147483647}
   /* Compression threshold. Used to control how many tuples in each column of the MARS2 single table are compressed at one time.
    * It is the upper limit of the number of compressed Tuples in the same unit (v5.0.0 starts to support)
    */
   compress_threshold={1-100000}

   // The following are the parameters related to the MARS3 table (all supported starting from v5.1.0)
   /* rowstore_size Used to control when L0 Run in the MARS3 table switches. When the data size exceeds this value (in MB), it will switch to the next Run;
    * compress_threshold is same as MARS2; prefer_load_mode is used to specify the loading mode of data in MARS3. normal means normal mode,
    * The data will be stored in the disk according to the row, and then gradually converted into column storage as more data is written. bluk represents batch loading mode, and the data will be converted into column storage and then dropped into disk.
    * When the amount of data inserted in a batch exceeds rowstore_size, it will be converted into multiple columns in multiple times;
    * level_size_amplifier The enlargement coefficient used to specify the Level size
    */
   compress_threshold={1-100000},mars3options='rowstore_size={8-1024},prefer_load_mode={NORMAL|BLUK},level_size_amplifier={1-1000}'
   // The following are the parameters related to the MARS3 table downgrade storage function (all supported starting from v5.2.0)
   /* ttl_interval Used to specify the threshold for data transition from hot to cold. The unit must be specified manually, such as “2 H,” which means that data from two hours ago is defined as cold data and will be automatically downgraded;
    * ttl_interval currently supports d (days)/H (hours) units;
    * ttl_space referers to the corresponding tablespace, and cannot be created if it does not exist;
    * ttl_interval and ttl_space both need to be declared when creating tables or set with ALTER statements on existing tables.
    */
   ttl_interval={1-INT_MAX},ttl_space={}

注意!
For detailed information on the parameters of the MARS2 (v4.5.0 and later) and MARS3 (v5.1.0 and later) storage engines, please click here.

描述

CREATE TABLE creates an initially empty table in the current database. The user who executes the command owns the table.

To be able to create a table, you must have USAGE privilege on all column types or the types in the OF clause.

If a schema name is specified, YMatrix will create the table in the specified schema. Otherwise, YMatrix will create the table in the current schema. Temporary tables exist in a special schema, so you cannot specify a schema name when creating a temporary table. The table name must be different from the names of any other tables, external tables, sequences, indexes, views, or external tables in the same schema.

CREATE TABLE also automatically creates a data type that represents the composite type corresponding to a row in the table. Therefore, the table cannot have the same name as any existing data type in the same schema.

Optional constraint clauses specify the conditions that new or updated rows must satisfy for an insert or update operation to succeed. A constraint is an SQL object that can help define the valid value set in a table in various ways. Constraints apply to tables, not partitions. You cannot add constraints to partitions or subpartitions.

Referential integrity constraints (foreign keys) are accepted but not enforced. This information is retained in the system catalog and otherwise ignored.

There are two ways to define constraints: table constraints and column constraints. Column constraints are defined as part of the column definition. Table constraints are not bound to specific columns and can include multiple columns. Each column constraint can also be written as a table constraint. When a constraint affects only one column, using a column constraint is merely a matter of convenience.

When creating a table, there is an additional clause to declare the YMatrix database distribution strategy. If the DISTRIBUTED BY, DISTRIBUTED RANDOMLY, or DISTRIBUTED REPLICATED clauses are not provided, the YMatrix database will assign a hash distribution strategy to the table using the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns of geometric or user-defined data types do not meet the requirements for YMatrix distribution key columns. If there are no columns of eligible data types in the table, rows will be distributed based on round-robin or random distribution. To ensure uniform distribution of data across the YMatrix database system, you should select a distribution key that is unique for each record, or if that is not possible, select DISTRIBUTED RANDOMLY.

If the DISTRIBUTED REPLICATED clause is provided, the YMatrix database will distribute all rows of the table to all Segments in the YMatrix database system. This option can be used if user-defined functions must be executed on Segments and these functions require access to all rows of the table. Replicated functions can also be used to prevent Broadcast Motions of the table, thereby improving query performance. The DISTRIBUTED REPLICATED clause cannot be used with the PARTITION BY clause or the INHERITS clause. A replicated table cannot be inherited by another table. Hidden system columns (ctid, cmin, cmax, xmin, xmax, and gp_segment_id) cannot be referenced in user queries on replicated tables because they do not have a single, unambiguous value.

Using the PARTITION BY clause, you can divide a table into multiple sub-tables (or partitions), which together form the parent table and share its schema. Although sub-tables exist as independent tables, the YMatrix database imposes important restrictions on their use. Internally, partitions are implemented as a special form of inheritance. Each subtable partition is created based on different CHECK constraints, which restrict the data that the table can contain according to certain defined conditions. The query optimizer also uses CHECK constraints to determine which table partitions to scan to satisfy the given query predicate. These partition constraints are automatically managed by the YMatrix database.

YMatrix v5.0.0 now supports specifying a custom encoding chain algorithm for compression when creating tables. For details, see Using Compression.

参数

GLOBAL | LOCAL

  • These keywords are provided for SQL standard compatibility, but they are not valid in the YMatrix database and have been deprecated.

TEMPORARY | TEMP

  • If specified, the table will be created as a temporary table. Temporary tables are automatically deleted when the session ends or when the current transaction ends (see ON COMMIT). While a temporary table exists, existing permanent tables with the same name are invisible in the current session unless they are referenced using a schema-qualified name. All indexes created on a temporary table are also automatically temporary indexes.

UNLOGGED

  • If specified, the table will be created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (WAL), making them much faster than regular tables. However, the contents of unlogged tables are not replicated to Mirror instances. Similarly, unlogged tables are not crash-safe. After a Segment instance crashes or shuts down abnormally, the data in unlogged tables on that Segment will be truncated. All indexes created on unlogged tables will also automatically become unlogged indexes.

table_name

  • The name of the new table to be created (can be specified by the schema).

OF type_name

  • Create a typed table that derives its structure from a specified composite type (the name of which can be specified by the schema). A typed table is associated with its type. For example, if the type is dropped (using DROP TYPE ... CASCADE), the table will also be dropped. When a typed table is created, the data types of the columns are determined by the underlying composite type and are not directly specified in the CREATE TABLE command. However, the CREATE TABLE command can add default values and constraints to the table and specify storage parameters.

column_name

  • The name of the column to be created in the new table.

data_type

  • The data type of the column. This may include array specifiers.
    For table columns containing text data, specify the data type VARCHAR or TEXT. It is not recommended to specify the data type CHAR. In the YMatrix database, the data types VARCHAR or TEXT handle padding as valid characters added to the data (space characters added after the last non-space character), while the data type CHAR does not handle padding.

COLLATE collation

  • The COLLATE clause assigns a collation rule to the column (the collation rule must be a sortable data type). If not specified, the default collation rule for the column data type is used.

DEFAULT default_expr

  • The DEFAULT clause assigns default data values to columns appearing in their column definitions. The value is any expression without variables (subqueries and cross-references to other columns in the current table are not allowed). The data type of the default expression must match the data type of the column. The default expression will be used in any insert operation where no value is specified for the column. If the column does not have a default value, the default value is N.ULL。

ENCODING ( storage_directive [, ...] )

  • For columns, the optional ENCODING clause specifies the compression type and block size for column data.
    This clause is only valid for column-oriented tables with additional optimization.
    Column compression settings are inherited from the table level to the partition level and then to the subpartition level. The lowest level setting takes precedence.

INHERITS ( parent_table [, …])

  • The optional INHERITS clause specifies a list of tables from which all columns will be automatically inherited. Using INHERITS creates a persistent relationship between the new child table and its parent table. Pattern modifications to the parent are usually propagated to the child, and by default, the data of the child table is included in the parent scan.
    In the YMatrix database, the INHERITS clause is not used when creating partition tables. Although the concept of inheritance is used in the partition hierarchy, the inheritance structure of the partition table is created using the PARTITION BY clause.
    If more than one parent table has the same column name, an error will be reported unless the data type of the columns in each parent table matches. If there is no conflict, the duplicate columns are merged to form a column in the new table. If the column name list of the new table contains a column name that is also inherited, the data type must also match the inherited column and the column definitions will be merged into one. If the new table explicitly specifies the default value for the column, the default value overrides all default values ​​in the inheritance declaration of the column. Otherwise, all parents that specify the default value for this column must specify the same default value, otherwise an error will be reported.
    CHECK constraints are merged essentially in the same way as columns: If multiple parent tables or new table definitions contain constraints with the same name, these constraints must all have the same check expression, otherwise an error will be reported. Constraints with the same name and expression are merged into a copy. Constraints marked NO INHERIT in the parent are not considered. Note that unnamed CHECK constraints in the new table will never be merged because a unique name will always be selected for it. The column STORAGE settings are also copied from the parent table. LIKE source_table like_option ...]
  • The LIKE clause specifies a table from which the new table automatically copies all column names, its data type, non-empty constraints, and distribution policies. Storage properties such as append optimization or partition structure are not copied. Unlike INHERITS, new and original tables are completely decoupled after creation is complete.
    The default expression for the copied column definition is copied only when INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, resulting in columns copied in new tables having empty default values. Non-null constraints are always copied to the new table. The CHECK constraint is replicated only when INCLUDING CONSTRAINTS is specified. There is no difference between column constraints and table constraints.
    Indexes, PRIMARY KEY and UNIQUE constraints are created on the new table only if the INCLUDING INDEXES clause is specified. Regardless of the original name, the name of the new index and constraint is selected based on the default rules. (This behavior avoids duplicate name errors that may occur in new indexes.)
    No indexes on the original table are created on the new table unless the INCLUDING INDEXES clause is specified.
    The STORAGE settings for the copied column definition are copied only if INCLUDING STORAGE is specified. The default behavior is to exclude the STORAGE setting, resulting in the columns copied in the new table having a type-specific default setting.
    Comment of copied columns, constraints, and indexes is copied only when INCLUDING COMMENTS is specified. The default behavior is to exclude comments, resulting in no comments for copied columns and constraints in new tables.
    INCLUDING ALL is the abbreviation of INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS.
    Note that unlike INHERITS, columns and constraints copied by LIKE are not merged with columns and constraints with similar names. An error is indicated if the same name is explicitly specified, or if the same name is specified in another LIKE clause.
    The LIKE clause can also be used to copy columns from views, external tables, or composite types. Options that are not applicable (for example, from view INCLUDING INDEXES) will be ignored.

CONSTRAINT constraint_name

  • Optional name for a column or table constraint. If a constraint is violent, the constraint name will appear in the error message, so you can use the constraint name (for example, the column must be positive) to communicate useful constraint information to the client application. (Double quotes are required to specify the constraint name containing spaces.) If no constraint name is specified, the system generates a name.
    Note: The specified constraint_name is used for constraints, but the system-generated unique name is used for index names. In some previous versions, the provided names were used for both constraint names and index names.

NULL | NOT NULL

  • Specifies whether to allow the column to contain null values. The default value is NULL.CHECK (expression) [ NO INHERIT ]
  • The CHECK clause specifies an expression that generates a Boolean result. New or updated rows must be satisfied for the insert or update operation to succeed. Expressions computed as TRUE or UNKNOWN will succeed. If any row of the insert or update operation produces a FALSE result, an error exception is thrown and insert or update does not change the database. A check constraint specified as a column constraint should refer to only the value of that column, while an expression that appears in a table constraint can refer to multiple columns.
    Constraints marked NO INHERIT do not propagate to the subtable.
    Currently, the CHECK expression cannot contain subqueries, nor can it refer to variables other than the columns of the current row.

UNIQUE ( column_constraint )

UNIQUE ( column_name [, ... ] ) ( table_constraint )

  • UNIQUE constraint A set of columns or columns in a specified table can only contain unique values. Unique table constraints behave the same as column constraints, but have additional functionality across multiple columns. Null values ​​​​​are not considered equal for the purpose of unique constraints. The unique column must contain all columns of the YMatrix distribution key. Additionally, if the table is partitioned, the key must contain all the columns in the partition key. Note that the key constraints in the partition table are different from the simple UNIQUE INDEX.

PRIMARY KEY ( column constraint )

PRIMARY KEY ( column_name [, ... ] ) ( table constraint )

  • PRIMARY KEY Constraints One or more columns of a specified table can only contain unique (non-duplicate) and non-NULL values. Only one primary key can be specified for a table, whether as a column or a table constraint.
    To make a table have a primary key, it must be hashed (not randomly distributed), and the primary key (unique column) must contain all columns of the YMatrix distribution key. Additionally, If the table is partitioned, the key must contain all the columns in the partition key. Note that the key constraints in the partition table are different from the simple UNIQUE INDEX.
    PRIMARY KEY enforces the same combined data constraints as UNIQUE and NOT NULL, but identifying a set of columns as primary keys can also provide metadata about schema design, because the primary key identifies other tables that can rely on This set of columns to collaborate as a row's unique identifier.

REFERENCES reftable [ ( refcolumn ) ]

[ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]

[ON DELETE | ON UPDATE] [key_action]

FOREIGN KEY (column_name [, ...])

  • The REFERENCES and FOREIGN KEY clauses specify reference integrity constraints (foreign key constraints). YMatrix accepts reference integrity constraints specified in PostgreSQL syntax, but is not enforced. For information about reference integrity constraints, see the PostgreSQL documentation.

DEFERRABLE

NOT DEFERRABLE

  • [NOT] The DEFERRABLE clause controls whether the constraint can be deferred. A non-deferred constraint will be checked immediately after each command. You can defer checking constraints until the transaction ends (using the SET CONSTRAINTS command). The default value is NOT DEFERRABLE. Currently, only UNIQUE and PRIMARY KEY constraints are deferred. NOT NULL and CHECK constraints are Not delayed. The REFERENCES (foreign key) constraint accepts this clause but is not enforced.

INITIALLY IMMEDIATEINITIALLY DEFERRED

  • If the constraint is delayable, this clause specifies the default time to check the constraint. If the constraint is INITIALLY IMMEDIATE, it is checked after each statement. This is the default value. If the constraint is INITIALLY DEFERRED, checks are only performed at the end of the transaction. The constraint check time can be changed using the SET CONSTRAINTS command.WITH ( storage_parameter=value )

  • The WITH clause can specify storage parameters for tables and indexes associated with UNIQUE or PRIMARY constraints. Note that you can also set storage parameters on specific partitions or subpartitions by declaring the WITH clause in the partition specification. The lowest-level setting takes precedence.

  • The default values for certain table storage options can be specified using the server configuration parameter gp_default_storage_options.

  • The following storage options are available:

    • appendoptimized — Set to TRUE to create the table as an append-optimized table. If FALSE or not declared, the table is created as a regular heap-stored table.
    • blocksize — Sets the size of each block in the table (in bytes). blocksize must be between 8192 and 2097152 bytes and a multiple of 8192. The default value is 32768.
    • orientation — Set to column for columnar storage, or set to row (default) for row-based storage. This option is only effective when appendoptimized = TRUE. Heap-stored tables can only be row-oriented.
    • checksum — This option is only valid for append-optimized tables (appendoptimized = TRUE). The default value is TRUE, which enables CRC checksum verification for append-optimized tables. The checksum is calculated during block creation and stored on disk. Checksum verification is performed during block read operations. If the checksum calculated during read operations does not match the stored checksum, the transaction is aborted. If the value is set to FALSE to disable checksum verification, no checks are performed to verify whether the table data is damaged on disk.
    • compresstype — Set to ZLIB (default), ZSTD, RLE_TYPE, QUICKLZ1, or use encoding chain MXCUSTOM to specify the compression type to use. The value NONE disables compression. ZSTD offers speed or good compression ratio, which can be adjusted using the compresslevel option. QuickLZ and ZLIB are provided for backward compatibility. In typical workloads, ZSTD outperforms these compression types. The compresstype option is only effective when appendoptimized = TRUE.
      The value RLE_TYPE is only supported when orientation = column is specified, enabling run-length encoding (RLE) compression. When the same data values appear in many consecutive rows, RLE compression is more effective than ZSTD, ZLIB, or QuickLZ compression algorithms.
      For columns of type BIGINT, INTEGER, DATE, TIME, or TIMESTAMP, incremental compression is also applied if the compresstype option is set to RLE_TYPE compression. Incremental compression algorithms are based on the increments between column values in consecutive rows and are designed to improve compression when loading data in sorted order or applying compression to column data in sorted order. If an encoding chain is used, the specific encoding chain algorithm must be specified using the encodechain parameter after compresstype. For details, see Using Compression.

    The encoding chain compression specified in the WITH clause is table-level.

    • compresslevel — For ZSTD compression of append-optimized tables, set this to an integer value between 1 (fastest compression) and 19 (highest compression ratio). For ZLIB compression, the valid range is 1 to 9. The QuickLZ compression level can only be set to 1. If not specified, the default value is 1. For RLE_TYPE, the compression level can be an integer value between 1 (fastest compression) and 4 (highest compression ratio). The compresslevel option is only effective when appendoptimized = TRUE.
    • fillfactor — fillfactor is a percentage value between 10 and 100, with a default of 100. When a smaller value is specified, the remaining space in the page is reserved for updating rows. This makes updates more efficient, as the updated rows are placed in the same page rather than being spread across different pages. For tables that are never updated, the default value can be used. For tables with a large number of updates, the value should be appropriately reduced. This parameter cannot be set for TOAST tables.
    • oids=FALSE — This is the default value, and no OID is assigned to each data row. In large tables (such as those in a typical YMatrix database system), using OIDs for table rows may cause the 32-bit OID counter to wrap around. Once the counter wraps around, OIDs can no longer be considered unique, rendering them useless to user applications and causing issues in the YMatrix database system catalog tables. Additionally, excluding OIDs from the table reduces the space required to store each row on disk by 4 bytes per row, thereby slightly improving performance. OIDs are not permitted on partitioned tables or column-oriented tables optimized for appends.
    • For detailed parameters related to MARS2 and MARS3, please refer to Storage Engine Principles.

ON COMMIT

  • You can use ON COMMIT to control the behavior of temporary tables at the end of a transaction block. The three options are:
    • PRESERVE ROWS - No special action is taken when the transaction on the temporary table ends. This is the default behavior.
    • DELETE ROWS - All rows in the temporary table will be deleted at the end of each transaction block. Essentially, a TRUNCATE is automatically executed each time a commit is performed.
    • DROP - The temporary table will be deleted at the end of the current transaction block.

TABLESPACE tablespace

  • The name of the tablespace in which to create the new table. If not specified, the database's default tablespace is used.

USING INDEX TABLESPACE tablespace

  • This clause allows you to select the tablespace in which to create indexes associated with UNIQUE or PRIMARY KEY constraints. If not specified, the database's default tablespace is used.

DISTRIBUTED BY (column [opclass], [ ... ] )

DISTRIBUTED RANDOMLY

DISTRIBUTED REPLICATED

  • The YMatrix database distribution strategy used for declaring tables. DISTRIBUTED BY uses a hash distribution with one or more columns declared as distribution keys. For the most uniform data distribution, the distribution key should be the table's primary key or unique column (or set of columns). If this is not possible, you can choose DISTRIBUTED RANDOMLY, which sends data in a round-robin fashion to Segment instances. Additionally, you can specify the operator class opclass to use a non-default hash function.
    If the DISTRIBUTED BY clause is not specified when creating a table, the YMatrix database server configuration parameter gp_create_table_random_default_distribution will control the default distribution strategy for the table. If no distribution strategy is specified, the YMatrix database will follow the following rules to create the table:
    • If the LIKE or INHERITS clause is specified, YMatrix copies the distribution key from the source table or parent table.
    • If a PRIMARY KEY or UNIQUE constraint is specified, YMatrix selects the largest subset of all key columns as the distribution key.
    • If neither constraints nor LIKE or INHERITS clauses are specified, YMatrix selects the first suitable column as the distribution key. (Columns with geometric or user-defined data types do not meet the conditions for being used as YMatrix distribution key columns.)

If the parameter value is set to on, the YMatrix database follows the following rules:

  • If no PRIMARY KEY or UNIQUE column is specified, the table distribution is random (DISTRIBUTED RANDOMLY). Even if the table creation command includes a LIKE or INHERITS clause, the table distribution is random.
  • If a PRIMARY KEY or UNIQUE column is specified, the DISTRIBUTED BY clause must also be specified. If the DISTRIBUTED BY clause is not specified in the table creation command, the command will fail.

The DISTRIBUTED REPLICATED clause replicates the entire table to all YMatrix database Segment instances. It can be used when a function needs to access all rows in the table or when it is necessary to improve query performance by blocking Broadcast Motion. It can be used when executing user-defined functions on Segment.

PARTITION BY

  • Declare one or more columns used to partition a table.
    When creating a partition table, the YMatrix database creates a root partition table (root partition) using the specified table name. The YMatrix database also creates tables based on the partition options you specify, the hierarchy of subtables, which are subpartitions. YMatrix database pg_partition* The system view contains information about the subpartition table.
    For each partition level (each hierarchical level of the table), a partitioned table can have up to 32,767 partitions.
    Note: The YMatrix database stores partition table data in a leaf table, which is the lowest-level table in the sub-table hierarchy for use by partition tables. - partition_type
    • Declare partition type: LIST (value list) or RANGE (number or date range).
    • partition_specification
      • Declare the individual partitions to be created. Each partition can be defined individually, or for range partitions, the EVERY clause (with START and optional END clause) can be used to define the incremental pattern used to create a single partition.
        DEFAULT PARTITION name — Declare the default partition. When the data does not match the existing partition, it is inserted into the default partition. Partition designs without default partitions will reject incoming rows that do not match the existing partition.
        PARTITION name — Declare the name to be used for the partition. Create a partition using the following naming convention: parentname_level#_prt_givenname.
        VALUES — For list partitions, define the value that the partition will contain. START — For range partitions, define the starting range value of the partition. By default, the starting value is INCLUSIVE. For example, If you declare the start date '2016-01-01', the partition will contain all dates greater than or equal to '2016-01-01'. Typically, the data type of the START expression is the same as the type of the partition key column. If this is not the case, it must be explicitly converted to the expected data type.
        END — For range partitions, define the end range value of the partition. By default, the end value is EXCLUSIVE. For example, if you declare an end date of '2016-02-01', the partition will contain all dates smaller than but not equal to '2016-02-01'. Typically, the data type of the END expression is the same as the type of the partition key column. If this is not the case, it must be explicitly converted to the expected data type.
        EVERY — For range partitions, define how to increment the value from START to END to create a single partition. Typically, the data type of the EVERY expression is the same as the type of the partition key column. If this is not the case, it must be explicitly converted to the expected data type.
        WITH — Sets the table storage options for partitions. For example, you might want to use an older partition as an append optimization table and a newer partition as a regular heap table.
        TABLESPACE —The name of the tablespace in which the partition is to be created.

SUBPARTITION BY

  • Declare one or more columns used to subpartition the first-level partition of a table. The canonical format of the subpartition is similar to the canonical format of the above partition.

SUBPARTITION TEMPLATE

  • Instead of declaring each subpartition definition separately for each partition, you can choose to declare a subpartition template (low-level subtable) to create a subpartition. This child partition specification will then be applied to all parent partitions.## Note
  • In the YMatrix database (Postgres-based system), the data type VARCHAR or TEXT handles the data that is popular as a valid character to the text (the space character is added after the last non-space character); the data type CHAR does not.
  • In the YMatrix database, a value of type CHAR(n) is filled with trailing spaces to the specified width n. The value will be stored and displayed as a space. However, filling in spaces is not semantically important. When assigning values, trailing spaces are ignored. When comparing two values ​​​​of data type CHAR, trailing spaces are semantically irrelevant, and trailing spaces are also deleted when converting character values ​​to one of the other string types.
  • OID is not recommended in new applications: When possible, it is best to use SERIAL or other sequence generators as the primary key of the table. However, if your application does use an OID to identify a specific row of a table, it is recommended to create a unique constraint on the OID column of that table to ensure that the OID in the table does uniquely identify rows even after the counter is backed. Avoid assuming that OIDs are unique between tables; if a database-wide unique identifier is required, you can use a combination of table OID and row OID.
  • The YMatrix database has some special conditions for the only constraints of the primary key and the columns that are distributed keys in the YMatrix table. To enforce unique constraints in the YMatrix database, the table must be hashed (not DISTRIBUTED RANDOMLY), and the constraint column must be the same as the table's distributed key column (or as its superset). In addition, the distribution key must be the left subset of the constraint columns and the columns are in the correct order. For example, if the primary key is (a,b,c), the distribution key can only be one of the following: (a), (a,b) or (a,b,c).
    DisTRIBUTED REPLICATED can have both PRIMARY KEY and UNIQUE column constraints.
    Primary key constraints are only a combination of unique constraints and non-null constraints.
    The YMatrix database automatically creates a UNIQUE index for each UNIQUE or PRIMARY KEY constraint to enforce uniqueness. Therefore, it is not necessary to explicitly create an index for the primary key column. The UNIQUE and PRIMARY KEY constraints are not allowed on appended optimization tables because UNIQUE indexes created by constraints are not allowed on appended optimization tables.
    Foreign key constraints are not supported in the YMatrix database.
    For inherited tables, unique constraints, primary key constraints, indexes and table privileges will not be inherited in the current implementation.
  • For appended tables, UPDATE and DELETE are not allowed in repeated read or serializable transactions, which will cause transaction abort. CLUSTER, DECLARE...FOR UPDATE and triggers do not support appended tables.
  • To insert data into a partition table, specify the root partition table, which is the table created using the CREATE TABLE command. You can also specify the leaf table of the partition table in the INSERT command. If the data is invalid for the specified leaf table, an error is returned. Specifying subtables that are not leaf tables in the INSERT command is not supported. Execution of other DML commands, such as UPDATE and DELETE, on any subtables of a partitioned table, is not supported. These commands must be executed on the root partition table (the table created using the CREATE TABLE command).

Example

Create a table named rank in a schema called baby and distributed the data using the rank, gender, and year columns:

CREATE TABLE baby.rank (id int, rank int, year smallint,
gender char(1), count int ) DISTRIBUTED BY (rank, gender,
year);

Create table files and table allocators (by default, the primary key will be used as the YMatrix distribution key):

CREATE TABLE films (
code        char(5) CONSTRAINT firstkey PRIMARY KEY,
title       varchar(40) NOT NULL,
did         integer NOT NULL,
date_prod   date,
kind        varchar(10),
len         interval hour to minute
);

CREATE TABLE distributors (
did    integer PRIMARY KEY DEFAULT nextval('serial'),
name   varchar(40) NOT NULL CHECK (name <> '')
);

Create a gzip compressed, appended table:

CREATE TABLE sales (txn_id int, qty int, date date) 
WITH (appendoptimized=true, compresslevel=5) 
DISTRIBUTED BY (txn_id);

Create a simple, single-level partition table:

CREATE TABLE sales (id int, year int, qtr int, c_rank int, code char(1), region text)
DISTRIBUTED BY (id)
PARTITION BY LIST (code)
( PARTITION sales VALUES ('S'),
  PARTITION returns VALUES ('R')
);

Create a three-level partition table without using the SUBPARTITION TEMPLATE clause:

CREATE TABLE sales (id int, year int, qtr int, c_rank int, code char(1), region text)
DISTRIBUTED BY (id)
PARTITION BY LIST (code)
  SUBPARTITION BY RANGE (c_rank)
    SUBPARTITION by LIST (region)

( PARTITION sales VALUES ('S')
   ( SUBPARTITION cr1 START (1) END (2)
      ( SUBPARTITION ca VALUES ('CA') ), 
      SUBPARTITION cr2 START (3) END (4)
        ( SUBPARTITION ca VALUES ('CA') ) ),

 PARTITION returns VALUES ('R')
   ( SUBPARTITION cr1 START (1) END (2)
      ( SUBPARTITION ca VALUES ('CA') ), 
     SUBPARTITION cr2 START (3) END (4)
        ( SUBPARTITION ca VALUES ('CA') ) )
);

Use the SUBPARTITION TEMPLATE clause to create the same partition table as the previous example:

CREATE TABLE sales1 (id int, year int, qtr int, c_rank int, code char(1), region text)
DISTRIBUTED BY (id)
PARTITION BY LIST (code)

   SUBPARTITION BY RANGE (c_rank)
     SUBPARTITION TEMPLATE (
     SUBPARTITION cr1 START (1) END (2),
     SUBPARTITION cr2 START (3) END (4) )

     SUBPARTITION BY LIST (region)
       SUBPARTITION TEMPLATE (
       SUBPARTITION ca VALUES ('CA') )

( PARTITION sales VALUES ('S'),
  PARTITION  returns VALUES ('R')
) ;

Create a three-level partition table and use the subpartition template and default partition at each level:

CREATE TABLE sales (id int, year int, qtr int, c_rank int, code char(1), region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)

  SUBPARTITION BY RANGE (qtr)
    SUBPARTITION TEMPLATE (
    START (1) END (5) EVERY (1), 
    DEFAULT SUBPARTITION bad_qtr )

    SUBPARTITION BY LIST (region)
      SUBPARTITION TEMPLATE (
      SUBPARTITION usa VALUES ('usa'),
      SUBPARTITION europe VALUES ('europe'),
      SUBPARTITION asia VALUES ('asia'),
      DEFAULT SUBPARTITION other_regions)

( START (2009) END (2011) EVERY (1),
  DEFAULT PARTITION outlying_years);

Specify the SEGMENT_SET object for the new partition:

## Create SEGMENT_SET object ss1
CREATE SEGMENT_SET ss1 SEGMENTS('0,2');

## Create a t table
CREATE TABLE t(a int, b int) DISTRIBUTED BY(a) PARTITION BY RANGE(b) (DEFAULT PARTITION others SEGMENT_SET ss1);

## Specify the SEGMENT_SET object for the new partition
CREATE TABLE t_part_manual PARTITION OF t FOR VALUES FROM (3) TO (6) SEGMENT_SET ss1;

Create a MARS2 table. MARS2 tables depend on the matrixts extension. Before building a table, you first need to create an extension in the database using the storage engine.

CREATE EXTENSION matrixts;

Create a table that can be adaptively encoded and compressed. For details, see Using Compression.

CREATE TABLE t (
      f1 int8
    , f2 int8
) 
USING MARS2
WITH(
      compresstype=mxcustom
);

Creates a table that specifies both the table level and the column level compression. The column-level compression specification takes precedence over the table level (the column specifies ENCODING(compresstype=none)/ENCODING(minmax) exception). For details, see Using Compression.

CREATE TABLE t (
      f1 int8 ENCODING(compresstype=lz4)
    , f2 int8
) 
USING MARS2
WITH(
      compresstype=mxcustom
    , encodechain=auto
);
CREATE INDEX idx_mars2 ON t USING mars2_btree(f1);

Notes!
matrixts is extended to the database level, and it can be created once in a database without repeated creation.

CREATE TABLE disk_mars2(
time timestamp with time zone,
tag_id int,
read float,
write float
)
USING MARS2
WITH (compresstype=zstd, compresslevel=3)
DISTRIBUTED BY (tag_id);
CREATE INDEX ON disk_mars2 USING mars2_btree(time,tag_id);

Create a MARS3 table. The MARS3 table also relies on the matrixts extension. Before building a table, you first need to create an extension in the database using the storage engine.

CREATE EXTENSION matrixts;
CREATE TABLE t(
time timestamp with time zone,
tag_id int,
i4 int4,
i8 int8
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3,compress_threshold=1200,
mars3options='rowstore_size=64,prefer_load_mode=normal,level_size_amplifier=8')
DISTRIBUTED BY (tag_id)
ORDER BY (time, tag_id);

compatibility

The CREATE TABLE command complies with SQL standards, except for the following:

  • Temporary Tables — In the SQL standard, temporary tables are defined only once and automatically exist in each session that requires them (starting with empty content). Instead, the YMatrix database requires each session to issue its own CREATE TEMPORARY TABLE command for each temporary table to be used. This allows different sessions to use the same temporary table name for different purposes, while the standard approach constraints all instances of a given temporary table name to have the same table structure.
    The standard difference between global and local temporary tables is not in the YMatrix database. The YMatrix database will accept the GLOBAL and LOCAL keywords in the temporary table declaration, but they are invalid and deprecated.
    If the ON COMMIT clause is omitted, the SQL standard specifies the default behavior as ON COMMIT DELETE ROWS. However, the default behavior in the YMatrix database is ON COMMIT PRESERVE ROWS. The ON COMMIT DROP option does not exist in the SQL standard.
  • Column Check Constraints — The SQL standard says that CHECK column constraints can only reference the columns to which they apply. Only CHECK table constraints can reference multiple columns. The YMatrix database does not enforce this restriction; it treats column and table check constraints the same.
  • NULL Constraint — NULL constraints are a YMatrix database extension to the SQL standard, for compatibility with certain other database systems (and symmetric NOT NULL constraints). Since it is the default value for any column, it does not need its existence.
  • Inheritance — Multiple inheritance through the INHERITS clause is an extension of the YMatrix database language. SQL: 1999 and later defines a single inheritance using different syntax and semantics. The YMatrix database does not yet support SQL:1999 style inheritance.
  • Partitioning — Table partitioning through the PARTITION BY clause is an extension of the YMatrix database language.
  • Zero-column tables — The YMatrix database allows the creation of tables that do not contain any columns (for example, CREATE TABLE foo();). This is an extension of the SQL standard and does not allow the use of zero lists. The zero list itself is of little use, but the use of zero lists does not allow strange special cases when ALTER TABLE DROP COLUMN, so YMatrix decided to ignore this specification limit.
  • LIKE — Although there are LIKE clauses in the SQL standard, many of the options accepted by the YMatrix database are not in the standard, and some of the options of the YMatrix database do not implement the standard.
  • WITH clause — The WITH clause is a YMatrix database extension. Storage parameters and OID are not in the standard.
  • Tablespaces — The YMatrix database concept for tablespaces is not part of the SQL standard. Clauses TABLESPACE and USING INDEX TABLESPACE are extensions.
  • Data Distribution — The concept of YMatrix databases for parallel or distributed databases is not part of the SQL standard. The DISTRIBUTED clause is an extension.

See also

ALTER TABLE, DROP TABLE, CREATE SEGMENT SET