CREATE INDEX

Define a new index.

Summary

CREATE [UNIQUE] INDEX [ CONCURRENTLY ] [name] ON table_name [USING method]
       ( {column_name | (expression)} [COLLATE parameter] [opclass] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
       [ WITH ( storage_parameter = value [, ... ] ) ]
       [ TABLESPACE tablespace ]
       [ WHERE predicate ]

Description

CREATE INDEX creates an index on the specified columns of the specified table. Indexes are primarily used to enhance database performance (although improper use can result in performance degradation).

The key fields of the index are specified as column names or as expressions enclosed in parentheses. If the index method supports multi-column indexes, multiple fields can be specified.

Index fields can be expressions calculated based on the values of one or more columns in the table rows. This feature can be used to quickly access data based on some transformation of the underlying data. For example, an index calculated on upper(col) will allow the WHERE clause WHERE upper(col) = ‘JIM’ to use the index.

The YMatrix database provides the following indexing methods: B-tree, bitmap, GiST, SP-GiST, and GIN. Users can also define their own indexing methods, but this is quite complex.

Partial indexes are created when a WHERE clause is present. A partial index is an index that contains only a portion of the table's entries, typically the portion that is more useful than the rest of the table. For example, if you have a table that contains both invoiced orders and uninvoiced orders, where the uninvoiced orders make up only a small portion of the total table but are the most frequently selected, you can improve performance by creating an index only on that portion.

The expression used in the WHERE clause may only reference columns from the base table, but it can use all columns, not just those that are indexed. Subqueries and aggregate expressions are also prohibited in the WHERE clause. The same restrictions apply to indexed fields used as expressions.

All functions and operators used in index definitions must be immutable. Their results must depend solely on their parameters and not on any external influences (such as the contents of another table or parameter values). This restriction ensures that the behavior of the index is well-defined. To use user-defined functions in index expressions or the WHERE clause, remember to mark the function as IMMUTABLE when creating it.

Parameters

UNIQUE

  • Create an index and check the table for duplicate values each time data is added. Duplicate entries will generate errors. Unique indexes are only applicable to B-tree indexes. In YMatrix databases, unique indexes are only allowed when the index key columns are the same as (or a superset of) the YMatrix distribution keys. On partitioned tables, unique indexes are only supported within a single partition—they cannot span all partitions.

CONCURRENTLY

  • No locks are enabled to prevent insert, update, or delete operations when creating an index. The creation process for a standard index locks write operations (but read operations are not affected) until creation is complete. However, when the CONCURRENTLY option is enabled, YMatrix does not need to enable locks for all write operations. For temporary tables, the CREATE INDEX statement is typically non-concurrent because no other sessions can access it. The non-concurrent index creation process also has lower overhead.

name

  • The name of the index to be created. Indexes are always created in the same schema as their parent tables. If this name is omitted, the YMatrix database will select an appropriate name based on the name of the parent table and the name of the index column.

table_name

  • The name of the table to be indexed (can be specified by schema).

method

  • The name of the indexing method to use. Choose from btree, bitmap, gist, spgist, and gin. The default method is btree. Currently, only B-tree, GiST, and GIN indexing methods support multi-column indexes. By default, up to 32 fields can be specified. Currently, only B-tree supports unique indexes.

column_name

  • The name of the column in the table on which the index is created. Only B-tree, bitmap, GiST, and GIN index methods support multi-column indexes.

expression

  • An expression based on one or more columns of a table. Expressions must usually be enclosed in parentheses, as shown in the syntax. However, if the expression is in the form of a function call, the parentheses can be omitted.

collation

  • The classification name used by the index. By default, the index uses the classification declared for the column to be indexed or the classification resulting from the expression to be indexed. Indexes with non-default classifications are useful for queries that involve expressions using non-default classifications.

opclass

  • The name of the operator class. The operator class identifies the operators to be used for the index on that column. For example, a B-tree index on a four-byte integer would use the int4_ops class (this operator class contains comparison functions for four-byte integers). In practice, the default operator class for the column data type is usually sufficient. The point of having operator classes is that for some data types, there may be more than one meaningful ordering. For example, complex numbers can be sorted by absolute value or by real part. For this, we can define two operator classes for the data type and then choose the appropriate class when creating the index.

ASC

  • Specify ascending order (default).

DESC

  • Specify descending order.

NULLS FIRST

  • Specify null before non-null. This is the default setting when specifying DESC.

NULLS LAST

  • Sort null values after non-null values. If DESC is not specified, this is the default setting.

storage_parameter

  • The name of the storage parameter specific to the index method. Each index method has its own set of allowed storage parameters. FILLFACTOR—B-tree, bitmap, GiST, and SP-GiST index methods all accept this parameter. The index's FILLFACTOR is a percentage that determines the extent to which the index method will attempt to pack the index pages. For B-trees, leaf pages are filled to this percentage during initial index construction and when the index is extended to the right (adding new maximum key values). If pages are subsequently filled completely, they are split, leading to a gradual decrease in index efficiency. B-trees use a default fill factor of 90, but any integer value between 10 and 100 can be chosen. If the table is static, a fill factor of 100 is best for minimizing the physical size of the index. However, for tables with a large number of updates, a smaller fill factor is better for minimizing the need for page splits. Other indexing methods use the fill factor in different but broadly similar ways; the default fill factor varies by method.

BUFFERING - In addition to FILLFACTOR, GiST indexes also accept the BUFFERING parameter. BUFFERING determines whether the YMatrix database uses the buffer-building technique described in the GiST buffer-building section of the PostgreSQL documentation to build the index. If set to OFF, it is disabled; if set to ON, it is enabled; When set to AUTO, it is initially disabled; however, once the index size reaches the effective cache size, it is immediately enabled. The default is AUTO.

FASTUPDATE - The GIN indexing method accepts the FASTUPDATE storage parameter. FASTUPDATE is a Boolean parameter used to disable or enable the GIN index fast update technique. The value ON enables fast updates (default), while the value OFF disables them. For more information, see the GIN fast update technique in the PostgreSQL documentation.

tablespace_name

  • The tablespace in which to create the index. If not specified, the default tablespace is used.

predicate

  • Constraint expressions for partial indexes.

Annotation

You can specify an operator class for each column in an index. An operator class identifies the operators to be used for the index on that column. For example, a B-tree index on a four-byte integer will use the int4_ops class. This operator class contains comparison functions for four-byte integers. In practice, the default operator class for a column's data type is usually sufficient. The key point of having operator classes is that for some data types, there may be more than one meaningful sorting order. For example, we may want to sort complex data types by absolute value or by real number. To do this, we can define two operator classes for the data type and then select the appropriate class when creating the index.

For index methods that support sorted scans (currently only B-trees), you can specify optional clauses such as ASC, DESC, NULLS FIRST, and/or NULLS LAST to modify the sorting order of the index. Since ordered indexes can be scanned forward or backward, creating a single-column DESC index is typically unnecessary—regular indexes already support sorting order. The value of these options lies in the ability to create multi-column indexes that match the sort order requested by mixed sort queries, such as SELECT ... ORDER BY x ASC, y DESC. If you need to support the “nulls sort low” behavior in queries that rely on indexes to avoid sorting steps, rather than the default “nulls sort high” behavior, the NULLS option is useful.

For most indexing methods, the speed of index creation depends on the maintenance_work_mem setting. Larger values will reduce the time required to create indexes, as long as you do not exceed the amount of memory actually available, which would cause the computer to swap.

After creating an index on a partitioned table, the index will propagate to all child tables created by the Matrix database. Indexes cannot be created on tables created by the YMatrix database for use with partitioned tables.

UNIQUE indexes are only allowed when the indexed columns are the same as (or a superset of) the YMatrix distribution key columns.

UNIQUE indexes are not allowed on append-optimized tables.

UNIQUE indexes can be created on partitioned tables. However, uniqueness is only enforced within each partition. Uniqueness is not enforced across partitions. For example, for a partitioned table with partitions based on years and subpartitions based on quarters, uniqueness is enforced only within each individual quarter partition. Uniqueness is not enforced across quarter partitions.

By default, indexes are not used with IS NULL clauses. In such cases, the best way to use indexes is to create partial indexes using IS NULL predicates.

Bitmap indexes perform best for columns with 100 to 100,000 distinct values. For columns with more than 100,000 distinct values, bitmap indexes experience a decline in performance and space efficiency. The size of a bitmap index is proportional to the number of rows in the table multiplied by the number of distinct values in the indexed column.

Columns with fewer than 100 distinct values typically do not benefit significantly from any type of index. For example, a gender column with only two distinct values (male and female) is not an ideal candidate for indexing.

Previous versions of the YMatrix database also supported R-tree indexing methods. This method has been removed as it offers no clear advantages over the GiST method. If USING rtree is specified, CREATE INDEX will interpret it as USING gist.

The use of hash indexes has been disabled in the YMatrix database.

Example

Create a B-tree index on the title column of the films table:

CREATE UNIQUE INDEX title_idx ON films (title);

To create a bitmap index on the gender column in the table employee:

CREATE INDEX gender_bmp_idx ON employee USING bitmap 
(gender);

To create an index on the expression lower(title) to allow valid case-insensitive searches:

CREATE INDEX ON films ((lower(title)));

(In this example, we choose to omit the index name, so the system will select a name, usually films_lower_idx.)

To create an index using non-default collation:

CREATE INDEX title_idx_german ON films (title COLLATE "de_DE");

To create an index with a non-default fill factor:

CREATE UNIQUE INDEX title_idx ON films (title) WITH 
(fillfactor = 70);

To create a GIN index with fast updates disabled:

CREATE INDEX gin_idx ON documents_table USING gin (locations) WITH (fastupdate = off);

To create an index on the column code of the table films and have the index reside in the tablespace indexspace:

CREATE INDEX code_idx ON films(code) TABLESPACE indexspace;

To create a GiST index on the point property, so that we can effectively use the box operator for the results of the conversion function:

CREATE INDEX pointloc ON points USING gist (box(location,location));
SELECT * FROM points WHERE box(location,location) && '(0,0),(1,1)'::box;

Create indexes for locks that write operations are not enabled:

CREATE INDEX CONCURRENTLY sales_quantity_index ON sales_table (quantity);

compatibility

CREATE INDEX is an extension of the YMatrix database language. There is no indexing provision in the SQL standard.

See also

ALTER INDEX, DROP INDEX