CREATE INDEX

Define a new index.

Synopsis

CREATE [UNIQUE] INDEX [name] ON table_name [USING method]
       ( {column_name | (expression)} [COLLATE parameter] [opclass] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
       [ WITH ( storage_parameter = value [, ... ] ) ]
       [ TABLESPACE tablespace ]
       [ WHERE predicate ]

Description

CREATE INDEX creates an index on the specified columns of the specified table. Indexes are primarily used to enhance database performance (although improper use can result in performance degradation).

The key fields of an index are specified as column names or as expressions enclosed in parentheses. If the index method supports multi-column indexes, multiple fields can be specified.

Index fields can be expressions calculated based on the values of one or more columns in the table rows. This feature can be used to quickly access data based on some transformation of the underlying data. For example, an index calculated on upper(col) will allow the WHERE clause WHERE upper(col) = ‘JIM’ to use the index.

MatrixDB databases provide the following index methods: B-tree, bitmap, GiST, SP-GiST, and GIN. Users can also define their own index methods, but this is quite complex.

Partial indexes are created when a WHERE clause is present. A partial index is an index that contains only a portion of the table's entries, typically the portion that is more useful than the rest of the table. For example, if you have a table that contains both invoiced orders and uninvoiced orders, where the uninvoiced orders make up only a small portion of the total table but are the most frequently selected, you can improve performance by creating an index only on that portion.

The expression used in the WHERE clause may only reference columns from the base table, but it can use all columns, not just those that are indexed. Subqueries and aggregate expressions are also prohibited in the WHERE clause. The same restrictions apply to indexed fields used as expressions.

All functions and operators used in index definitions must be immutable. Their results must depend solely on their parameters and not on any external influences (such as the contents of another table or parameter values). This restriction ensures that the behavior of the index is well-defined. To use user-defined functions in index expressions or the WHERE clause, remember to mark the function as IMMUTABLE when creating it.

Parameters

UNIQUE

Create an index and check the table for duplicate values each time data is added. Duplicate entries will generate errors. Unique indexes are only applicable to B-tree indexes. In MatrixDB databases, unique indexes are only allowed when the index key columns are the same as (or a superset of) the MatrixDB distribution keys. On partitioned tables, unique indexes are only supported within a single partition—they cannot span all partitions.

nname

The name of the index to be created. Indexes are always created in the same schema as their parent tables. If this name is omitted, the YMatrix database will select an appropriate name based on the name of the parent table and the name of the index column.

table_name

The name of the table to be indexed (can be specified by schema).

method

The name of the indexing method to use. Choose from btree, bitmap, gist, spgist, and gin. The default method is btree. Currently, only B-tree, GiST, and GIN indexing methods support multi-column indexes. By default, up to 32 fields can be specified. Currently, only B-tree supports unique indexes.

column_name

The name of the column in the table on which the index is created. Only B-tree, bitmap, GiST, and GIN index methods support multi-column indexes.

expression

An expression based on one or more columns of a table. Expressions must usually be enclosed in parentheses, as shown in the syntax. However, if the expression is in the form of a function call, the parentheses can be omitted.

collation

The classification name used by the index. By default, the index uses the classification declared for the column to be indexed or the classification resulting from the expression to be indexed. Indexes with non-default classifications are useful for queries that involve expressions using non-default classifications.

opclass

The name of the operator class. The operator class identifies the operators to be used for the index on that column. For example, a B-tree index on a four-byte integer would use the int4_ops class (this operator class contains comparison functions for four-byte integers). In practice, the default operator class for the column data type is usually sufficient. The point of having operator classes is that for some data types, there may be more than one meaningful ordering. For example, complex numbers can be sorted by absolute value or by real part. For this, we can define two operator classes for the data type and then choose the appropriate class when creating the index.

ASC

Specify ascending order (default).

DESC

Specify descending order.

NULLS FIRST

Specify null before non-null. This is the default setting when specifying DESC.

NULLS LAST

Sort null values after non-null values. If DESC is not specified, this is the default setting.

storage_parameter

The name of the storage parameter specific to the index method. Each index method has its own set of allowed storage parameters. FILLFACTOR—B-tree, bitmap, GiST, and SP-GiST index methods all accept this parameter. The index's FILLFACTOR is a percentage that determines the extent to which the index method will attempt to pack the index pages. For B-trees, leaf pages are filled to this percentage during initial index construction and when the index is extended to the right (adding new maximum key values). If pages are subsequently filled completely, they are split, leading to a gradual decrease in index efficiency. B-trees use a default fill factor of 90, but any integer value between 10 and 100 can be chosen. If the table is static, a fill factor of 100 is best for minimizing the physical size of the index. However, for tables with a large number of updates, a smaller fill factor is better for minimizing the need for page splits. Other indexing methods use the fill factor in different but broadly similar ways; the default fill factor varies by method.

BUFFERING - In addition to FILLFACTOR, GiST indexes also accept the BUFFERING parameter. BUFFERING determines whether the YMatrix database uses the buffer-building technique described in the GiST buffer-building section of the PostgreSQL documentation to build the index. If set to OFF, it is disabled; if set to ON, it is enabled; When set to AUTO, it is initially disabled; however, once the index size reaches the effective cache size, it is immediately enabled. The default is AUTO.

FASTUPDATE - The GIN indexing method accepts the FASTUPDATE storage parameter. FASTUPDATE is a Boolean parameter used to disable or enable the GIN index fast update technique. The value ON enables fast updates (default), while the value OFF disables them. For more information, see the GIN fast update technique in the PostgreSQL documentation.

tablespace_name

The tablespace in which to create the index. If not specified, the default tablespace is used.

predicate

Constraint expressions for partial indexes.

Annotation

You can specify an operator class for each column in an index. An operator class identifies the operators to be used for the index on that column. For example, a B-tree index on a four-byte integer will use the int4_ops class. This operator class contains comparison functions for four-byte integers. In practice, the default operator class for a column's data type is usually sufficient. The key point of having operator classes is that for some data types, there may be more than one meaningful sorting order. For example, we may want to sort complex data types by absolute value or by real number. To do this, we can define two operator classes for the data type and then select the appropriate class when creating the index.

For index methods that support sorted scans (currently only B-trees), you can specify optional clauses such as ASC, DESC, NULLS FIRST, and/or NULLS LAST to modify the sorting order of the index. Since ordered indexes can be scanned forward or backward, creating a single-column DESC index is typically unnecessary—regular indexes already support sorting order. The value of these options lies in the ability to create multi-column indexes that match the sort order requested by mixed sort queries, such as SELECT ... ORDER BY x ASC, y DESC. If you need to support the “nulls sort low” behavior in queries that rely on indexes to avoid sorting steps, rather than the default “nulls sort high” behavior, the NULLS option is useful.

For most indexing methods, the speed of index creation depends on the maintenance_work_mem setting. Larger values will reduce the time required to create indexes, as long as you do not exceed the amount of memory actually available, which would cause the computer to swap.

After creating an index on a partitioned table, the index will propagate to all child tables created by the Matrix database. Indexes cannot be created on tables created by the MatrixDB database for use with partitioned tables.

UNIQUE indexes are only allowed when the indexed columns are the same as (or a superset of) the MatrixDB distribution key columns.

UNIQUE indexes are not allowed on append-optimized tables.

UNIQUE indexes can be created on partitioned tables. However, uniqueness is only enforced within each partition. Uniqueness is not enforced across partitions. For example, for a partitioned table with partitions based on years and subpartitions based on quarters, uniqueness is enforced only within each individual quarter partition. Uniqueness is not enforced across quarter partitions.

By default, indexes are not used with IS NULL clauses. In such cases, the best way to use indexes is to create partial indexes using IS NULL predicates.

Bitmap indexes perform best for columns with 100 to 100,000 distinct values. For columns with more than 100,000 distinct values, bitmap indexes experience a decline in performance and space efficiency. The size of a bitmap index is proportional to the number of rows in the table multiplied by the number of distinct values in the indexed column.

Columns with fewer than 100 distinct values typically do not benefit significantly from any type of index. For example, a gender column with only two distinct values (male and female) is not an ideal candidate for indexing.

Previous versions of the MatrixDB database also supported R-tree indexing methods. This method has been removed as it offers no clear advantages over the GiST method. If USING rtree is specified, CREATE INDEX will interpret it as USING gist.

The use of hash indexes has been disabled in the MatrixDB database.

Example

Create a B-tree index on the title column of the films table:

CREATE UNIQUE INDEX title_idx ON films (title);

To create a bitmap index on the gender column in the table employee:

CREATE INDEX gender_bmp_idx ON employee USING bitmap 
(gender);

To create an index on the expression lower(title) to allow valid case-insensitive searches:

CREATE INDEX ON films ((lower(title)));

(In this example, we choose to omit the index name, so the system will select a name, usually films_lower_idx.)

To create an index using non-default collation:

CREATE INDEX title_idx_german ON films (title COLLATE "de_DE");

To create an index with a non-default fill factor:

CREATE UNIQUE INDEX title_idx ON films (title) WITH 
(fillfactor = 70);

To create a GIN index with fast updates disabled:

CREATE INDEX gin_idx ON documents_table USING gin (locations) WITH (fastupdate = off);

To create an index on the column code of the table films and have the index reside in the tablespace indexspace:

CREATE INDEX code_idx ON films(code) TABLESPACE indexspace;

To create a GiST index on the point property, so that we can effectively use the box operator for the results of the conversion function:

CREATE INDEX pointloc ON points USING gist (box(location,location));
SELECT * FROM points WHERE box(location,location) && '(0,0),(1,1)'::box;

compatibility

CREATE INDEX is an extension of the MatrixDB database language. There is no indexing provision in the SQL standard.

简体中文