This document describes best practices for creating tables using YMatrix.
Specification | Required |
---|---|
Database names, table names, column names, function names, view names, sequence names, aliases, etc., must use a combination of lowercase letters, underscores, and numbers, but the first letter must be lowercase, and they must not start with pg . Object names must not exceed 63 characters in length |
Yes |
Field (or column) names must not use reserved field names or keywords, such as oid , xmin , ctid , select , insert , update , etc. |
Yes |
Primary key index names are pk_column_name , unique index names are uk_column_name , and regular index names are idx_column_name |
Yes |
Partitioned tables should use the parent table name as the prefix and the child table rule characteristics as the suffix, such as parent table order , child table order_p_202010 , order_p_202009 |
Yes |
When a field means “whether xx,” it is recommended to name it using the format is_xxx and select the Boolean type, such as is_deleted |
No |
It is best to keep the database name consistent with the application service name | No |
Table names should be clear and meaningful, avoiding ambiguous abbreviations or overly lengthy names. It is recommended to follow the format business name_table purpose , with views prefixed and temporary tables prefixed with tmp . Recommended: trade_config , payment_task . Avoid: tradeconfig , paymenttask . |
No |
Storage mode | Applicable to |
---|---|
Row storage | Data is frequently updated or inserted. Need to access multiple fields of a table at the same time |
column storage | There are many fields, but most queries access only a few fields. Perform aggregation operations on a small number of fields. Individual fields in the table are updated periodically without modifying other fields. For most OLAP business and time-series scenarios, it is recommended to prioritize column storage. |
YMatrix offers four storage engines: HEAP, AO and MARS3. We typically select the appropriate storage engine based on data scale and write performance requirements.
Storage Engine | Supported | Applicable to |
---|---|---|
HEAP | Row storage | Frequently updated tables Select row-stored tables |
AO | Row storage/Column storage | Tables and partitioned tables are not updated frequently after initial data loading, and data is only inserted in batches. |
MARS3 | Row-column hybrid storage/Column storage | Time Series Scenario OLTP scenarios involving high-frequency, small-batch data writes OLAP scenarios involving low-frequency, large-volume data writes |
Notes!
For more information about storage engines, see Storage Engine Overview.
Precise data types are very helpful for storage space, memory usage, and query efficiency. The following considerations should be taken into account when selecting types:
YMatrix supports distributing data across different nodes to leverage distributed parallel processing capabilities and improve performance. The distribution key primarily considers the following factors:
SELECT gp_segment_id, count(*) FROM table_name GROUP BY gp_segment_id;
.count(distinct field_name)
(i.e., the number of unique values in this field), the more suitable the field is as a distribution key.WHERE
clause of a query as distribution keys; instead, use them as partition keys.YMatrix is a distributed database that creates different partitions based on the amount of data. There are several key points to consider when setting up partitions:
Notes on partition maintenance operations for MARS2/MARS3 tables:
ALTER
operations on partitions requires locking the entire table, which will cause mutual blocking between queries.Notes on performing partition maintenance operations on HEAP/AOCO tables:
Note!
The above recommended partition size values are applicable to YMatrix 5.X.
You can configure the compression threshold by specifying compress_threshold at the MARS3 table level. Increasing this value appropriately can reduce the number of I/O operations, but it may also lower the block filtering rate.
Compressed storage is generally recommended, especially for column-store tables, where compressed storage is more effective. Compressed storage allows idle CPU resources to be used to replace read/write I/O time, alleviating I/O pressure and improving query efficiency.
If you use the MARS2 or MARS3 storage engine to create tables:
HEAP does not support explicit compression.
Indexes are a technique used to accelerate queries. Index data is typically stored separately, so this section will discuss them separately.
For HEAP tables and AO tables, there are various indexing methods such as Btree, Brin, Bitmap, and Hash. For text and spatial data, there are additional indexing methods such as Gin, Gist, Odin, Spgist, and Toin.
MARS2 currently supports primary key indexes and Brin indexes. The former is the sort key, which is the index specified by mars2_btree
, and is stored separately in the specified index file. The latter refers to the minmax
index created for a specific column, which is stored in a separate metadata file. Notes:
(vin)
in the time series scenario. If the requirement is a detailed query, aggregate query, or multi-device query for a single device within a certain time period, then the sort key is usually the device number and timestamp (vin,daq_time)
.MARS3 currently supports Brin indexes. You can optionally create one or more mars3_brin
indexes to improve query efficiency (similar to the minmax
index for MARS2 tables).
DISTRIBUTED REPLICATED
parameter.