Continuous gathering

1. Background introduction

In actual scenarios, aggregation queries are very common. Because aggregation queries are generally based on a basic data set of orders of magnitude and are summarized, grouped and aggregated, they are usually considered to be a relatively time-consuming query type and are mostly used in analytical scenarios. In addition, the execution frequency of aggregation queries is becoming more and more frequent in actual scenarios, and the response time requirements for aggregation queries are also becoming higher and higher.

Continuous aggregation is a mechanism that can quickly respond to aggregation query. After creating a continuous aggregation, the system automatically performs aggregation calculation on the original data in real time during the data writing process. Because it is synchronized with the transaction level of the original table data, it is recommended that users query the continuous aggregation view as directly as possible.

2. How to use

2.1 Create an extension

Continuously aggregate dependency matrixts extension, first create the extension:

CREATE EXTENSION matrixts;

2.2 Create a data table

Data table is the original data table:

CREATE TABLE metrics(
    time timestamp,
    tag_id int,
    sensor float4
)
Distributed by(tag_id);

2.3 Create a continuous aggregate view

Continuous aggregation Use materialized views to store the aggregated results. You need to give the WITH (CONTINUOUS) keyword and aggregation statement when creating the view:

CREATE VIEW cv1 WITH (CONTINUOUS) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

2.4 Insert data into the source data table

INSERT INTO metrics VALUES(NOW(), 1, 1.1);
INSERT INTO metrics VALUES(NOW(), 1, 1.2);
INSERT INTO metrics VALUES(NOW(), 2, 2.1);
INSERT INTO metrics VALUES(NOW(), 2, 2.2);

2.5 Query Continuously Aggregated Views

When using continuous aggregation, just insert time sequence data into the original data table, and the materialized view will automatically calculate the result:

SELECT * FROM cv1 ORDER BY tag_id;
 tag_id | count |    sum
-------+-------+---------------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

The results of the materialized view are the same as those of the aggregation calculation of the original table:

SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics
    GROUP BY tag_id
    ORDER BY tag_id;
 tag_id | count |    sum
-------+-------+---------------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

If the source table already has data when creating a continuous aggregation, the data will also be synchronized by default. If synchronization is not required, the parameter POPULATE needs to be manually set to false when creating the view (default is true):

CREATE VIEW cv1 WITH (CONTINUOUS, POPULATE=false) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

3. Maintenance method

MatrixDB provides UDFs for maintaining continuous aggregation:

matrixts_internal.analyze_continuous_view(rel REGCLASS)

ANALYZE materialized view can update statistics, which is conducive to the optimization for calculation: SELECT matrixts_internal.analyze_continuous_view('cv1');
matrixts_internal.vacuum_continuous_view(rel REGCLASS, full BOOL)

VACUUM materialized view can clean up the Dead tuple and make the space reuse; if the second parameter is true, the table file will be rebuilt and the storage space will be reduced: SELECT matrixts_internal.vacuum_continuous_view('cv1', true);
matrixts_internal.rebuild_continuous_view(rel REGCLASS)

Reconstruct the materialized view, suitable for scenarios where the source data table is updated and deleted, and the data is fully synchronized: SELECT matrixts_internal.rebuild_continuous_view('cv1');

4. FAQ

Are the data of the materialized view synchronized in real time with the source table?
- Yes, after the source table is inserted into the data, it can be queryed from the materialized view
Can continuous aggregation ensure ACID?
- Yes, the data in the materialized view uses the same transaction ID as the original table data
How to deal with update deletion?
- The update deletion of the original table and the DROP are not reflected in the continuous aggregation view, but support upsert update and non-empty latest value last_not_null aggregation
Is continuous aggregation effective for window functions?
- Window functions are not supported yet
Does continuous aggregation support partition tables?
- support
Does continuous aggregation support multi-table aggregation?
- Not supported
Can a table define multiple continuous aggregations?
- Can

简体中文