Continuous gathering

This document describes the continuous aggregation feature of YMatrix.

1. Background introduction

In actual scenarios, aggregation queries are very common. Because aggregation queries are generally based on a basic data set of orders of magnitude and are summarized, grouped and aggregated, they are usually considered to be a relatively time-consuming query type and are mostly used in analytical scenarios. In addition, the execution frequency of aggregation queries is becoming more and more frequent in actual scenarios, and the response time requirements for aggregation queries are also becoming higher and higher.

Continuous aggregation is a mechanism that can quickly respond to aggregation query. After creating a continuous aggregation, the system automatically performs aggregation calculation on the original data in real time during the data writing process. Because it is synchronized with the transaction level of the original table data, it is recommended that users query the continuous aggregation view as directly as possible.

2. How to use

2.1 Create an extension

Continuously aggregate dependency matrixts extension, first create the extension:

=# CREATE EXTENSION matrixts;

2.2 Create a data table

Data table is the original data table:

=# CREATE TABLE metrics(
   time timestamp,
   tag_id int,
   sensor float4
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

2.3 Create a continuous aggregate view

Continuous aggregation Use materialized views to store the aggregated results, and the WITH (CONTINUOUS) keyword and aggregation statement are required when creating the view:

=# CREATE VIEW cv1 WITH (CONTINUOUS) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

2.4 Insert data into the source data table

=# INSERT INTO metrics VALUES(NOW(), 1, 1.1);
=# INSERT INTO metrics VALUES(NOW(), 1, 1.2);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.1);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.2);

2.5 Query Continuously Aggregated Views

When using continuous aggregation, just insert time sequence data into the original data table, and the materialized view will automatically calculate the result:

=# SELECT * FROM cv1 ORDER BY tag_id;
 tag_id | count |    sum
-------+-------+---------------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

The results of the materialized view are the same as those of the aggregation calculation of the original table:

=# SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics
    GROUP BY tag_id
    ORDER BY tag_id;
 tag_id | count |    sum
-------+-------+---------------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

If the source table already has data when creating a continuous aggregation, the data will also be synchronized by default. If synchronization is not required, the parameter POPULATE needs to be manually set to false when creating the view (default is true):

=# CREATE VIEW cv1 WITH (CONTINUOUS, POPULATE=false) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

3. Maintenance Methods

YMatrix provides UDFs (User Defined Functions) for maintaining continuous aggregation views:

  • matrixts_internal.analyze_continuous_view(rel REGCLASS)

Performs ANALYZE on the materialized view to update statistics, which helps the optimizer generate better execution plans.

SELECT matrixts_internal.analyze_continuous_view('cv1');
  • matrixts_internal.vacuum_continuous_view(rel REGCLASS, full BOOL)

VACUUMs the materialized view to reclaim space occupied by dead tuples and improve space reuse. If the second parameter is set to true, the table file is rebuilt to reduce storage footprint.

SELECT matrixts_internal.vacuum_continuous_view('cv1', true);
  • matrixts_internal.rebuild_continuous_view(rel REGCLASS)

Rebuilds the materialized view. This is useful when updates or deletions have been made to the source table and a full resynchronization is required.

SELECT matrixts_internal.rebuild_continuous_view('cv1');

4. FAQ

  1. Are the data of the materialized view synchronized in real time with the source table?
    • Yes, after the source table is inserted into the data, it can be queryed from the materialized view
  2. Can continuous gathering ensure ACID (Atomicity, Consistency, Isolation, Durability)?
    • Yes, the data in the materialized view uses the same transaction ID as the original table data
  3. How to deal with update deletion?
    • The update deletion of the original table and the DROP are not reflected in the continuous aggregation view, but support upsert updates and non-empty latest values ​​last_not_null aggregation
  4. Is continuous aggregation effective for window functions?
    • Window functions are not supported yet
  5. Does continuous aggregation support partition tables?
    • support
  6. Does continuous aggregation support multi-table aggregation?
    • Not supported
  7. Can a table define multiple continuous aggregations?
    • Can