Continuous gathering

This document describes the continuous aggregation feature of YMatrix.

1. Background introduction

In actual scenarios, aggregation queries are very common. Because aggregation queries are generally based on a basic data set of orders of magnitude and are summarized, grouped and aggregated, they are usually considered to be a relatively time-consuming query type and are mostly used in analytical scenarios. In addition, the execution frequency of aggregation queries is becoming more and more frequent in actual scenarios, and the response time requirements for aggregation queries are also becoming higher and higher.

Continuous aggregation is a mechanism that can quickly respond to aggregation query. After creating a continuous aggregation, the system automatically performs aggregation calculation on the original data in real time during the data writing process. Because it is synchronized with the transaction level of the original table data, it is recommended that users query the continuous aggregation view as directly as possible.

2. How to use

2.1 Create an extension

Continuously aggregate dependency matrixts extension, first create the extension:

=# CREATE EXTENSION matrixts;

2.2 Create a data table

Data table is the original data table:

=# CREATE TABLE metrics(
   time timestamp,
   tag_id int,
   sensor float4
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

2.3 Create a continuous aggregate view

Continuous aggregation Use materialized views to store the aggregated results, and the WITH (CONTINUOUS) keyword and aggregation statement are required when creating the view:

=# CREATE VIEW cv1 WITH (CONTINUOUS) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

2.4 Insert data into the source data table

=# INSERT INTO metrics VALUES(NOW(), 1, 1.1);
=# INSERT INTO metrics VALUES(NOW(), 1, 1.2);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.1);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.2);

2.5 Query Continuously Aggregated Views

When using continuous aggregation, just insert time sequence data into the original data table, and the materialized view will automatically calculate the result:

=# SELECT * FROM cv1 ORDER BY tag_id;
 tag_id | count |    sum
-------+-------+---------------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

The results of the materialized view are the same as those of the aggregation calculation of the original table:

=# SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics
    GROUP BY tag_id
    ORDER BY tag_id;
 tag_id | count |    sum
-------+-------+---------------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

If the source table already has data when creating a continuous aggregation, the data will not be synchronized by default. If synchronization is required, the parameter populate needs to be manually set to true when creating the view (default is false):

=# CREATE VIEW cv1 WITH (CONTINUOUS, POPULATE=false) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

3. Maintenance method

YMatrix provides UDFs (User Defined Functions) for maintaining continuous aggregation:

  • matrixts_internal.analyze_continuous_view(rel REGCLASS)

    ANALYZE materialized view can update statistical information, which is conducive to the optimizer's calculations: SELECT matrixts_internal.analyze_continuous_view('cv1');

  • matrixts_internal.vacuum_continuous_view(rel REGCLASS, full BOOL)

    VACUUM materialized view can clean up Dead Tuple and reuse the space; if the second parameter is true, the table file will be rebuilt and storage space will be reduced: SELECT matrixts_internal.vacuum_continuous_view('cv1', true);

  • matrixts_internal.rebuild_continuous_view(rel REGCLASS)

    Reconstruct the materialized view, suitable for scenarios where the source data table is updated and deleted, and the data is fully synchronized: SELECT matrixts_internal.rebuild_continuous_view('cv1');

4. FAQ

  1. Are the data of the materialized view synchronized in real time with the source table?
    • Yes, after the source table is inserted into the data, it can be queryed from the materialized view
  2. Can continuous gathering ensure ACID (Atomicity, Consistency, Isolation, Durability)?
    • Yes, the data in the materialized view uses the same transaction ID as the original table data
  3. How to deal with update deletion?
    • The update deletion of the original table and the DROP are not reflected in the continuous aggregation view, but support upsert updates and non-empty latest values ​​last_not_null aggregation
  4. Is continuous aggregation effective for window functions?
    • Window functions are not supported yet
  5. Does continuous aggregation support partition tables?
    • support
  6. Does continuous aggregation support multi-table aggregation?
    • Not supported
  7. Can a table define multiple continuous aggregations?
    • Can
  8. Will continuous aggregation specify compression parameters by default?
    • Won't