English

Continuous Aggregation

This document describes the continuous aggregation feature in YMatrix.

1. Background

Aggregate queries are common in real-world scenarios. Since they typically operate on large datasets and perform grouping and summarization, they are generally considered relatively expensive in terms of execution time and are often used in analytical workloads. In practice, aggregate queries are being executed more frequently, with increasing demands for lower response times.

Continuous aggregation is a mechanism designed to accelerate the response to such queries. Once a continuous aggregation view is created, the system automatically performs real-time aggregation during data ingestion. Because the aggregation results are transactionally synchronized with the source table, users are advised to query the continuous aggregation view directly whenever possible.

2. Usage

2.1 Create Extension

Continuous aggregation depends on the matrixts extension. First, create the extension:

=# CREATE EXTENSION matrixts;

2.2 Create Data Table

The data table refers to the original source table:

=# CREATE TABLE metrics(
   time timestamp,
   tag_id int,
   sensor float4
   )
   USING MARS3
   DISTRIBUTED BY (tag_id)
   ORDER BY (time,tag_id);

2.3 Create Continuous Aggregation View

Continuous aggregation uses materialized views to store aggregated results. Use the WITH (CONTINUOUS) keyword when creating the view, along with the aggregation statement:

=# CREATE VIEW cv1 WITH (CONTINUOUS) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

2.4 Insert Data into Source Table

=# INSERT INTO metrics VALUES(NOW(), 1, 1.1);
=# INSERT INTO metrics VALUES(NOW(), 1, 1.2);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.1);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.2);

2.5 Query Continuous Aggregation View

With continuous aggregation enabled, the materialized view is automatically updated as data is inserted into the source table:

=# SELECT * FROM cv1 ORDER BY tag_id;
 tag_id | count |    sum
--------+-------+-----------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

The result from the materialized view matches that of aggregating the source table directly:

=# SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics
    GROUP BY tag_id
    ORDER BY tag_id;
 tag_id | count |    sum
--------+-------+-----------
      1 |     2 | 2.3000002
      2 |     2 |       4.3
(2 rows)

If the source table already contains data when the continuous aggregation view is created, the existing data is not synchronized by default. To enable synchronization, set the populate parameter to true explicitly during view creation (default is false):

=# CREATE VIEW cv1 WITH (CONTINUOUS, POPULATE=false) AS
    SELECT tag_id, COUNT(*), SUM(sensor)
    FROM metrics GROUP BY tag_id;

3. Maintenance Methods

YMatrix provides UDFs (User Defined Functions) for maintaining continuous aggregation views:

matrixts_internal.analyze_continuous_view(rel REGCLASS)

Performs ANALYZE on the materialized view to update statistics, which helps the optimizer generate better execution plans.

SELECT matrixts_internal.analyze_continuous_view('cv1');

matrixts_internal.vacuum_continuous_view(rel REGCLASS, full BOOL)

VACUUMs the materialized view to reclaim space occupied by dead tuples and improve space reuse. If the second parameter is set to true, the table file is rebuilt to reduce storage footprint.

SELECT matrixts_internal.vacuum_continuous_view('cv1', true);

matrixts_internal.rebuild_continuous_view(rel REGCLASS)

Rebuilds the materialized view. This is useful when updates or deletions have been made to the source table and a full resynchronization is required.

SELECT matrixts_internal.rebuild_continuous_view('cv1');

4. FAQ

Is the data in the materialized view synchronized with the source table in real time?
- Yes. After a data insert into the source table is committed, the change is immediately visible in the materialized view.
Does continuous aggregation guarantee ACID (Atomicity, Consistency, Isolation, Durability)?
- Yes. The data in the materialized view shares the same transaction ID as the source table data.
How are UPDATEs and DELETEs handled?
- Updates, deletes, and DROP operations on the source table are not reflected in the continuous aggregation view. However, upsert-style updates and the last_not_null aggregate function for non-null latest values are supported.
Does continuous aggregation support window functions?
- No, window functions are not currently supported.
Does continuous aggregation support partitioned tables?
- Yes.
Can continuous aggregation perform aggregation across multiple tables?
- No.
Can multiple continuous aggregation views be defined on a single table?
- Yes.
Does continuous aggregation apply compression parameters by default?
- No.

← Previous

matrixmgr

Sliding Window

English Русский 简体中文