In actual scenarios, aggregation queries are very common. Because aggregation queries are generally based on a basic data set of orders of magnitude and are summarized, grouped and aggregated, they are usually considered to be a relatively time-consuming query type and are mostly used in analytical scenarios. In addition, the execution frequency of aggregation queries is becoming more and more frequent in actual scenarios, and the response time requirements for aggregation queries are also becoming higher and higher.
Continuous aggregation is a mechanism that can quickly respond to aggregation query. After creating a continuous aggregation, the system automatically performs aggregation calculation on the original data in real time during the data writing process. Because it is synchronized with the transaction level of the original table data, it is recommended that users query the continuous aggregation view as directly as possible.
Continuously aggregate dependency matrixts
extension, first create the extension:
CREATE EXTENSION matrixts;
Data table is the original data table:
CREATE TABLE metrics(
time timestamp,
tag_id int,
sensor float4
)
Distributed by(tag_id);
Continuous aggregation Use materialized views to store the aggregated results. You need to give the WITH (CONTINUOUS)
keyword and aggregation statement when creating the view:
CREATE VIEW cv1 WITH (CONTINUOUS) AS
SELECT tag_id, COUNT(*), SUM(sensor)
FROM metrics GROUP BY tag_id;
INSERT INTO metrics VALUES(NOW(), 1, 1.1);
INSERT INTO metrics VALUES(NOW(), 1, 1.2);
INSERT INTO metrics VALUES(NOW(), 2, 2.1);
INSERT INTO metrics VALUES(NOW(), 2, 2.2);
When using continuous aggregation, just insert time sequence data into the original data table, and the materialized view will automatically calculate the result:
SELECT * FROM cv1 ORDER BY tag_id;
tag_id | count | sum
-------+-------+---------------
1 | 2 | 2.3000002
2 | 2 | 4.3
(2 rows)
The results of the materialized view are the same as those of the aggregation calculation of the original table:
SELECT tag_id, COUNT(*), SUM(sensor)
FROM metrics
GROUP BY tag_id
ORDER BY tag_id;
tag_id | count | sum
-------+-------+---------------
1 | 2 | 2.3000002
2 | 2 | 4.3
(2 rows)
If the source table already has data when creating a continuous aggregation, the data will also be synchronized by default. If synchronization is not required, the parameter POPULATE
needs to be manually set to false
when creating the view (default is true
):
CREATE VIEW cv1 WITH (CONTINUOUS, POPULATE=false) AS
SELECT tag_id, COUNT(*), SUM(sensor)
FROM metrics GROUP BY tag_id;
MatrixDB provides UDFs for maintaining continuous aggregation:
matrixts_internal.analyze_continuous_view(rel REGCLASS)
ANALYZE materialized view can update statistics, which is conducive to the optimization for calculation:
SELECT matrixts_internal.analyze_continuous_view('cv1');
matrixts_internal.vacuum_continuous_view(rel REGCLASS, full BOOL)
VACUUM materialized view can clean up the Dead tuple and make the space reuse; if the second parameter is true, the table file will be rebuilt and the storage space will be reduced:
SELECT matrixts_internal.vacuum_continuous_view('cv1', true);
matrixts_internal.rebuild_continuous_view(rel REGCLASS)
Reconstruct the materialized view, suitable for scenarios where the source data table is updated and deleted, and the data is fully synchronized:
SELECT matrixts_internal.rebuild_continuous_view('cv1');