YMatrix
Quick Start
Simulate Time Series Scenarios
Standard Cluster Deployment
Data Modeling
Connecting to The database
Data Writing
Data Migration
Data Query
Scene Application Examples
Federal Query
Maintenance and Monitoring
Global Maintenance
Partition Maintenance
Backup and Restore
Cluster Expansion
Monitoring
Performance Tuning
Troubleshooting
Reference Guide
Tool Guide
Data Type
Storage Engine
Execution Engine
Configuration Parameters
Index
Extension
SQL Reference
FAQ
This document describes the continuous aggregation feature of YMatrix.
In actual scenarios, aggregation queries are very common. Because aggregation queries are generally based on a basic data set of orders of magnitude and are summarized, grouped and aggregated, they are usually considered to be a relatively time-consuming query type and are mostly used in analytical scenarios. In addition, the execution frequency of aggregation queries is becoming more and more frequent in actual scenarios, and the response time requirements for aggregation queries are also becoming higher and higher.
Continuous aggregation is a mechanism that can quickly respond to aggregation query. After creating a continuous aggregation, the system automatically performs aggregation calculation on the original data in real time during the data writing process. Because it is synchronized with the transaction level of the original table data, it is recommended that users query the continuous aggregation view as directly as possible.
Continuously aggregate dependency matrixts
extension, first create the extension:
=# CREATE EXTENSION matrixts;
Data table is the original data table:
=# CREATE TABLE metrics(
time timestamp,
tag_id int,
sensor float4
)
USING MARS3
DISTRIBUTED BY (tag_id)
ORDER BY (time,tag_id);
Continuous aggregation Use materialized views to store the aggregated results, and the WITH (CONTINUOUS)
keyword and aggregation statement are required when creating the view:
=# CREATE VIEW cv1 WITH (CONTINUOUS) AS
SELECT tag_id, COUNT(*), SUM(sensor)
FROM metrics GROUP BY tag_id;
=# INSERT INTO metrics VALUES(NOW(), 1, 1.1);
=# INSERT INTO metrics VALUES(NOW(), 1, 1.2);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.1);
=# INSERT INTO metrics VALUES(NOW(), 2, 2.2);
When using continuous aggregation, just insert time sequence data into the original data table, and the materialized view will automatically calculate the result:
=# SELECT * FROM cv1 ORDER BY tag_id;
tag_id | count | sum
-------+-------+---------------
1 | 2 | 2.3000002
2 | 2 | 4.3
(2 rows)
The results of the materialized view are the same as those of the aggregation calculation of the original table:
=# SELECT tag_id, COUNT(*), SUM(sensor)
FROM metrics
GROUP BY tag_id
ORDER BY tag_id;
tag_id | count | sum
-------+-------+---------------
1 | 2 | 2.3000002
2 | 2 | 4.3
(2 rows)
If the source table already has data when creating a continuous aggregation, the data will not be synchronized by default. If synchronization is required, the parameter populate
needs to be manually set to true
when creating the view (default is false
):
=# CREATE VIEW cv1 WITH (CONTINUOUS, POPULATE=false) AS
SELECT tag_id, COUNT(*), SUM(sensor)
FROM metrics GROUP BY tag_id;
YMatrix provides UDFs (User Defined Functions) for maintaining continuous aggregation:
matrixts_internal.analyze_continuous_view(rel REGCLASS)
ANALYZE materialized view can update statistical information, which is conducive to the optimizer's calculations:
SELECT matrixts_internal.analyze_continuous_view('cv1');
matrixts_internal.vacuum_continuous_view(rel REGCLASS, full BOOL)
VACUUM materialized view can clean up Dead Tuple and reuse the space; if the second parameter is true, the table file will be rebuilt and storage space will be reduced:
SELECT matrixts_internal.vacuum_continuous_view('cv1', true);
matrixts_internal.rebuild_continuous_view(rel REGCLASS)
Reconstruct the materialized view, suitable for scenarios where the source data table is updated and deleted, and the data is fully synchronized:
SELECT matrixts_internal.rebuild_continuous_view('cv1');