The Domino stream computing engine is a stream computing function module introduced in YMatrix 6.0. It maintains calculation results incrementally to improve the efficiency of data analysis and queries.
Domino v2 is enabled by default starting from YMatrix 6.4.X and requires no special settings.
For more information on GUC usage, please refer to Technical Specifications.
The overall architecture of Domino is divided into two parts: the execution framework and the computing engine.
The core objective of Domino v2 is to support more stream computing tasks while maintaining query latency, addressing the resource contention and high load issues encountered in v1 when the number of streams increases.
The core differences are mainly reflected in resource management and execution methods. v2 achieves higher stream support capacity and stability through optimized design:
Resource Management
1. Shared decoder (Decoder)
2. Shared worker processes (Worker)
Functional Aspects: There are no significant functional differences between the two versions.
Compatibility: v1 and v2 can coexist; however, if the database is downgraded, v2 streams become unavailable while v1 streams are unaffected.
Execution Method
Domino v2 has significantly optimized its execution framework, with the following core modules:
Ticker
Partitions the XLog. It bounds the scope of a single stream computation (a minimum of one tick) so that large transactions cannot slow down the system; large transactions that process historical data when creating a WITH DATA stream are exempt from this restriction.
Scheduler
Responsible for stream scheduling. By reusing Worker processes, it limits overall CPU usage while improving the CPU utilization of individual Workers.
Decoder
Used to parse XLog. Generates Tuples and snapshots (TLog) for stream computation consumption.
TLog
Stores decoded change records. Serves as intermediate storage between the Decoder and stream computation, enabling stream computation to read and process data.
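The interplay of these four modules can be illustrated with a small conceptual sketch. This is plain Python, not YMatrix code; every name is invented, and it only mimics the v2 ideas described above: tick-bounded batches, a decoder shared across streams, and a capped shared worker pool serving multiple streams.

```python
# Conceptual sketch only -- NOT YMatrix internals. It illustrates the
# v2 design: tick-bounded batches (Ticker), one shared decode pass
# (Decoder/TLog), and a small shared pool (Scheduler/Workers).
from concurrent.futures import ThreadPoolExecutor

TICK_SIZE = 4  # max log records one stream computation consumes per tick

def ticker(xlog, tick_size=TICK_SIZE):
    """Partition the XLog into ticks so no single run sees too much data."""
    for i in range(0, len(xlog), tick_size):
        yield xlog[i:i + tick_size]

def decoder(tick):
    """Shared decoder: parse raw records into tuples (here: int payloads)."""
    return [int(rec.split(":")[1]) for rec in tick]

def run_stream(state, tuples):
    """One stream's incremental computation: maintain a running sum."""
    return state + sum(tuples)

def schedule(xlog, n_streams=3, pool_size=2):
    """Scheduler: a small shared pool serves every stream, capping CPU use."""
    states = [0] * n_streams
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for tick in ticker(xlog):
            tuples = decoder(tick)  # decode once, share the result (TLog)
            futures = [pool.submit(run_stream, states[s], tuples)
                       for s in range(n_streams)]
            states = [f.result() for f in futures]
    return states

xlog = [f"rec:{v}" for v in range(10)]  # ten fake XLog records
print(schedule(xlog))                   # each stream sums 0..9 -> [45, 45, 45]
```

The key point of the sketch is that the decode happens once per tick regardless of how many streams consume it, and the worker pool size, not the stream count, bounds CPU usage.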
Capability | Support Status | Description |
---|---|---|
Upstream table storage engine | HEAP / MARS3 | |
Upstream table distribution type | Hash distribution / random distribution / segment set mode | Master-only tables and replicated tables are not supported as upstream tables |
Upstream table partitioning | Supported | |
Stream table storage engine | HEAP / MARS3 / AO | |
Stream table distribution key | Distribution keys can be chosen freely and may differ from the upstream table | Keeping them consistent is optimal; for aggregated streams in particular, a shared distribution key keeps stream computation local to each segment |
Stream table storage characteristics | Stream tables can independently select engines, partitions, and distribution keys | |
Upstream table multi-field support | Supported | Supports upstream tables with ≥ 300 fields |
One table, multiple streams | Supported | Multiple streams at the same level can share the same upstream table; "one table" refers to that shared upstream table |
Dimension expansion computation | Supported | Supports dimension tables with ≥ 10 associated dimensions |
Aggregation computation | Supported | Supports ≥ 32 group fields; internally, multiple fields of different types are automatically merged into a single composite-type field for subsequent aggregation computation |
Upstream table DDL support | Not supported | Creating indexes on upstream tables does not affect downstream stream tables; dropping indexes on upstream tables may prevent downstream streams from executing |
Stream table DDL support | Not supported | Field-level DDL changes are currently not supported on stream tables; if changes are needed, the stream must be rebuilt. Partial DDL support is planned. Note: if the upstream table of a stream also undergoes DDL changes, rebuilding the stream is likewise recommended |
Stream table indexes | Supported | Stream tables can independently create and maintain indexes |
Dimension filtering | Supported | Filter conditions can be added to dimension tables during dimension expansion and join computation |
Failover | Supported | After a primary/standby switch on a segment node, the stream continues to run stably; there is a small probability of losing a few transactions at the moment of the switch |
Performance overhead | Low | Writing to the upstream table is almost unaffected, and the stream table's computed results have sub-second latency |
Large transaction capability | Supported | The batch processing mechanism and memory usage of transaction log decoding have been optimized internally, making large transactions more stable to handle; tables with large transactional changes should still be streamed with caution |
Upstream table historical data processing | Supported | The WITH DATA option at stream creation processes the upstream table's historical data; if the upstream table holds a large amount of data, this produces a very large transaction that may block the creation of other streams until it completes |
Dual-stream JOIN | Supported | Supports non-equi-joins; upstream tables may have different distribution keys, and the upstream table and stream may also have different distribution keys |
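As a concrete illustration of the WITH DATA behavior in the table above, the sketch below creates an aggregated stream over a hypothetical `metrics` upstream table. All table and column names are invented, and the placement of the WITH DATA clause follows the PostgreSQL materialized-view convention as an assumption; consult your YMatrix version's CREATE STREAM reference for the exact syntax.

```sql
-- Hypothetical upstream table (names invented for illustration).
CREATE TABLE metrics (
    device_id int,
    ts        timestamp,
    temp      float8
) DISTRIBUTED BY (device_id);

-- Create an aggregated stream and backfill existing rows with WITH DATA.
-- Caution (per the table above): with a large upstream table this runs as
-- one very large transaction and may block creation of other streams.
CREATE STREAM metrics_avg AS
SELECT device_id,
       avg(temp) AS avg_temp
FROM metrics
GROUP BY device_id
WITH DATA;
```

Using `device_id` as the distribution key for both the upstream table and the stream follows the table's advice: a shared distribution key keeps the aggregation local to each segment.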
Aggregation projection supports expressions such as:

```sql
select avg(col1) as avg_col1
     , avg(col2) as avg_col2
```

Column projection can also be performed in the CREATE STREAM section. Expressions over aggregate results such as `avg(col1)+1` are not allowed, but expressions inside aggregates such as `avg(col+1)` are supported.
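The projection rule can be sketched as follows, reusing a hypothetical `metrics(device_id, temp)` upstream table; the names are invented and the exact CREATE STREAM syntax may differ by version:

```sql
-- Allowed: the expression sits INSIDE the aggregate.
CREATE STREAM s_ok AS
SELECT device_id,
       avg(temp + 1) AS avg_temp_plus1
FROM metrics
GROUP BY device_id;

-- Not allowed: the expression is applied to the aggregate's RESULT.
-- CREATE STREAM s_bad AS
-- SELECT device_id,
--        avg(temp) + 1 AS shifted_avg
-- FROM metrics
-- GROUP BY device_id;
```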