Capability Description

The Domino stream computing engine is a stream computing module introduced in YMatrix 6.0. It incrementally maintains computation results to improve the efficiency of data analysis and queries.

Quick Start

Domino v2 is enabled by default starting from YMatrix 6.4.x and requires no special settings.

For more information on GUC usage, please refer to Technical Specifications.

Architecture and Core Modules

The overall architecture of Domino is divided into two parts: the execution framework and the computing engine.

  • The execution framework is responsible for stream scheduling and metadata management.
  • The computing engine is responsible for decoding, query rewriting, and execution.

Key Differences

The core objective of Domino v2 is to support more stream computing tasks while keeping query latency stable, addressing the resource contention and high load that v1 encountered as the number of streams grew.
The core differences lie in resource management and execution methods; v2 achieves higher stream capacity and greater stability through an optimized design:

Resource Management

1. Shared Decoder (Decoder)

  • In v1, each stream is decoded separately, and the more streams there are, the greater the decoding workload.
  • In v2, each database shares a single decoder, with decoding results stored in TLog files. The decoding workload is only related to the amount of data changes and does not increase with the number of streams.

2. Shared Worker Processes (Worker)

  • In v1, each stream occupies a fixed number of worker processes, and the more streams there are, the more processes are occupied.
  • In v2, by setting the total number of worker processes, all streams take turns using these processes, controlling total resource consumption and avoiding resource contention due to the number of streams.
3. Incremental Execution

  • In v1, a large range of XLog (from the last completed position to the latest position) is processed at once, which may cause log accumulation.
  • v2 introduces a Ticker module that processes only a small range of XLog at a time (by default scanning one Tick forward, e.g., a fixed number of MB or a time window), accelerating processing and reducing accumulation.

4. Functional Aspects: there are no significant functional differences between the two versions.

5. Compatibility: v1 and v2 can coexist, but if the database is downgraded, v2 streams become unavailable while v1 streams remain unaffected.

Execution Method

Domino v2 has significantly optimized its execution framework, with the following core modules:

  1. Ticker
    Used to partition the XLog. It limits the scope of a single stream computation (a minimum of one Tick) so that large transactions do not slow down the system; the large historical-data transaction generated when creating a stream WITH DATA is exempt from this limit.

  2. Scheduler
    Responsible for stream scheduling. By reusing Worker processes, it limits overall CPU usage while improving the CPU utilization of individual Workers.

  3. Decoder
    Used to parse XLog. Generates Tuples and snapshots (TLog) for stream computation consumption.

  4. TLog
    Stores decoded change records. Serves as intermediate storage between the Decoder and stream computation, enabling stream computation to read and process data.

Capability Overview

| Capability | Support Status | Description |
| --- | --- | --- |
| Upstream table storage engine | HEAP / MARS3 | |
| Upstream table distribution type | Hash distribution / random distribution / segment set mode | Master-only tables and replicated tables are not supported as upstream tables |
| Upstream table partitioning | Supported | |
| Stream table storage engine | HEAP / MARS3 / AO | |
| Stream table distribution key | Free selection; may differ from the upstream table | Keeping them consistent is optimal; for aggregated streams in particular, a matching distribution key localizes stream computation |
| Stream table storage characteristics | | Stream tables can independently select engines, partitions, and distribution keys |
| Upstream table multi-field support | Supported | Upstream tables with ≥ 300 fields are supported |
| One table, multiple streams | Supported | Multiple same-level streams can share the same upstream table; "one table" refers to the same upstream table |
| Dimension-expansion computation | Supported | Dimension tables with ≥ 10 associated dimensions are supported |
| Aggregation computation | Supported | ≥ 32 grouping fields are supported; internally, multiple fields of different types are automatically merged into a single type, and a composite-type field is generated for subsequent aggregation |
| Upstream table DDL | Not supported | Creating indexes on upstream tables does not affect downstream stream tables; deleting upstream-table indexes may prevent downstream streams from executing |
| Stream table DDL | Not supported | Field-level DDL changes are not currently supported on stream tables; if changes are needed, the stream must be rebuilt. Partial DDL support is planned. Note: if the upstream table of a stream also undergoes DDL changes, rebuilding the stream is recommended |
| Stream table indexes | Supported | Stream tables can independently create and maintain indexes |
| Dimension filtering | Supported | Filter conditions can be added to dimension tables in dimension-expansion join calculations |
| Failover | Supported | After a segment primary/standby switchover, the stream continues to run stably; there is a small probability of losing a few transactions at the moment of the switch |
| Performance overhead | | Writes to the upstream table are almost unaffected; stream table results have sub-second latency |
| Large transactions | Supported | Batch processing and memory usage for transaction-log decoding are internally optimized, making large transactions more stable to handle; tables with large transactional changes should still be streamed with caution |
| Upstream table historical data | Supported | The WITH DATA option at stream creation processes the upstream table's historical data; a large upstream table generates a very large transaction that may block creation of other streams until it completes |
| Dual-stream JOIN | Supported | Supports non-equi-joins; the two upstream tables can have different distribution keys, and the upstream tables and the stream can also have different distribution keys |
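The WITH DATA behavior described above can be sketched as a stream definition. This is a hedged sketch: the table, column, and stream names are hypothetical, and the exact CREATE STREAM syntax and WITH DATA placement should be checked against the YMatrix reference; FROM STREAMING and WITH DATA themselves are the constructs this document describes.

```sql
-- Hypothetical upstream table and columns; exact syntax may differ from the
-- YMatrix reference. FROM STREAMING subscribes to the upstream table; the
-- WITH DATA option also processes the upstream table's existing historical
-- data, which for a large table produces one very large transaction.
CREATE STREAM sensor_copy AS
    SELECT device_id, ts, temperature
    FROM STREAMING sensor_readings
WITH DATA;
```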

Other notes

  1. Stream objects cannot be retrieved by JDBC metadata and must be queried using separate statements.
  2. Stream objects can only be created by administrator users.
  3. The SELECT portion of the stream definition must not produce duplicate column names. In particular, when aggregate functions are used, give the output columns distinct aliases, e.g. select avg(col1) as avg_col1, avg(col2) as avg_col2. Column projection can also be performed in the CREATE STREAM clause.
  4. Avoid direct DML operations on stream table data as much as possible (controlled by the GUC mx_stream_internal_modify).
  5. The WITH clause cannot be used in stream definitions.
  6. Aggregate stream usage restrictions:
  • Only one FROM STREAMING can appear in a stream definition.
  • The GROUP BY grouping key must include all distribution keys corresponding to the stream table.
  • Global aggregation calculations without GROUP BY are not allowed.
  • The HAVING clause cannot be used. Alternatively, nested subqueries with outer WHERE filtering can be used for HAVING-like functionality.
  • Aggregation expressions cannot be used in other expression calculations. For example, avg(col1)+1 is not allowed, but avg(col+1) is supported.
  7. Restrictions on dual-stream JOIN usage:
  • GROUP BY cannot be used in dual-stream JOIN calculations.
  • Currently, only equi-join INNER JOIN calculations between two upstream tables are supported; dual-stream calculations cannot be used for UNION, LEFT JOIN, or joins with non-streaming subscription tables.
  • DISTINCT and aggregate functions such as SUM and MAX cannot be used in dual-stream JOIN calculations.
  • Window functions cannot be used in dual-stream JOIN calculations.
  • The ORDER BY clause is not supported.
  8. In multi-level streams, intermediate streams cannot be aggregate streams.
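A stream definition that satisfies the aggregate-stream rules above might look like the following sketch. The table, column, and stream names are hypothetical, and the exact syntax should be confirmed against the YMatrix reference.

```sql
-- Hypothetical names. Distinct aliases avoid duplicate output column names,
-- and GROUP BY includes device_id, assumed here to be the stream table's
-- distribution key. Note that avg(col1 + 1) is allowed, while avg(col1) + 1
-- is not.
CREATE STREAM device_avgs AS
    SELECT device_id,
           avg(col1) AS avg_col1,
           avg(col2) AS avg_col2
    FROM STREAMING metrics
    GROUP BY device_id;

-- HAVING is not allowed; the alternative suggested above is a nested
-- subquery with outer WHERE filtering:
CREATE STREAM busy_devices AS
    SELECT * FROM (
        SELECT device_id, avg(col1) AS avg_col1
        FROM STREAMING metrics
        GROUP BY device_id
    ) t
    WHERE avg_col1 > 100;
```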