Domino is the streaming computation engine introduced in YMatrix 6.0. It incrementally maintains computation results to improve data analysis efficiency and query performance.
Domino v2 is enabled by default starting from YMatrix 6.4.X and requires no additional configuration.
For more information on GUC parameters, refer to Technical Parameters.
The Domino architecture consists of two main components: the execution framework and the computation engine.
The primary goal of Domino v2 is to support more streaming tasks at the same query latency, addressing the resource contention and high load that v1 experiences as the number of streams grows. The main differences lie in resource management and execution mechanisms; Domino v2 improves scalability and stability through three key design changes:

- Shared Decoder
- Shared Worker Processes
- Incremental Execution (Small-step Processing)
Functionality
Domino v2 focuses on optimizing the execution framework. The core modules are:
Ticker
Splits XLog into manageable segments. Controls the scope of each streaming computation (minimum unit is one Tick), avoiding large transactions that could slow down the system. Note: Large transactions for historical data processing during WITH DATA stream creation are exempt from this limitation.
Scheduler
Manages stream scheduling. Reuses worker processes to cap overall CPU usage while improving individual worker CPU utilization.
Decoder
Parses XLog and generates tuples and snapshots (TLog) for stream consumption.
TLog
Stores decoded change records. Acts as intermediate storage between the Decoder and stream computation, allowing stream processing to read and process changes.

Compatibility
| Capability | Supported | Notes |
|---|---|---|
| Upstream Table Storage Engine | HEAP / MARS3 | |
| Upstream Table Distribution Type | Hash / Random / Segment Set | Does not support master-only or replicated tables as upstream tables |
| Upstream Table Partitioning | Supported | |
| Stream Table Storage Engine | HEAP / MARS3 / AO | |
| Stream Table Distribution Key | Flexible choice, can differ from upstream | Best practice: use the same distribution key. For aggregate streams, matching keys enable localized computation |
| Stream Table Storage Properties | Independent selection of engine, partitioning, and distribution key | |
| Multi-Column Upstream Tables | Supported | Supports upstream tables with ≥ 300 columns |
| Multiple Streams per Table | Supported | Multiple streams can share the same upstream table. "One table" refers to the same upstream source |
| Dimensional Enrichment | Supported | Supports joining ≥ 10 dimension tables |
| Aggregation | Supported | Supports grouping by ≥ 32 fields. Internally combines multiple field types into a composite type for aggregation |
| Upstream Table DDL | Not Supported | Creating indexes on upstream tables has no effect on downstream streams. Dropping indexes may break stream execution |
| Stream Table DDL | Not Supported | DDL operations on stream table columns (e.g., ADD/DROP COLUMN) are not supported. Rebuild the stream if changes are needed. Note: If the upstream table also undergoes DDL, stream rebuild is recommended |
| Stream Table Indexes | Supported | Indexes can be independently created and maintained on stream tables |
| Dimension Filtering | Supported | Supports filter conditions on dimension tables during dimensional enrichment |
| Failover Support | Supported | Streams continue working after segment failover. However, a small number of transactions at the switchover point may be lost |
| Performance Overhead | Low | Minimal impact on upstream write performance; stream results are updated within seconds |
| Large Transaction Handling | Supported | Optimized batching and memory usage during transaction log decoding improves stability for large transactions. However, use streaming cautiously on tables with frequent large transactions |
| Historical Data Processing | Supported | Use WITH DATA when creating a stream to process existing upstream data. If the upstream table is very large, this creates a long-running transaction that blocks creation of other streams until completion |
| Stream-to-Stream JOIN | Supported | Supports non-equi joins. Upstream tables may have different distribution keys. Stream and upstream tables can have different distribution keys |
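To make the capabilities above concrete, the following minimal sketch creates an upstream table, an aggregate stream, and a dimension-enriched stream. Table, column, and stream names are hypothetical, and the USING mars3 storage clause and the placement of WITH DATA are assumptions based on the descriptions above; CREATE STREAM and FROM STREAMING are the forms this document's restrictions refer to.

```sql
-- Hypothetical upstream table (HEAP and MARS3 engines are supported).
CREATE TABLE readings (
    ts  timestamptz,
    dev int,
    val float8
) USING mars3              -- assumed spelling of the MARS3 storage clause
  DISTRIBUTED BY (dev);

-- Regular (non-streaming) dimension table for enrichment.
CREATE TABLE devices (
    dev      int,
    dev_name text,
    active   boolean
);

-- Aggregate stream: GROUP BY covers the stream table's distribution key,
-- and every aggregate gets a unique alias. WITH DATA also processes
-- existing rows, which holds a long transaction on a large upstream table.
CREATE STREAM readings_avg AS
SELECT dev,
       avg(val) AS avg_val,
       count(*) AS cnt
FROM STREAMING readings
GROUP BY dev
WITH DATA;

-- Dimensional enrichment with a filter condition on the dimension table.
CREATE STREAM readings_named AS
SELECT r.ts, r.dev, r.val, d.dev_name
FROM STREAMING readings r
JOIN devices d ON d.dev = r.dev
WHERE d.active;
```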
Restrictions

- Stream table objects cannot be retrieved via JDBC metadata; use dedicated SQL statements to query them.
- Only superusers can create stream objects.
- The SELECT clause in a stream definition must not contain duplicate column names. In particular, for aggregate streams, give each aggregate a unique alias, such as select avg(col1) as avg_col1, avg(col2) as avg_col2. Alternatively, use a column projection list in the CREATE STREAM statement (see the sketch after this list).
- Avoid direct DML operations on stream tables (controlled by the GUC mx_stream_internal_modify).
- WITH clauses are not allowed in stream definitions.
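The aliasing rule and the projection-list alternative mentioned above look like this in practice. This is a minimal sketch with hypothetical table and column names; the parenthesized column list on CREATE STREAM is assumed to behave like the usual view-style projection list.

```sql
-- Unique aliases keep the SELECT list of the stream definition duplicate-free.
CREATE STREAM metrics_agg AS
SELECT dev,
       avg(col1) AS avg_col1,
       avg(col2) AS avg_col2
FROM STREAMING metrics
GROUP BY dev;

-- Alternative: name the output columns with a projection list instead.
CREATE STREAM metrics_agg2 (dev, avg_col1, avg_col2) AS
SELECT dev, avg(col1), avg(col2)
FROM STREAMING metrics
GROUP BY dev;
```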
Aggregate Stream Restrictions:

- Only one FROM STREAMING clause is allowed in the stream definition.
- The GROUP BY key must include all distribution keys of the corresponding stream table.
- Aggregation without a GROUP BY clause is not allowed.
- The HAVING clause is not supported, and nested subqueries with outer WHERE filters cannot simulate HAVING.
- Expressions over aggregate results are not supported: avg(col1)+1 is invalid, but avg(col+1) is allowed.

Stream-to-Stream JOIN Restrictions:

- GROUP BY is not allowed in stream-to-stream JOIN computations.
- Only INNER JOIN between two upstream tables is supported; UNION, LEFT JOIN, and joins with non-streaming tables are not supported.
- Aggregate and distinct operations such as DISTINCT, SUM, and MAX are not allowed.
- The ORDER BY clause is not supported.
- In multi-level streaming pipelines, intermediate streams cannot be aggregate streams.
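A sketch of a stream-to-stream JOIN that stays within the restrictions above: a single INNER JOIN between two upstream tables, a non-equi condition, and no aggregation, DISTINCT, GROUP BY, or ORDER BY. Table names are hypothetical, and repeating the STREAMING keyword for the second source is an assumption.

```sql
-- Stream-to-stream JOIN: INNER JOIN only; non-equi conditions are allowed.
CREATE STREAM orders_paid AS
SELECT o.order_id,
       o.amount,
       p.paid_at
FROM STREAMING orders o
INNER JOIN STREAMING payments p        -- assumed syntax for the second streaming source
        ON p.order_id = o.order_id
       AND p.paid_at >= o.created_at;  -- non-equi join condition
```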