## Feature Description

The Domino stream computing engine is a stream computing module introduced in YMatrix 6.0. By incrementally maintaining computation results, it improves the efficiency of data analysis and queries.

## Quick Start

In the current database version, Domino v1 is the default. If you need to use Domino v2, enable it through the following procedure.

Domino v2 ships as an experimental component in YMatrix 6.3.X and requires installing an additional extension and setting several GUCs.


```
-- In version 6.3, Domino v2 must be enabled manually; a restart is required
gpconfig -c domino.enabled -v on
gpstop -ar

-- Create the extension
CREATE EXTENSION domino;

-- 1. Set the stream version at the session level; streams created in this session are v2
SET mx_stream_default_version TO 2;

-- Confirm the version
SELECT version FROM mx_stream WHERE streamrelid = 'your_stream_name'::regclass::oid;

-- 2. Alternatively, enable v2 globally (gpstop -u reloads the configuration)
gpconfig -c mx_stream_default_version -v 2
gpstop -u
```


> For more information on GUC usage, please refer to [Technical Specifications](/doc/latest/reference/streaming/dominoGUC).

## Architecture and Core Modules
The overall architecture of Domino is divided into two parts: the **execution framework** and the **computing engine**.
- The execution framework is responsible for stream scheduling and metadata management.
- The computing engine is responsible for decoding, query rewriting, and execution.

## Key Differences
The core objective of Domino v2 is to support more stream computing tasks while keeping query latency stable, addressing the resource contention and high load that v1 encounters as the number of streams grows.  
The core differences lie in resource management and execution method; v2 achieves higher stream capacity and greater stability through an optimized design:

**Resource Management**

1. Shared Decoder (Decoder)  
  - In v1, each stream is decoded separately, and the more streams there are, the greater the decoding workload.  
  - In v2, each database shares a single decoder, with decoding results stored in TLog files. The decoding workload is only related to the amount of data changes and does not increase with the number of streams.

2. Shared Worker Processes (Worker)
  - In v1, each stream occupies a fixed number of worker processes, and the more streams there are, the more processes are occupied.  
  - In v2, by setting the total number of worker processes, all streams take turns using these processes, controlling total resource consumption and avoiding resource contention due to the number of streams.

3. Incremental Execution
  - In v1, a large range of XLog (from the last completed position to the latest position) is processed at once, which may cause log accumulation.
  - In v2, a Ticker module is introduced so that only a small range of XLog is processed at a time (by default, scanning one tick forward each time, e.g., a certain number of MB or a time interval), which accelerates processing and reduces accumulation.

4. Functional Aspects: There are no significant functional differences between the two versions.

5. Compatibility: v1 and v2 can coexist, but if the database is downgraded, v2 streams become unavailable while v1 streams remain unaffected.

**Execution Method**

Domino v2 has significantly optimized its execution framework, with the following core modules:

1. Ticker  
Used to partition the XLog. It limits the scope of a single stream computation (a minimum of one tick) so that large transactions do not slow down the system; however, the large transaction that processes historical data when creating a WITH DATA stream is exempt from this restriction.

2. Scheduler  
Responsible for stream scheduling. By reusing Worker processes, it limits overall CPU usage while improving the CPU utilization of individual Workers.

3. Decoder  
Used to parse XLog. Generates Tuples and snapshots (TLog) for stream computation consumption.

4. TLog  
Stores decoded change records. Serves as intermediate storage between the Decoder and stream computation, enabling stream computation to read and process data.
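
To check which execution path an existing stream uses, you can query the mx_stream catalog referenced in the Quick Start section. A minimal sketch, assuming only the streamrelid and version columns shown earlier:

```sql
-- List every stream and the Domino version (1 or 2) it runs on;
-- v2 streams are the ones driven by the Ticker/Scheduler/Decoder/TLog path above.
SELECT streamrelid::regclass AS stream_name,
       version
FROM mx_stream
ORDER BY stream_name;
```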


## Capability Overview

| Capability | Support Status | Description |
| --- | --- | --- |
| Upstream Table Storage Engine | HEAP / MARS3 | |
| Upstream Table Distribution Type | Hash Distribution / Random Distribution / Segment Set Mode | Master-only tables and replicated tables are not supported as upstream tables |
| Upstream Table Partitioning | Supported | |
| Stream Table Storage Engine | HEAP / MARS3 / AO |  |
| Stream Table Distribution Key | Supports free selection of distribution keys, which can differ from the upstream table | The optimal solution is to keep them consistent. Especially for aggregated streams, the same distribution key can localize stream computation |
| Stream table storage characteristics | Stream tables can independently select engines, partitions, and distribution keys |  |
| Upstream table multi-field support | Supported | Supports upstream table fields ≥ 300 |
| One table, multiple streams | Supported | Multiple streams at the same level can share the same upstream table; “one table” refers to that shared upstream table |
| Dimension expansion computation | Supported | Supports dimension tables with ≥ 10 associated dimensions |
| Aggregation computation | Supported | Supports group fields ≥ 32; internally, multiple fields of different types are automatically merged into a single type, then a composite-type field is generated for subsequent aggregation computation |
| Upstream table DDL support | Not supported | Creating indexes on upstream tables does not affect downstream stream tables; dropping indexes on upstream tables may prevent downstream streams from executing |
| Stream Table DDL Support | Not supported | Field-level DDL changes are currently not supported on stream tables. If changes are needed, the stream must be rebuilt. Partial DDL functionality will be supported in the future. Note: If the upstream table of a stream table also undergoes DDL changes, it is also recommended to rebuild the stream |
| Stream table indexes | Supported | Stream tables can independently create and maintain indexes |
| Dimension filtering | Supported | When performing dimension expansion and association calculations, you can add filtering conditions to dimension tables |
| Failover | Supported | After a segment node switches between primary and standby modes, the stream can continue to operate stably; however, there is a small probability of losing a few transactions at the time of the switch |
| Performance overhead | Low | Writing to the upstream table is almost unaffected; stream table calculation results have sub-second latency |
| Large transaction capability | Supported | Internal optimizations have been made to the batch processing mechanism and memory usage for transaction log decoding, making the handling of large transactions more stable. However, tables with large transaction changes should still be used with caution for stream processing |
| Upstream Table Historical Data Processing | Supported | When creating a stream, you can use the WITH DATA option to specify processing of historical data from the upstream table. If the upstream table contains a large amount of data, it will generate a very large transaction, which may block the creation of other streams until the transaction completes |
| Dual-Stream JOIN | Supported | Join keys need not be distribution keys; upstream tables can have different distribution keys, and the upstream table and stream can also use different distribution keys |
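
To make the distribution-key and historical-data rows above concrete, the following is a hedged sketch of an aggregated stream. Table and column names are hypothetical, and exact clause ordering may vary by YMatrix version; CREATE STREAM, FROM STREAMING, WITH DATA, and distribution keys are the constructs this document refers to.

```sql
-- Hypothetical upstream table distributed by device_id.
CREATE TABLE t_metrics (
    device_id   int,
    ts          timestamp,
    temperature float8
) DISTRIBUTED BY (device_id);

-- Aggregated stream that keeps the upstream table's distribution key,
-- which localizes the aggregation (see "Stream Table Distribution Key").
-- WITH DATA also processes existing rows; on a large table this produces one
-- very large transaction (see "Upstream Table Historical Data Processing").
CREATE STREAM s_metrics_agg AS
SELECT device_id,
       avg(temperature) AS avg_temperature,
       max(temperature) AS max_temperature
FROM STREAMING t_metrics
GROUP BY device_id
WITH DATA;
```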


## Other notes


1. Stream objects cannot be retrieved by JDBC metadata and must be queried using separate statements.  
2. Stream objects can only be created by administrator users.  
3. The SELECT portion of the stream definition must not produce duplicate column names. In particular, after using aggregate functions in the stream definition, give the columns distinct aliases, such as `select avg(col1) as avg_col1, avg(col2) as avg_col2` (see the sketch at the end of this section). Column projection can also be performed in the CREATE STREAM clause.  
4. Direct DML operations on stream table data should be avoided as much as possible (controlled by the GUC mx_stream_internal_modify).
5. The WITH clause cannot be used in stream definitions.
6. Aggregate stream usage restrictions:
- Only one FROM STREAMING can appear in a stream definition.
- The GROUP BY grouping key must include all distribution keys of the stream table.
- Global aggregation without GROUP BY is not allowed.
- The HAVING clause cannot be used; instead, a nested subquery with an outer WHERE filter provides HAVING-like functionality (see the sketch at the end of this section).
- Aggregate expressions cannot be used inside other expression calculations. For example, `avg(col1)+1` is not allowed, but `avg(col+1)` is supported.

7. Restrictions on dual-stream JOIN usage:
  - GROUP BY cannot be used in dual-stream JOIN calculations.
  - Currently, only equi-join INNER JOIN calculations between two upstream tables are supported. Dual-stream calculations cannot be used for UNION, LEFT JOIN, or joins with non-streaming subscription tables.
  - DISTINCT and aggregate functions such as SUM and MAX cannot be used in dual-stream JOIN calculations.
  - Window functions cannot be used in dual-stream JOIN calculations.
  - The ORDER BY clause is not supported.

8. In multi-level streams, intermediate streams cannot be aggregate streams.
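
The alias and HAVING restrictions above are easiest to see in a concrete definition. A minimal sketch, assuming hypothetical table and column names (exact CREATE STREAM syntax may vary by version):

```sql
-- Note 3: each aggregate gets a distinct alias, so no column names collide.
-- Note 6: GROUP BY includes the stream table's distribution key, and a nested
-- subquery with an outer WHERE stands in for the disallowed HAVING clause.
CREATE STREAM s_hot_devices AS
SELECT *
FROM (
    SELECT device_id,
           avg(temp)     AS avg_temp,
           avg(humidity) AS avg_humidity
    FROM STREAMING sensor_data
    GROUP BY device_id
) agg
WHERE avg_temp > 30;
```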