Capability Overview

The Domino streaming computation engine is a module introduced in YMatrix 6.0 that incrementally maintains computation results, improving data-analysis efficiency and query performance.

Quick Start

Domino v2 is enabled by default starting from YMatrix 6.4.X and requires no additional configuration.

For more information on GUC parameters, refer to Technical Parameters.
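
As a quick illustration, Domino-related GUCs can be inspected like any other GUC with a standard SHOW statement. The sketch below uses mx_stream_internal_modify, the one GUC named later in this document; see Technical Parameters for the full list.

    -- Inspect a Domino-related GUC; mx_stream_internal_modify controls
    -- direct DML on stream tables (see "Additional Notes" below).
    SHOW mx_stream_internal_modify;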

Architecture and Core Modules

The Domino architecture consists of two main components: the execution framework and the computation engine.

  • The execution framework manages stream scheduling and metadata.
  • The computation engine handles decoding, query rewriting, and execution.

Key Differences

The primary goal of Domino v2 is to support more streaming tasks at the same query latency, addressing the resource contention and high load that v1 exhibits as the number of streams grows.
The main differences lie in resource management and the execution mechanism; v2 improves scalability and stability by redesigning both, as described below.

Resource Management

  1. Shared Decoder

    • In v1, each stream performs independent decoding, increasing workload as the number of streams grows.
    • In v2, all streams within a database share a single decoder. Decoded results are stored in TLog files. The decoding workload depends only on the volume of data changes, not on the number of streams.
  2. Shared Worker Processes

    • In v1, each stream occupies a dedicated worker process, leading to higher resource consumption as stream count increases.
    • In v2, a global pool of worker processes is shared among all streams, limiting total resource usage and preventing resource contention.
  3. Incremental Execution (Small-step Processing)

    • In v1, the entire pending XLog range (from the last processed position to the current position) is processed at once, which can cause log backlog.
    • In v2, the Ticker module limits each processing cycle to a small XLog segment (by default one Tick, e.g., a fixed size in MB or a time interval), enabling faster progress and reducing backlog.
  4. Functionality

    • No significant functional differences exist between v1 and v2.
  5. Compatibility

    • v1 and v2 streams can coexist in the same database. However, after a database downgrade, v2 streams become unavailable, while v1 streams remain unaffected.

Execution Mechanism

Domino v2 focuses on optimizing the execution framework. The core modules are:

  1. Ticker
    Splits the XLog into manageable segments and bounds the scope of each streaming computation (the minimum unit is one Tick), avoiding the oversized batches that could slow down the system. Note: the large historical-data transaction created when a stream is created WITH DATA is exempt from this limit.

  2. Scheduler
    Manages stream scheduling. Reuses worker processes to cap overall CPU usage while improving individual worker CPU utilization.

  3. Decoder
    Parses XLog and generates tuples and snapshots (TLog) for stream consumption.

  4. TLog
    Stores decoded change records. Acts as intermediate storage between the Decoder and stream computation, allowing stream processing to read and process changes.

Capability Summary

| Capability | Supported | Notes |
| --- | --- | --- |
| Upstream Table Storage Engine | HEAP / MARS3 | |
| Upstream Table Distribution Type | Hash / Random / Segment Set | Master-only and replicated tables are not supported as upstream tables |
| Upstream Table Partitioning | Supported | |
| Stream Table Storage Engine | HEAP / MARS3 / AO | |
| Stream Table Distribution Key | Flexible; may differ from the upstream table | Best practice is to use the same distribution key; for aggregate streams, matching keys enable localized computation |
| Stream Table Storage Properties | Supported | Storage engine, partitioning, and distribution key can be chosen independently of the upstream table |
| Multi-Column Upstream Tables | Supported | Upstream tables with ≥ 300 columns are supported |
| Multiple Streams per Table | Supported | Multiple streams can share the same upstream table; "one table" refers to the same upstream source |
| Dimensional Enrichment | Supported | Joining ≥ 10 dimension tables is supported |
| Aggregation | Supported | Grouping by ≥ 32 fields is supported; multiple field types are internally combined into a composite type for aggregation |
| Upstream Table DDL | Not supported | Creating indexes on upstream tables has no effect on downstream streams; dropping indexes may break stream execution |
| Stream Table DDL | Not supported | DDL on stream table columns (e.g., ADD/DROP COLUMN) is not supported; rebuild the stream if changes are needed. If the upstream table also undergoes DDL, rebuilding the stream is recommended |
| Stream Table Indexes | Supported | Indexes can be created and maintained on stream tables independently |
| Dimension Filtering | Supported | Filter conditions on dimension tables are supported during dimensional enrichment |
| Failover Support | Supported | Streams continue working after segment failover, but a small number of transactions at the switchover point may be lost |
| Performance Overhead | — | Minimal impact on upstream write performance; stream results are updated within seconds |
| Large Transaction Handling | Supported | Batching and memory optimizations during log decoding improve stability for large transactions; still, use streaming cautiously on tables with frequent large transactions |
| Historical Data Processing | Supported | Create the stream WITH DATA to process existing upstream data; on a very large upstream table this creates a long-running transaction that blocks creation of other streams until it completes |
| Stream-to-Stream JOIN | Supported | Equi joins between two upstream tables (see restrictions below); the upstream tables may have different distribution keys, and the stream table's distribution key may differ from the upstream tables' |
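
To make several rows of this matrix concrete, here is a minimal sketch of a dimensional-enrichment stream. It is illustrative only: all object names (orders, dim_region, orders_enriched) are hypothetical, the statement is assembled from the clauses this document names (CREATE STREAM, FROM STREAMING, WITH DATA), and the trailing placement of WITH DATA follows PostgreSQL's materialized-view convention as an assumption. Check the SQL reference for the exact syntax your version accepts.

    -- Hypothetical fact table orders and dimension table dim_region.
    -- The stream joins a dimension table with a filter (dimension
    -- filtering) and processes existing upstream rows via WITH DATA.
    CREATE STREAM orders_enriched AS
        SELECT o.order_id,
               o.amount,
               d.region_name
        FROM STREAMING orders o
        JOIN dim_region d
          ON o.region_id = d.region_id   -- dimensional enrichment
        WHERE d.is_active                -- filter on the dimension table
    WITH DATA;                           -- also process existing upstream data
                                         -- (a long-running transaction if the
                                         -- upstream table is very large)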

Additional Notes

  1. Stream table objects cannot be retrieved via JDBC metadata. Use dedicated SQL statements to query them.

  2. Only superusers can create stream objects.

  3. The SELECT clause of a stream definition must not contain duplicate column names. In particular, for aggregate streams using aggregate functions, assign each aggregate a unique alias, e.g., select avg(col1) as avg_col1, avg(col2) as avg_col2. Alternatively, use column projection in the CREATE STREAM statement.

  4. Avoid direct DML operations on stream tables (controlled by GUC mx_stream_internal_modify).

  5. WITH clauses are not allowed in stream definitions.

  6. Aggregate Stream Restrictions (see the sketch after this list):

    • Only one FROM STREAMING clause is allowed in the stream definition.
    • The GROUP BY key must include all distribution keys of the corresponding stream table.
    • Global aggregation without GROUP BY is not allowed.
    • The HAVING clause is not supported, and it cannot be emulated by wrapping the aggregation in a subquery and filtering with an outer WHERE.
    • An aggregate result cannot be reused in a further expression. For example, avg(col1) + 1 is invalid, but avg(col1 + 1) is allowed.
  7. Stream-to-Stream JOIN Restrictions (see the sketch after this list):

    • GROUP BY is not allowed in stream-to-stream JOIN computations.
    • Only equi INNER JOIN between two upstream tables is supported. UNION, LEFT JOIN, or joins with non-streaming tables are not supported.
    • Aggregate functions (e.g., SUM, MAX) and DISTINCT are not allowed.
    • Window functions are not supported.
    • ORDER BY clause is not supported.
  8. In multi-level streaming pipelines, intermediate streams cannot be aggregate streams.
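
The aggregate-stream restrictions (note 6) are easier to see in a sketch. The example below is illustrative only: the names (metrics, metrics_agg, device_id, temp) are hypothetical, the exact CREATE STREAM syntax should be verified against the SQL reference, and it assumes device_id is the stream table's distribution key. It uses a single FROM STREAMING clause, unique aliases (note 3), and a GROUP BY that covers the distribution key.

    -- Hypothetical aggregate stream over upstream table metrics(device_id, temp).
    CREATE STREAM metrics_agg AS
        SELECT device_id,                  -- GROUP BY must include the stream
                                           --   table's distribution key(s)
               avg(temp)     AS avg_temp,  -- unique alias per aggregate
               avg(temp + 1) AS avg_temp2  -- allowed: expression inside the aggregate
               -- avg(temp) + 1            -- invalid: aggregate reused in an expression
        FROM STREAMING metrics
        GROUP BY device_id;                -- global aggregation (no GROUP BY)
                                           --   and HAVING are both unsupported

Likewise for the stream-to-stream JOIN restrictions (note 7), under the same caveats: names are hypothetical, and the placement of the second STREAMING keyword is an assumption.

    -- Hypothetical equi INNER JOIN between two upstream tables.
    CREATE STREAM clicks_enriched AS
        SELECT c.click_id, c.url, u.user_name
        FROM STREAMING clicks c
        JOIN STREAMING users u
          ON c.user_id = u.user_id;        -- equi-join condition required
    -- Not supported here: LEFT JOIN, UNION, joins with non-streaming tables,
    -- GROUP BY, aggregates, DISTINCT, window functions, ORDER BY.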