Capability Overview

The Domino streaming computation engine is a stream-computing module introduced in YMatrix 6.0. It incrementally maintains computation results, improving data-analysis efficiency and query performance.

Quick Start

The current database version defaults to Domino v1. To use Domino v2, follow the steps below.

Domino v2 is available as an experimental component in YMatrix 6.3.X; enabling it requires creating an additional extension and setting the GUC parameters shown below.

# For version 6.3, manually enable Domino v2 support; restart the cluster for the change to take effect
gpconfig -c domino.enabled -v on
gpstop -ar

-- Create the extension
CREATE EXTENSION domino;

-- Option 1: set the stream version at the session level; streams created in this session will be v2
SET mx_stream_default_version TO 2;

-- Verify the version of a created stream
SELECT version FROM mx_stream WHERE streamrelid = 'your_stream_name'::regclass::oid;

# Option 2: enable v2 globally; reload the configuration without a restart
gpconfig -c mx_stream_default_version -v 2
gpstop -u

For more information on GUC parameters, refer to Technical Parameters.
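With v2 enabled, streams are created the same way as before. The sketch below is illustrative only: the table, stream, and column names are hypothetical, and the CREATE STREAM ... FROM STREAMING form is assumed from the clauses described later in this document.

```sql
-- Hypothetical upstream table; names and schema are illustrative
CREATE TABLE metrics (
    ts  timestamp,
    dev int,
    val float8
) DISTRIBUTED BY (dev);

-- With mx_stream_default_version set to 2 in this session,
-- the stream below is created as a v2 stream
CREATE STREAM metrics_stream AS
    SELECT ts, dev, val FROM STREAMING metrics;

-- Confirm the stream's version via the mx_stream catalog
SELECT version FROM mx_stream
WHERE streamrelid = 'metrics_stream'::regclass::oid;
```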

Architecture and Core Modules

The Domino architecture consists of two main components: the execution framework and the computation engine.

  • Execution Framework: Manages stream scheduling and metadata.
  • Computation Engine: Handles decoding, query rewriting, and execution.

Key Differences

Domino v2 aims to support more streaming tasks at the same query latency, addressing the resource contention and high load that v1 exhibits as the number of streams grows.

The main differences lie in resource management and the execution model; v2 improves scalability and stability through an optimized design:

Resource Management

  1. Shared Decoder

    • In v1, each stream performs independent decoding; decoding workload increases with the number of streams.
    • In v2, one decoder is shared per database. Decoded results are stored in TLog files. Workload depends only on data change volume, not on the number of streams.
  2. Shared Worker Processes

    • In v1, each stream occupies a dedicated worker process; more streams mean higher process consumption.
    • In v2, a global pool of worker processes is shared among all streams, limiting total resource usage and avoiding resource contention.
  3. Incremental Execution

    • v1 processes a large range of XLog in each cycle (from the last processed position to the current one), which may cause log backlog.
    • v2 introduces the Ticker module, which limits each processing cycle to a small XLog range (the minimum unit is one Tick, bounded by size or time), enabling faster progress and reducing backlog.
  4. Functionality: No significant functional differences between v1 and v2.

  5. Compatibility: v1 and v2 can coexist. However, if the database is downgraded, v2 streams become unavailable, while v1 streams remain unaffected.

Execution Model

Domino v2 optimizes the execution framework with the following core modules:

  1. Ticker
    Splits XLog into manageable chunks. Controls the scope of each stream computation (minimum unit: one Tick), preventing large transactions that could slow the system. Note: Large transactions for historical data processing with WITH DATA are exempt from this restriction.

  2. Scheduler
    Manages stream scheduling. Reuses worker processes to limit overall CPU usage while improving individual worker utilization.

  3. Decoder
    Parses XLog and generates tuples and snapshots (TLog) for stream consumption.

  4. TLog
    Stores decoded change records. Acts as intermediate storage between the Decoder and stream computation, allowing stream processing to read and process changes.

Capability Summary

| Capability | Supported | Notes |
| --- | --- | --- |
| Upstream Table Storage Engine | HEAP / MARS3 | |
| Upstream Table Distribution Type | Hash / Random / Segment Set | Does not support master-only or replicated tables as upstream |
| Upstream Table Partitioning | Supported | |
| Stream Table Storage Engine | HEAP / MARS3 / AO | |
| Stream Table Distribution Key | Customizable, can differ from upstream | Best practice: use the same distribution key. For aggregate streams, matching keys enable localized computation |
| Stream Table Storage Attributes | Independent selection of engine, partitioning, and distribution key | |
| Multi-Column Upstream Tables | Supported | Supports ≥ 300 columns |
| Multiple Streams per Table | Supported | Multiple streams can share the same upstream table |
| Dimensional Enrichment | Supported | Supports joining ≥ 10 dimension tables |
| Aggregation | Supported | Supports ≥ 32 grouping fields. Internally combines multiple field types into a composite type for aggregation |
| Upstream Table DDL | Not Supported | Creating indexes on upstream tables has no effect on downstream streams. Dropping indexes may break stream execution |
| Stream Table DDL | Not Supported | DDL operations on stream table columns are not supported; rebuild the stream if changes are needed. If the upstream table also undergoes DDL, rebuilding the stream is recommended |
| Stream Table Indexes | Supported | Indexes can be independently created and maintained on stream tables |
| Dimension Filtering | Supported | Supports filter conditions on dimension tables during join operations |
| Failover Support | Supported | Streams continue after segment failover; however, a small number of transactions at the switchover point may be lost |
| Performance Overhead | — | Minimal impact on upstream write performance; stream results are updated within seconds |
| Large Transaction Handling | Supported | Optimized batching and memory usage for transaction log decoding make large transactions more stable. Still, use streaming cautiously on tables with frequent large transactions |
| Historical Data Processing | Supported | Use WITH DATA when creating a stream to process existing upstream data. If the upstream table is very large, this creates a large transaction that blocks creation of other streams until it completes |
| Dual-Stream JOIN | Supported | Supports non-equi joins; upstream tables may have different distribution keys; stream and upstream tables may have different distribution keys |
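The dimensional-enrichment and historical-data capabilities above can be combined in one stream definition. The sketch below is hypothetical: the table and column names are invented, and the trailing placement of WITH DATA is assumed (mirroring materialized-view syntax); consult the CREATE STREAM reference for your version.

```sql
-- Hypothetical example: enrich streaming rows with a dimension table,
-- and process existing upstream data at creation time via WITH DATA.
-- Note: on a very large upstream table, WITH DATA runs as one large
-- transaction and blocks creation of other streams until it completes.
CREATE STREAM enriched_orders AS
    SELECT o.order_id, o.amount, c.region
    FROM STREAMING orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WITH DATA;
```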

Additional Notes

  1. Stream table objects cannot be retrieved via JDBC metadata. Use dedicated SQL statements for querying.

  2. Only superusers can create stream objects.

  3. The SELECT clause in a stream definition must not contain duplicate column names. When using aggregate functions (especially in aggregate streams), assign each a unique alias, e.g., SELECT avg(col1) AS avg_col1, avg(col2) AS avg_col2. Alternatively, use column projection in CREATE STREAM.

  4. Avoid direct DML operations on stream tables (controlled by GUC mx_stream_internal_modify).

  5. WITH clauses are not allowed in stream definitions.

  6. Aggregate Stream Restrictions:

    • Only one FROM STREAMING clause is allowed.
    • GROUP BY keys must include all distribution keys of the corresponding stream table.
    • Global aggregation without GROUP BY is not allowed.
    • The HAVING clause is not supported. Use a nested subquery with an outer WHERE clause as a workaround.
    • Aggregated expressions cannot participate in further expressions. For example, avg(col1)+1 is invalid, but avg(col1+1) is supported.
  7. Dual-Stream JOIN Restrictions:

    • GROUP BY is not allowed in dual-stream JOIN computations.
    • Only equi INNER JOIN between two upstream tables is supported. UNION, LEFT JOIN, or joins with non-stream tables are not supported.
    • Aggregate operations such as DISTINCT, SUM, and MAX are not allowed.
    • Window functions are not supported.
    • ORDER BY clause is not supported.
  8. In multi-level streaming, intermediate streams cannot be aggregate streams.
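The aggregate-stream restrictions above (unique aliases, GROUP BY covering the stream table's distribution keys, no HAVING, no expressions over aggregates) can be illustrated together. All names in this sketch are hypothetical, and the FROM STREAMING form is assumed from the clauses this document describes.

```sql
-- Hypothetical aggregate stream: each aggregate gets a unique alias,
-- and GROUP BY includes the stream table's distribution key (dev).
CREATE STREAM dev_stats AS
    SELECT dev,
           avg(val) AS avg_val,   -- unique alias avoids duplicate column names
           count(*) AS cnt
    FROM STREAMING metrics
    GROUP BY dev;

-- HAVING is not supported in the stream definition;
-- filter the stream table's results with WHERE instead:
SELECT * FROM dev_stats WHERE avg_val > 10;

-- Note: avg(val) + 1 is invalid inside the stream definition,
-- but avg(val + 1) is allowed.
```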