Basic abilities description

Capability Overview

Capability Support Status Description
Upstream Table Storage Engine HEAP / MARS3
Upstream Table Distribution Type Hash Distribution/Random Distribution/Segment Set Mode Does not support master-only tables and replicated tables as upstream tables
Upstream Table Partitioning Supported
Stream Table Storage Engine HEAP / MARS3 / AO
Stream Table Distribution Key Supports free selection of distribution keys, which can differ from the upstream table The optimal solution is to keep them consistent. Especially for aggregated streams, the same distribution key can localize stream computation
Stream table storage characteristics Stream tables can independently select engines, partitions, and distribution keys
Upstream table multi-field support Supported Supports upstream table fields ≥ 300
One table, multiple streams Supported Upstream tables for multiple data streams at the same level can be the same. “One table” refers to the same upstream table
Dimension expansion computation Supported Supports dimension tables with ≥ 10 associated dimensions
Aggregation computation Supported Supports group fields ≥ 32; internally, multiple fields of different types are automatically merged into a single type, then a composite-type field is generated for subsequent aggregation computation
Upstream table DDL support Not supported Creating indexes on upstream tables does not affect downstream stream tables. Deleting indexes on upstream tables may prevent downstream streams from executing
Stream Table DDL Support Not supported Field-level DDL changes are currently not supported on stream tables. If changes are needed, the stream must be rebuilt. Partial DDL functionality will be supported in the future. Note: If the upstream table of a stream table also undergoes DDL changes, it is also recommended to rebuild the stream
Stream table indexes Supported Stream tables can independently create and maintain indexes
Dimension filtering Supported When performing dimension expansion and association calculations, you can add filtering conditions to dimension tables
Failover Supported After a segment node switches between primary and standby modes, the stream can continue to operate stably; however, there is a small probability of losing a few transactions at the time of the switch
Performance overhead Writing to the upstream table has almost no impact, and the stream table's calculation results have sub-second latency
Large transaction capability Supported Internal optimizations have been made to the batch processing mechanism and memory usage for transaction log decoding, making the handling of large transactions more stable. However, tables with large transaction changes should still be used with caution for stream processing
Upstream Table Historical Data Processing Supported When creating a stream, you can use the WITH DATA option to specify processing of historical data from the upstream table. If the upstream table contains a large amount of data, it will generate a very large transaction, which may block the creation of other streams until the transaction completes
Dual-Stream JOIN Supported Supports non-equi-joins; upstream tables can have different distribution keys, and the upstream table and stream can also have different distribution keys

Other restrictions

  • Flow table objects cannot be obtained by JDBC metadata and need to be queried through independent statements
  • The creation of stream objects can only be an administrator user
  • The SELETE part of the defined stream must be guaranteed that there is no duplicate field. Especially after the aggregation flow definition uses the aggregation function, it is best to give the field a different alias, such as select avg(col1) as avg_col1, avg(col2) as avg_col2. Field projection can also be performed through the CREATE STREAM section
  • Try to limit DML operations on data on stream tables (GUC mx_stream_internal_modify)
  • WITH clause cannot be used in stream definition
  • Restrictions on aggregated flow usage:
    • In the stream definition, only one FROM STREAMING can appear.
    • GROUP BY Grouping Key must contain all distribution keys for the corresponding flow table
    • Global aggregation calculation without GROUP BY is not allowed
    • Trialing of HAVING clauses is not allowed. Or use nested subqueries, outer WHERE filtered class HAVING function
    • Aggregate expressions, cannot participate in other expression calculations again. Such as avg(col1)+1, but supports the writing method of avg(col+1).
  • Dual-stream JOIN usage restrictions
    • Cannot try GROUP BY in double stream JOIN calculation
    • Currently, only the equivalent INNER JOIN calculations of two upstream tables can be implemented. Dual streams cannot be used for UNION calculations and LEFT JOIN calculations, as well as related calculations with other non-streamed subscribed tables.
    • Cannot use aggregation functions such as DISINTCT, SUM, MAX, etc. in the calculation of dual stream JOIN
    • Can no longer use window functions in double stream JOIN calculation
    • ORDER BY clause is not supported
  • In multi-stage flow, the flow in the intermediate link cannot be an aggregation flow.

Single flow calculation

  • Single stream: refers to the FROM STREAMING declares only one object in the stream definition statement. The following scenario is also considered to be a declaration of only one object
    
    CREATE TABLE t3 (id int, c1 text, ts timestamp); 

CREATE STREAM s3(id, c1, ts, arrive_s3) AS ( SELECT , clock_timestamp() FROM STREAMING t3 WHERE id > 7 UNION ALL SELECT , clock_timestamp() FROM STREAMING t3 WHERE id < 3 WITH NO DATA ) DISTRIBUTED BY (arrive_s3);


### Multi-stream association calculation

- Multi-stream: In the stream definition, FROM STREAMING can declare multiple. Used to dynamically track data from two associated tables and generate new JOIN results