Quick onboard
Deployment
Data Modeling
Connecting
Migration
Query
Operations and Maintenance
Common Maintenance
Partition
Backup and Restore
Expansion
Mirroring
Resource Management
Security
Monitoring
Performance Tuning
Troubleshooting
Reference Guide
Tool guide
Data type
Storage Engine
Executor
Stream
DR (Disaster Recovery)
Configuration
Index
Extension
SQL Reference
This document describes the principles and usage of the Runtime Filter optimization technique for query performance.
Runtime Filter is one of the key technologies to improve the performance of the YMatrix execution engine. It refers to a dynamic filter created during query execution, after the optimizer has generated the physical execution plan. This differs from filters planned in advance by the optimizer.
Join is one of the most complex operators in SQL queries.
Take Hash Join as an example: the execution engine first scans the inner table (typically the smaller table) to build a hash table, then scans the outer table, computes hash values, and matches rows based on the join condition. The first step is called the build side, and the second is the probe side.
Since the outer table usually contains many rows, scanning it—along with data transfer, hashing, and matching—is time-consuming. If the outer table can be filtered early, query performance may improve significantly.
Runtime Filter reduces I/O during outer table scans and shuffle phases (the process of redistributing data across segments) by applying filters at runtime. This improves query performance. For example, when executing TPC-H Q17 on YMatrix, enabling Runtime Filter results in nearly a 20x performance improvement compared to disabling it.
In YMatrix, Runtime Filter is enabled by default. The cost-based optimizer decides whether to apply Runtime Filter based on estimated execution cost.
Note!
To manually disable Runtime Filter, set:set mx_enable_runtime_filter=off;in your session.
For more parameters, see Runtime Filter System Configuration Parameters.
Runtime Filter plan and execution details are visible using EXPLAIN.
Note!
To display Runtime Filter information in EXPLAIN output, use theVERBOSEkeyword.
Create table rt_ao_t1:
CREATE TABLE rt_ao_t1
(
c1 int,
c2 int
)
WITH (appendoptimized=true, orientation=column, compresslevel=1)
DISTRIBUTED BY(c1);
Create table rt_ao_t2:
CREATE TABLE rt_ao_t2
(
c1 int,
c2 int
)
WITH (appendoptimized=true, orientation=column, compresslevel=1)
DISTRIBUTED BY(c1);
Plan 1
The following plan shows two instances of the RuntimeFilterType keyword—one under the Hash node and one under the MxVScan node.
Where:
local indicates the Runtime Filter is local and does not pass through a Motion node.initiator indicates the initiator of the Runtime Filter (i.e., the Hash node), and target indicates the target (i.e., the MxVScan node).EXPLAIN (VERBOSE) SELECT * FROM rt_ao_t1 t1, rt_ao_t2 t2 WHERE t1.c1 = t2.c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=73.47..284.43 rows=10000 width=16)
Output: t1.c1, t1.c2, t2.c1, t2.c2
-> Hash Join (cost=73.47..151.10 rows=3333 width=16)
Output: t1.c1, t1.c2, t2.c1, t2.c2
Hash Cond: (t1.c1 = t2.c1)
-> Custom Scan (MxVScan) on public.rt_ao_t1 t1 (cost=0.00..31.80 rows=3333 width=8)
Output: t1.c1, t1.c2
RuntimeFilterType: local, target
-> Hash (cost=31.80..31.80 rows=3333 width=8)
Output: t2.c1, t2.c2
RuntimeFilterType: local, initiator
-> Custom Scan (MxVScan) on public.rt_ao_t2 t2 (cost=0.00..31.80 rows=3333 width=8)
Output: t2.c1, t2.c2
Optimizer: Postgres query optimizer
Settings: enable_hashjoin=on, enable_mergejoin=off, enable_nestloop=off, mx_enable_runtime_filter=on, mx_interconnect_compress=on, mx_runtime_join_ratio=0, mx_runtime_min_outer_rows=0
(15 rows)
Plan 2
This plan differs from the previous one in that the Hash Join requires a Motion node. Therefore, the Runtime Filter type is global. Additionally, the Motion node is marked with broker, indicating that the Runtime Filter must be transmitted through this node.
EXPLAIN (VERBOSE) SELECT * FROM rt_ao_t1 t1, rt_ao_t2 t2 WHERE t1.c2 = t2.c2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=140.13..417.77 rows=10000 width=16)
Output: t1.c1, t1.c2, t2.c1, t2.c2
-> Hash Join (cost=140.13..284.43 rows=3333 width=16)
Output: t1.c1, t1.c2, t2.c1, t2.c2
Hash Cond: (t1.c2 = t2.c2)
-> Custom Scan (MxVMotion) Redistribute Motion 3:3 (slice2; segments: 3) (cost=0.00..98.47 rows=3333 width=8)
Output: t1.c1, t1.c2
Hash Key: t1.c2
RuntimeFilterType: global, broker
-> Custom Scan (MxVScan) on public.rt_ao_t1 t1 (cost=0.00..31.80 rows=3333 width=8)
Output: t1.c1, t1.c2
RuntimeFilterType: global, target
-> Hash (cost=98.47..98.47 rows=3333 width=8)
Output: t2.c1, t2.c2
RuntimeFilterType: global, initiator
-> Custom Scan (MxVMotion) Redistribute Motion 3:3 (slice3; segments: 3) (cost=0.00..98.47 rows=3333 width=8)
Output: t2.c1, t2.c2
Hash Key: t2.c2
-> Custom Scan (MxVScan) on public.rt_ao_t2 t2 (cost=0.00..31.80 rows=3333 width=8)
Output: t2.c1, t2.c2
Optimizer: Postgres query optimizer
Settings: enable_hashjoin=on, enable_mergejoin=off, enable_nestloop=off, mx_enable_runtime_filter=on, mx_interconnect_compress=on, mx_runtime_join_ratio=0, mx_runtime_min_outer_rows=0
(21 rows)
To obtain Runtime Filter execution details, use EXPLAIN ANALYZE.
The output includes:
c1)EXPLAIN ANALYZE SELECT * FROM rt_ao_t1 t1, rt_ao_t2 t2 WHERE t1.c1 = t2.c1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=73.47..284.43 rows=10000 width=16) (actual time=11.124..15.917 rows=10000 loops=1)
-> Hash Join (cost=73.47..151.10 rows=3333 width=16) (actual time=8.923..13.268 rows=3385 loops=1)
Hash Cond: (t1.c1 = t2.c1)
-> Custom Scan (MxVScan) on rt_ao_t1 t1 (cost=0.00..31.80 rows=3333 width=8) (actual time=1.709..3.520 rows=3385 loops=1)
RuntimeFilter: BloomFilter, column:c1, keys:3385, mem(KB):128, inputrows:3385, outputrows:3385, batch:53
-> Hash (cost=31.80..31.80 rows=3333 width=8) (actual time=4.813..4.813 rows=3385 loops=1)
Buckets: 524288 Batches: 1 Memory Usage: 4229kB
-> Custom Scan (MxVScan) on rt_ao_t2 t2 (cost=0.00..31.80 rows=3333 width=8) (actual time=1.933..2.777 rows=3385 loops=1)
Planning Time: 3.398 ms
(slice0) Executor memory: 94K bytes.
(slice1) Executor memory: 9367K bytes avg x 3 workers, 9379K bytes max (seg1). Work_mem: 4229K bytes max.
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 17.945 ms
(14 rows)
Runtime Filter is an optimization for Hash Join. Therefore, any query using Hash Join may attempt to use Runtime Filter. However, the final decision depends on cost estimates made by the optimizer during planning and execution.
Runtime Filter is evaluated during both the planning and execution stages. During planning, if the inner and outer table sizes do not meet certain criteria, the filter is not generated. During execution, if actual data volume differs significantly from estimates, Runtime Filter may be canceled.
Use EXPLAIN VERBOSE to view the query plan. If the plan contains the Runtime Filter keyword, Runtime Filter will be used during execution.