If you work with data analytics, you’ve probably had this painful experience: it’s “just” a big table join, but the database behaves like an old TV stuck on a frozen frame.
The outer table easily reaches tens or hundreds of millions of rows, and the cost of scanning, shipping and computing keeps snowballing. Five minutes later, your query is still “spinning” with no result in sight.
In this article, we’ll look at a performance accelerator that specifically targets slow big table joins — YMatrix Runtime Filter. In many scenarios, it can make your queries run 10× faster, and in some cases even 20× faster.
Let’s quickly recap how a traditional hash join works:
The database first scans the smaller inner table (for example, a user table) and builds a hash dictionary from it — think of it as a pre-built index.
It then scans the large outer table (for example, an orders table) and probes that dictionary row by row to find matching records.
Here’s the problem: the outer table might hold hundreds of millions of rows, but only a few thousand may actually match.
Yet the database still has to scan, transfer and process all those rows on the outer side. That means over 99% of the work is wasted — pure overhead that burns time and resources for no benefit.
The smart part about a runtime filter is this:
Instead of blindly scanning the outer table first, it uses the inner table to build a set of “valid keys” and tells the outer table upfront: “Only keep rows whose keys are in this set.”
A simple example:
Orders table (orders)
Customer table (customer)
Now we want to count the number of orders from customers in China.
Traditional approach:
With a runtime filter:
customer table to build a set of customer IDs from China.orders table.customer_id is in that set are scanned and passed downstream.If the inner table ultimately contributes only 1,000 customer IDs, the runtime filter can precisely keep the rows matching those 1,000 IDs — and skip the remaining 99% of the outer table.
This immediately cuts down I/O and CPU work, so the query naturally becomes much faster.
Nothing speaks louder than real numbers. Here are some results from YMatrix benchmarks:
TPC-H Q17 (classic benchmark)
TPC-H S100 (scale 100 dataset)
Real business workload
In summary, Runtime Filter shines when you have a big outer table and a small result set:
…the more accurately the runtime filter can eliminate useless rows — and the more performance you gain.
YMatrix’s Runtime Filter is designed to be very user-friendly and is enabled by default. With just three small tips, you can start using it effectively:
1.Toggle the feature (on/off)
SET enable_runtime_filter = off;
2.Check the execution plan
EXPLAIN (VERBOSE) to inspect the plan.local/global (local vs. cross-node transfer) and initiator/target (sender/receiver), it indicates the runtime filter is in play.3.Validate the effect
EXPLAIN ANALYZE.inputrows/outputrows
(rows entering / rows passing the filter).outputrows is much smaller than inputrows
— for example, 10 million → 20,000 —
it means the runtime filter has successfully skipped 99.8% of the useless rows.Runtime Filter is powerful, but it’s not a magic bullet for every query. It performs best in scenarios like these:
✅ Large outer table
✅ Small join result set
✅ Filter size stays under control
Next time you’re stuck with a painfully slow big table join, try enabling Runtime Filter — sometimes it only takes a single SET command to make your query take off.
From Greenplum to YMatrix: Migrating Core Business Data for a Leading Power-Battery Manufacturer
PXF: Cross-Source Queries in Seconds with YMatrix
Dahshenlin: Achieving Real-Time Finance-Operations Integration with a Modernized Data Foundation
Smart Manufacturing at Scale with YMatrix HTAP: Real-Time Ingestion & Unified Analytics