This document introduces the basic components of query plans and how to read them. It provides examples and explanations for both single-node and parallel (distributed) query plans.
A query plan, also known as an execution plan or query execution plan, is a critical component in the process of executing a SQL statement. Generated by the query optimizer, it describes in detail how a given SQL statement will be executed within the database.
In centralized databases (such as PostgreSQL and other single-node systems), all data resides on one node instance, and the entire query execution occurs locally. A single-node query plan suffices for such environments.
However, in distributed databases like Greenplum and YMatrix, data is spread across multiple nodes. When creating tables, users must specify a distribution strategy and distribution key. Data is then distributed across multiple MXSegment instances based on this key, with each segment holding only a portion of the total dataset.
As a result, traditional single-node query plans are insufficient. A new type of plan—called a parallel query plan (or distributed query plan)—must be used. This plan leverages data distribution to enable efficient parallel computation and ensures that processing occurs as close to the data as possible. Specifically, computations should occur on MXSegment instances rather than on the MXMaster whenever feasible, minimizing data movement and maximizing query performance.
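The distribution key is declared with the DISTRIBUTED BY clause when a table is created. As a minimal sketch (the table name and columns here are illustrative; the same syntax appears in the MARS2 example later in this document):
=# CREATE TABLE orders (
order_id int,
cust_id int,
amount float
) DISTRIBUTED BY (cust_id);   -- rows are hashed on cust_id and spread across MXSegment instances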
After loading data into YMatrix, you must first run the ANALYZE command to collect statistics. Periodic full-database ANALYZE operations are also recommended. Accurate statistics are essential for the optimizer to make informed decisions when selecting the most efficient query plan, thereby improving query performance.
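For example, statistics can be collected for a single table or for every table in the current database (the table name below is the one used in the examples later in this document):
=# ANALYZE sale;   -- collect statistics for one table
=# ANALYZE;        -- collect statistics for all tables in the current database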
Once statistics are collected, use the EXPLAIN command to view the query plans generated by the optimizer for a given SQL statement. The optimizer typically evaluates multiple candidate plans and selects the one it estimates to be optimal for execution. Analyzing and understanding these plans helps developers and database administrators (DBAs) optimize query performance and identify bottlenecks.
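The basic forms are shown below. EXPLAIN only reports the estimated plan, while EXPLAIN ANALYZE (standard in PostgreSQL-based systems such as YMatrix) actually executes the statement and reports real row counts and timings, so run it with care against production data:
=# EXPLAIN <SQL statement>;           -- show the chosen plan with cost estimates only
=# EXPLAIN ANALYZE <SQL statement>;   -- execute the statement and show actual rows and timing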
A query plan has a tree structure composed of multiple plan nodes. Each node represents a specific operation—such as table scan, join, or aggregation—and passes its result to the parent node for further processing.
Each node contains detailed execution information, including execution order, access method, concurrency level, and estimated cost. The plan tree starts at leaf nodes and ends at the root node. Traversing the tree from bottom to top reveals the complete execution flow.
In distributed query plans, common plan node types and their associated operators are listed below:
| Plan Node Type | Operation Description | Scalar Operator | Vectorized Operator |
|---|---|---|---|
| Scan Plan Nodes | Responsible for scanning tables or indexes | SeqScan: Sequentially scans all rows in a table<br>IndexScan: Traverses an index to fetch rows from the table<br>BitmapHeapScan & BitmapIndexScan: Often used together. Bitmap index scan reads qualifying index entries and builds a bitmap (granularity depends on index size). Bitmap heap scan then uses this bitmap to access the table. Note: Index scan retrieves one entry at a time and immediately checks it against conditions, alternating between index and table access, which is ideal for precise lookups. Bitmap scan operates in two distinct phases: first it collects all matching index entries and sorts them in memory, then it accesses the table, which is better suited for broad-range queries<br>DynamicSeqScan: Uses a partition selection function to choose partitions<br>ParallelSeq: When parallelism is enabled, set the number of parallel workers equal to CPU cores for maximum efficiency | MxVScan: Vectorized sequential scan for MARS2/AOCO tables |
| Join Plan Nodes | Combines two or more tables | HashJoin: Builds a hash table from the smaller table using the join column as the hash key, then probes the larger table row by row. HashCond shows the joined columns<br>NestedLoopJoin: Iterates over rows in one dataset and scans the other for matches on each iteration. Requires broadcasting one table unless both tables share the same distribution key, allowing local joins without data redistribution<br>MergeJoin: Sorts both datasets and merges them | MxVHashJoin: Vectorized hash join for MARS2/AOCO tables |
| Other Plan Nodes | Other operations such as sorting, aggregation, set operations, and data motion between segments | Materialize: Caches results of subqueries for reuse; stores output tuples upon first execution<br>Sort: Orders tuples returned from child nodes<br>Group: Groups sorted tuples from lower nodes<br>Agg: Applies aggregate functions<br>Unique: Removes duplicate rows<br>Hash: Builds a hash table, typically used as a helper for HashJoin<br>Limit: Handles LIMIT/OFFSET clauses<br>WindowAgg: Processes window functions<br>LockRows: Locks selected rows, optionally filtered by FOR SHARE/FOR UPDATE<br>SetOp: Performs set operations across multiple datasets: union (UNION), intersection (INTERSECT), difference (EXCEPT/MINUS)<br>SubqueryScan: Evaluates subqueries<br>Append: Combines multiple result sets into one<br>Result: Handles expressions computed once or simple VALUES statements with only an INSERT ... VALUES clause<br>GroupAgg: First sorts data by grouping columns so identical values are adjacent, then aggregates while scanning<br>HashAgg: Groups and aggregates without sorting by building a hash table<br>Broadcast Motion: Each segment sends its rows to all other segments, resulting in a full local copy on every segment<br>Redistribute Motion: Each segment recalculates data distribution based on a redistribution key and sends rows to the appropriate target segment<br>Gather Motion: Results from all segments are assembled into a single stream. This is usually the final step in most query plans | MxVMotion: Vectorized Gather Motion for MARS2/AOCO tables<br>MxVSort: Vectorized sort for MARS2/AOCO tables<br>MxVResult: Vectorized Result operation<br>MxVSubqueryScan: Vectorized subquery scan<br>MxVLimit: Vectorized Limit operation<br>MxVHashAgg: Vectorized HashAgg<br>MxVGroupAgg: Vectorized GroupAgg<br>MxVAppend: Vectorized Append |
These nodes can be combined as needed to form a complete query plan.
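If you want to see how the choice of plan nodes changes, the standard PostgreSQL planner switches (also available in Greenplum-derived systems such as YMatrix, though availability may vary by version) can be toggled per session. A sketch:
=# SET enable_hashjoin = off;   -- discourage HashJoin so the planner considers other join nodes
=# EXPLAIN <SQL statement>;     -- re-check which join node now appears in the plan
=# RESET enable_hashjoin;       -- restore the default behavior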
Due to its nested tree structure, a query plan must be read from bottom to top.
Below are examples of a single-node query plan and a parallel query plan.
First, create two tables. (The example uses PostgreSQL 9.2.)
=# CREATE TABLE sale (
cid int,
pid int,
pnum int,
qty float,
date timestamp
);
=# CREATE TABLE customer (
cid int,
cname text,
cwechat text
);
Insert test data.
=# INSERT INTO sale (cid, pid, pnum, qty, date) VALUES
(1, 1, 1, 10.0, '2023-07-20 08:37:06'),
(2, 3, 6, 35.0, '2023-07-15 18:22:00'),
(3, 2, 1, 3.0, '2023-07-17 11:37:00'),
(4, 1, 2, 10.0, '2023-07-20 10:37:09'),
(5, 3, 2, 35.0, '2023-07-17 08:12:06'),
(6, 1, 1, 10.0, '2023-07-02 12:10:23'),
(7, 3, 1, 35.0, '2023-07-07 08:07:00'),
(8, 5, 3, 99.0, '2023-07-20 10:37:06'),
(9, 3, 1, 35.0, '2023-07-20 15:30:00'),
(10, 3, 1, 35.0, '2023-07-20 09:00:00');
=# INSERT INTO customer (cid, cname, cwechat) VALUES
(1, 'kepler', 'kepler1997'),
(2, 'amiee', 'amiee1995'),
(3, 'lila', 'lila2002'),
(4, 'cici', 'cici1982'),
(5, 'damien', 'damien1983'),
(6, 'ariana', 'ariana1990'),
(7, 'kanye', 'kanye1960'),
(8, 'taylor', 'taylor1996'),
(9, 'michael', 'mike2005'),
(10, 'ray', 'ray1957');
Now compute the total sales amount per customer by joining the sale and customer tables on their common cid key, and analyze the query plan.
First, collect statistics.
=# ANALYZE sale;
=# ANALYZE customer;
Generate the query plan.
=# EXPLAIN SELECT c.cname, sum(s.pnum * s.qty) AS amount FROM sale s, customer c WHERE s.cid = c.cid GROUP BY c.cname;
QUERY PLAN
------------------------------------------------------------------------------
HashAggregate (cost=2.56..2.66 rows=10 width=14)
Group Key: c.cname
-> Hash Join (cost=1.23..2.46 rows=10 width=18)
Hash Cond: (s.cid = c.cid)
-> Seq Scan on sale s (cost=0.00..1.10 rows=10 width=16)
-> Hash (cost=1.10..1.10 rows=10 width=10)
-> Seq Scan on customer c (cost=0.00..1.10 rows=10 width=10)
(7 rows)
This plan has two main nodes: Hash Join and HashAggregate. Each node includes three cost estimates: cost, rows, and width. These help predict execution time, result size, and processing complexity.
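Costs are expressed in arbitrary planner units rather than in milliseconds; they are derived from cost parameters such as the ones below (standard PostgreSQL settings; defaults may differ in YMatrix):
=# SHOW seq_page_cost;     -- cost of one sequential page read, the baseline unit (commonly 1.0)
=# SHOW random_page_cost;  -- cost of one non-sequential page read
=# SHOW cpu_tuple_cost;    -- cost of processing one row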
The plan tree structure is shown below:
(Figure: plan tree of the single-node query plan)
Understanding a query plan requires recognizing its nesting. Each node is a sub-operation, and data flows upward from child to parent. The plan begins with a sequential scan of the customer table, costing 1.10; a hash table is then built in memory from those rows. The sale table is scanned sequentially and, using cid as the join key, each of its rows is probed against the hash table. At this stage, the startup and total costs are 1.23 and 2.46, respectively. Finally, the results are grouped and aggregated by cname, bringing the total cost to 2.66.
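To see exactly which columns each node emits (for example, which columns the Hash node carries from the customer table), the VERBOSE option of EXPLAIN can be added. A sketch using the same query (output omitted here; details vary by version):
=# EXPLAIN VERBOSE SELECT c.cname, sum(s.pnum * s.qty) AS amount
FROM sale s, customer c WHERE s.cid = c.cid GROUP BY c.cname;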
Detailed explanation of plan components (see operator table above):
- cost: the estimated startup cost and total cost, separated by two dots (for example, 0.00..1.10). Total cost assumes all rows are retrieved, though this may not happen (e.g., with LIMIT).
- rows: the estimated number of rows output by the node, after filtering by conditions such as those in the WHERE clause.
- width: the estimated average width, in bytes, of the rows output by the node.

Next is a parallel query plan for the same query in YMatrix. As before, create two tables.
Before creating MARS2 tables, install the matrixts extension for time-series functionality.
=# CREATE EXTENSION matrixts;
=# CREATE TABLE sale (
cid int,
pid int,
pnum int,
qty float,
date timestamp
) USING MARS2
DISTRIBUTED BY (cid,date);
=# CREATE INDEX ON sale USING mars2_btree (cid,date);
=# CREATE TABLE customer (
cid int,
cname text,
cwechat text
) USING MARS2
DISTRIBUTED BY (cid);
=# CREATE INDEX ON customer USING mars2_btree (cid);
Insert the same test data and collect statistics.
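The INSERT statements are identical to the ones shown earlier; statistics are collected the same way:
=# ANALYZE sale;
=# ANALYZE customer;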
Now generate the query plan using the same SQL. Since YMatrix enables vectorization by default, many vectorized operators appear in the plan, enhancing performance.
=# EXPLAIN SELECT c.cname, sum(s.pnum * s.qty) AS amount FROM sale s, customer c WHERE s.cid = c.cid GROUP BY c.cname;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4) (cost=72.26..72.44 rows=10 width=14)
-> GroupAggregate (cost=72.26..72.32 rows=2 width=14)
Group Key: c.cname
-> Sort (cost=72.26..72.27 rows=2 width=18)
Sort Key: c.cname
-> Redistribute Motion 4:4 (slice2; segments: 4) (cost=0.00..72.25 rows=2 width=18)
Hash Key: c.cname
-> Nested Loop (cost=0.00..72.20 rows=2 width=18)
Join Filter: (s.cid = c.cid)
-> Custom Scan (MxVMotion) Redistribute Motion 4:4 (slice3; segments: 4) (cost=0.00..36.07 rows=2 width=16)
Hash Key: s.cid
-> Custom Scan (MxVScan) on sale s (cost=0.00..36.02 rows=2 width=16)
-> Materialize (cost=0.00..36.04 rows=2 width=10)
-> Custom Scan (MxVScan) on customer c (cost=0.00..36.02 rows=2 width=10)
Optimizer: Postgres query optimizer
(15 rows)
This plan has six main nodes: Materialize, Redistribute Motion, Nested Loop, Sort, GroupAggregate, and Gather Motion. Each includes cost, row count, and width estimates.
The plan tree structure is shown below:
(Figure: plan tree of the parallel query plan)
The plan starts with a materialization operation: scanning and materializing the customer table costs approximately 36. Because the customer table is distributed by cid and the sale table by (cid,date), the sale table must be redistributed by cid so that related rows reside on the same segment, enabling a local join. After redistribution, a nested loop join applies the filter condition s.cid = c.cid; by this point the total cost reaches about 72, showing that most of the work happens in the join. Since the query requires grouping by cname, a second redistribution occurs, followed by sorting on cname and group aggregation, which brings the total cost to 72.32. The Gather Motion node then assembles the results from all four segments (total cost 72.44) and returns them to the MXMaster.
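The extra Redistribute Motion on the sale table is a direct consequence of the two tables having different distribution keys. As a sketch (an alternative schema, not part of the original example, and assuming MARS2 accepts this layout), distributing sale by cid alone would give both tables the same distribution key, so the join on cid could run locally on each segment and that Motion would be expected to disappear from the plan:
=# CREATE TABLE sale (
cid int,
pid int,
pnum int,
qty float,
date timestamp
) USING MARS2
DISTRIBUTED BY (cid);
=# CREATE INDEX ON sale USING mars2_btree (cid,date);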
Explanation of key components (see operator table above):
- Gather Motion 4:1 (slice1): results from the four segments are gathered into a single stream on the MXMaster; this final motion belongs to slice1. Not all queries require this step; for example, CREATE TABLE x AS SELECT... writes results directly to a table, bypassing the MXMaster.
- slice: Motion nodes split a parallel plan into slices, each of which the segments can execute independently; the slice label on a Motion identifies the sending slice. This plan contains three slices (slice1, slice2, slice3), one per Motion.

Slice division for Segment 1 and Segment 2 is illustrated below:
(Figure: slice division across Segment 1 and Segment 2)
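To observe the CREATE TABLE ... AS SELECT case mentioned above, EXPLAIN also accepts such a statement (the target table name sale_amount is hypothetical, and the exact plan shape depends on the version). The resulting plan would be expected to write rows on the segments instead of ending with a Gather Motion to the MXMaster:
=# EXPLAIN CREATE TABLE sale_amount AS
SELECT c.cname, sum(s.pnum * s.qty) AS amount
FROM sale s, customer c WHERE s.cid = c.cid GROUP BY c.cname;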