Query Processing Overview

To perform effective SQL tuning, you must understand what a query plan is. Understanding query plans requires knowledge of the complete query execution process within a database management system. As an evolution and optimization of the PostgreSQL family, YMatrix inherits the classic query processing mechanisms from PostgreSQL—primarily those of Greenplum Database.

Greenplum Database divides nodes into Master and Segment nodes, extending the PostgreSQL architecture to support a distributed database system with the following query processing workflow:

  • The Master receives the query statement and generates a query plan.
  • The Master distributes the query plan to Segments.
  • Segments execute the plan concurrently on their local data sets.
  • The Master collects the results and returns them to the client.

There are two methods for distributing query plans:

  • Distributed parallel query plans
  • Targeted query plans

A distributed parallel query plan means that most database operations—such as table scans, joins, aggregations, and sorts—are executed in parallel across all Segment instances. Each operation runs independently on one Segment's data without accessing data stored on other Segments.

A targeted query plan applies when a query accesses data on only a single Segment. Examples include single-row INSERT, UPDATE, DELETE, or SELECT operations, or queries filtered on the table’s distribution key column. In such cases, the query plan is not sent to all Segments, but instead directed only to the Segment instance(s) containing the relevant rows.

Note that targeted query plans in a distributed database differ from query plans in a standalone database. In a distributed system, each server typically hosts multiple Segment instances, unlike a standalone database which has only one Segment. Therefore, even if no distribution key is defined during table creation, data may still be spread across multiple Segments on the same server. In such cases, even a targeted query plan pointing to a single server might require a Motion operator to gather results from multiple Segments on that server.