Query processing overview

To perform effective SQL tuning, you must understand what a query plan is; and the understanding of a query plan is based on understanding the complete query operation process in the database management system. As a development and optimization product of the classic database PostgreSQL system, YMatrix also inherits the classic mechanism of the PostgreSQL system (mainly Greenplum database).

Greenplum database divides nodes into Master nodes and Segment nodes, and develops a query processing mechanism that is more suitable for distributed database systems based on PostgreSQL database:

  • Master Receives query statements and generates query plans.
  • Master Distributes query plans to Segments.
  • Segments Concurrently executes plans on their respective local datasets.
  • Master collects results and returns them to the client.

There are two ways to distribute query plans:

  • Distribute parallel query plans
  • Distribution targeted query plan

Distributing parallel query plans means that most database operations (such as table scanning, joining, aggregation, and sorting) are performed in parallel on all Segment node instances. Each operation performed on a database of one Segment instance is independent of the data stored in other Segment instance databases.

![](https://img.ymatrix.cn/ymatrix_home/Distribution query plan (screenshot 3)_1692676549.png)

A distribution targeted query plan means that some queries may access only data on a single segment, such as a single row of INSERT, UPDATE, DELETE or SELECT operations or queries filtered with table distribution key columns. In these queries, the query plan is not sent to all Segment instances, but is directed to the Segment instances containing the affected or related rows.

![](https://img.ymatrix.cn/ymatrix_home/Targeted Query Plan (Screenshot 3)_1692676530.png)

It should be noted that the targeted query plan in a distributed database is different from the stand-alone query plan. Because usually there are multiple instances of Segment nodes on each server of a distributed database, unlike a single database that only has one Segment in total, even if the distribution key is not set when creating the table, the data may be stored on multiple Segments on the same server rather than one Segment. In this case, even if the directional query plan points to only one server, it may call the Motion operator to summarize the data of multiple segments.