EXPLAIN

Shows the query plan for the statement.

Summary

EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ANALYZE] [VERBOSE] statement

Where option can be one of the following:

    ANALYZE [ boolean ]
    VERBOSE [ boolean ]
    COSTS [ boolean ]
    BUFFERS [ boolean ]
    TIMING [ boolean ]
    FORMAT { TEXT | XML | JSON | YAML }

describe

EXPLAIN displays the query plan generated by the YMatrix or Postgres optimizer for the provided statement. A query plan is a query tree plan for a node. Each node in the plan represents a separate operation such as table scanning, joining, aggregation, or sorting.

The plan should be read from bottom to top, because each node sends a row to the node above it. The lowest-level node of the schedule is usually a table scan operation (sequential scan, index scan, or bitmap index scan). If the query requires connection, aggregation, or sorting (or other operations on the original row), there will be other nodes above the scan node to perform these operations. The top-level plan node is usually the YMatrix database motion node (redistribution, explicit redistribution, broadcast, or collect motion). These operations are responsible for moving rows between segment instances during query processing.

The output of EXPLAIN has a row for each node in the plan tree, showing the basic node type and the following cost estimates made by the planner to execute the plan node:

  • cost — The optimizer guesses how long it takes to run the statement (measured in any unit of cost, but usually refer to disk page acquisition). Two cost numbers are displayed: the startup cost before the first row can be returned, and the total cost of all rows can be returned. Note that the total cost assumes that all rows will be retrieved, but not always (e.g. if using LIMIT).
  • rows — The total number of rows output by this schedule node. This is usually less than the actual number of rows processed or scanned by the planned node, reflecting the estimated selection of any WHERE clause condition. Ideally, the top-level node estimates will approximate the number of rows actually returned, updated, or deleted by the query.
  • width — The total number of bytes for all rows output by this schedule node.

It is important to note that the cost of the superior node includes the cost of all its child nodes. The highest node of the plan has the estimated total execution cost of the plan. This is the number that plans want to minimize. In particular, the cost does not take into account the time it takes to transfer the result row to the client.

EXPLAIN ANALYZE causes the statement to be actually executed, not just to plan. The EXPLAIN ANALYZE optimizer displays the actual results and the planner's estimates. This is useful for seeing if the optimizer's estimate is close to reality. In addition to the information displayed in the EXPLAIN plan, EXPLAIN ANALYZE will also display the following additional information:

  • Total time (in million seconds) spent running the query.
  • The number of workers (segments) involved in the planned node operation. Only the segment of the returned row is calculated.
  • The maximum number of rows returned by the segment with the most rows. If multiple segments produce an equal number of rows, the one with the longest end time is the selected one.
  • Generate the segment ID number of the segment with the most rows for an operation.
  • For related operations, the operation uses work_mem. If work_mem is not enough to perform the operation in memory, the plan will show how much data is overflowing to disk and how many data passes are required for the lowest performance segment. For example:
    Work_mem used: 64K bytes avg, 64K bytes max (seg0). Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile I/O affecting 2 workers. [seg0] pass 0: 488 groups made from 488 rows; 263 rows written to workfile [seg0] pass 1: 263 groups made from 263 rows
  • The time in million seconds spent searching the first row from the segment that produces the most rows, and the total time in searching all rows from that segment. If

Note: Remember that this statement is actually executed when using ANALYZE. Although EXPLAIN ANALYZE will discard any output that SELECT will return, other side effects of this statement will occur as usual. If you want to use EXPLAIN ANALYZE on a DML statement without letting the command affect your data, use the following method:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Only ANALYZE and VERBOSE options can be specified and can only be specified in that order, without including the option list in parentshes.

Parameters

ANALYZE

  • Execute the command and display the actual run time and other statistics. If this parameter is omitted, the default is FALSE. Specifying ANALYZE true can enable it.

VERBOSE

  • Show additional information about the plan. Specifically, including the output column list of each node in the plan tree, the schema qualifying table, and the function name, always tag variables with a range table alias in the expression, and always print the name of each trigger whose statistics are to be displayed. If this parameter is omitted, it defaults to FALSE; Specify VERBOSE true to enable it.

COSTS

  • Includes information about the estimated startup cost and total cost for each schedule node, as well as the estimated number of rows and the estimated width of each row. If this parameter is omitted, it defaults to TRUE; Specify COSTS false to disable it.

BUFFERS

  • Includes information about buffer usage. Specifically, it includes the number of shared blocks hits, reads, stains and writes, the number of local blocks hits, reads, stains and writes, and the number of temporary blocks reads and writes. Hit means avoiding reads because the block is found in the cache when needed. A shared block contains data from regular tables and indexes; the local block contains data from temporary tables and indexes; a temporary block contains short-term work data for sorting, hashing, materializing planning nodes and similar situations. The number of stained blocks indicates the number of previously unmodified blocks that the query has changed; while the number of written blocks indicates the number of previously processed blocks that the backend explored from the cache during query processing. The number of blocks displayed by the superior node includes the number of blocks used by all its child nodes. In text format, only non-zero values ​​​​​are printed. This parameter is only available if ANALYZE is also enabled. If this parameter is omitted, it defaults to FALSE; Specify BUFFERS true to enable it.

TIMING

  • Includes the actual startup time and the time spent on each node in the output. The overhead of repeating reading of system clocks can significantly slow down querying on some systems, so setting this parameter to FALSE may be useful when only the actual row count is required rather than the exact time. Even if node-level timing is turned off with this option, the run time of the entire statement is always measured. This parameter is only available if ANALYZE is also enabled. The default is TRUE.

FORMAT

  • Specifies the output format, which can be TEXT, XML, JSON, or YAML. Non-text output contains the same information as the text output format, but the program is easier to parse. This parameter defaults to TEXT.

boolean

  • Specifies whether to turn the selected option on or off. You can write TRUE, ON, or 1 to enable the option, while you can write FALSE, OFF, or 0 to disable the option. The Boolean value may also be omitted, in this case, assumed to be TRUE.

statement

  • Any SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE or CREATE TABLE AS statements that you want to view their execution plan.

Note

In order for the query optimizer to make reasonable decisions when optimizing queries, the ANALYZE statement should be run to record statistical information about the distribution of data in the table. If you haven't done this yet (or the statistical distribution of data in the table has changed significantly since the last time you ran ANALYZE), the estimated cost is unlikely to match the actual properties of the query, so you might choose a poor query plan.

SQL statements that are run during the execution of the EXPLAIN ANALYZE command are excluded from the YMatrix database resource queue.

Example

To illustrate how to read an EXPLAIN query plan, consider an example of a very simple query:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.27 rows=1 width=58)
   ->  Seq Scan on names  (cost=0.00..431.27 rows=1 width=58)
         Filter: (name = 'Joelle'::text)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.23.0
(4 rows)

If we read the plan from the bottom up, the query optimizer will start with the sequential scan of the names table. Note that the WHERE clause is used as a filter condition. This means that the scan operation will check the conditions for each row it scans and output only the rows passing the conditions.

The result of the scan operation is passed to the gather motion operation. In the YMatrix database, gather motion is to send segment rows to the master. In this case, we have 3 segment instances sent to 1 master instance (3:1). This operation is performed on slice1 of the parallel query execution plan. In the YMatrix database, the query plan is divided into multiple slices so that the various parts of the query plan can be processed in parallel by These segments.

The estimated start-up cost for the program is 00.00 (no cost) and the total cost is 431.27. The optimizer estimates that this query will return a row.

This is the same query, but the cost estimate is suppressed:

EXPLAIN (COSTS FALSE) SELECT * FROM names WHERE name = 'Joelle';
                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on names
         Filter: (name = 'Joelle'::text)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.23.0
(4 rows)

Here is the same query using JSON format:

EXPLAIN (FORMAT JSON) SELECT * FROM names WHERE name = 'Joelle';
                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 [                                            +
   {                                          +
     "Plan": {                                +
       "Node Type": "Gather Motion",          +
       "Senders": 3,                          +
       "Receivers": 1,                        +
       "Slice": 1,                            +
       "Segments": 3,                         +
       "Gang Type": "primary reader",         +
       "Startup Cost": 0.00,                  +
       "Total Cost": 431.27,                  +
       "Plan Rows": 1,                        +
       "Plan Width": 58,                      +
       "Plans": [                             +
         {                                    +
           "Node Type": "Seq Scan",           +
           "Parent Relationship": "Outer",    +
           "Slice": 1,                        +
           "Segments": 3,                     +
           "Gang Type": "primary reader",     +
           "Relation Name": "names",          +
           "Alias": "names",                  +
           "Startup Cost": 0.00,              +
           "Total Cost": 431.27,              +
           "Plan Rows": 1,                    +
           "Plan Width": 58,                  +
           "Filter": "(name = 'Joelle'::text)"+
         }                                    +
       ]                                      +
     },                                       +
     "Settings": {                            +
       "Optimizer": "Pivotal Optimizer (GPORCA) version 3.23.0"      +
     }                                        +
   }                                          +
 ]
(1 row)

If an index exists and we use a query with an indexable WHERE condition, then the EXPLAIN may show a different plan. This query uses YAML format to generate plans with index scans:

EXPLAIN (FORMAT YAML) SELECT * FROM NAMES WHERE LOCATION='Sydney, Australia';
                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 - Plan:                                                     +
     Node Type: "Gather Motion"                              +
     Senders: 3                                              +
     Receivers: 1                                            +
     Slice: 1                                                +
     Segments: 3                                             +
     Gang Type: "primary reader"                             +
     Startup Cost: 0.00                                      +
     Total Cost: 10.81                                       +
     Plan Rows: 10000                                        +
     Plan Width: 70                                          +
     Plans:                                                  +
       - Node Type: "Index Scan"                             +
         Parent Relationship: "Outer"                        +
         Slice: 1                                            +
         Segments: 3                                         +
         Gang Type: "primary reader"                         +
         Scan Direction: "Forward"                           +
         Index Name: "names_idx_loc"                         +
         Relation Name: "names"                              +
         Alias: "names"                                      +
         Startup Cost: 0.00                                  +
         Total Cost: 7.77                                    +
         Plan Rows: 10000                                    +
         Plan Width: 70                                      +
         Index Cond: "(location = 'Sydney, Australia'::text)"+
   Settings:                                                 +
     Optimizer: "Pivotal Optimizer (GPORCA) version 3.23.0"
(1 row)

compatibility

There is no EXPLAIN statement defined in the SQL standard.

See also

ANALYZE