Query Tuning Category Parameters

This document describes the parameters related to query optimization in the system configuration parameters.

Note!
To ensure system stability and security, please be cautious when manually modifying the relevant parameters.

The query optimization parameters are classified as follows:


Postgres query optimizer operator control parameters

This section of configuration parameters provides the original method that influences the query optimizer's selection of query plans.

If the default plan selected by the optimizer for a specific query is not optimal, we can temporarily resolve this issue by using these configuration parameters to force the optimizer to select a better plan.

Better ways to improve the quality of the plan selected by the optimizer include: adjusting the cost parameters of the planner (see below), manually running ANALYZE, increasing the value of the default_statistics_target configuration parameter, and using the ALTER TABLE SET STATISTICS statement to increase the amount of statistics collected for specific columns.

enable_bitmapscan


Enable or disable the Postgres query optimizer (planner) to use bitmap scan plan types.

  • Note: This is different from bitmap index scans. Bitmap scans mean that the index will be dynamically converted to a bitmap in memory when appropriate, providing faster index performance for complex queries on very large tables. It is used when there are multiple predicates on different index columns. Each bitmap for each column can be compared to create the final list of selected tuples.
Data Type Default Setting Category
boolean on master; session; reload

enable_groupagg


Enable or disable the Postgres query optimizer (planner) to use group aggregation plan types.

Data type Default value Setting category
boolean on master; session; reload

enable_hashagg


Enable or disable the Postgres query optimizer (planner) to use hash aggregation plan types.

Data type Default value Setting category
boolean on master; session; reload

enable_hashjoin


Enable or disable the Postgres query optimizer (planner) to use hash join plan types.

Data type Default value Setting category
boolean on master; session; reload

enable_indexonlyscan


Enable or disable the Postgres query optimizer (planner) only use index scan plan types.

Data Type Default Value Setting Category
boolean on master; session; reload

enable_indexscan


Enable or disable the Postgres query optimizer (planner) to use index scan plan types.

Data type Default value Setting category
boolean on master; session; reload

enable_mars2aggscan


Whether to enable MARS2 aggscan.

Data type Default value Setting category
boolean on master; session; reload

enable_material


Enable or disable Postgres query optimizer (planner) materialization.

  • It is not possible to completely disable materialization, but turning off this variable will prevent the planner from inserting materialization nodes, unless there is a need to ensure correctness.
Data type Default value Setting category
boolean on master; session; reload

enable_mergejoin


Enable or disable the Postgres query optimizer (planner) to use merge join plan types.

  • Merge joins are based on the idea of sorting the left and right tables in order and then scanning them in parallel. Therefore, both data types must be fully sortable, and the join operator must be one that can only successfully join pairs of values at the “same position” in the sorted order. In practice, this means that the join operator must behave like equality.
Data Type Default Setting Category
boolean on master; session; reload

enable_nestloop


Enable or disable the Postgres query optimizer (planner) from using nested loop join plans.

  • It is not possible to completely suppress nested loop joins, but if other methods are available, disabling this variable will prevent the Postgres optimizer from using it.
Data Type Default Value Setting Category
boolean on master; session; reload

enable_parallel_append


Enable or disable the Postgres query optimizer (planner) to use parallel append plan types.

Data type Default value Setting category
boolean on master; session; reload

enable_parallel_hash


Enable or disable the Postgres query optimizer (planner) to use hash join plan types for parallel hashes.

  • If hash join plans are not enabled, this parameter has no effect.
Data type Default value Setting category
boolean on master; session; reload

enable_partition_pruning


Enable or disable the Postgres query optimizer (planner) to reduce the ability to remove partitions from partitioned tables in the query plan.

  • This also controls the Postgres optimizer's ability to create query plans that allow the execution engine to remove (ignore) partitions during query execution. For more details, see Partition Pruning.
Data Type Default Value Setting Category
boolean on master; session; reload

enable_partitionwise_aggregate


Enable or disable the Postgres query optimizer (planner) to use partition-oriented grouping or aggregation.

  • This allows grouping or aggregation on partitioned tables to be executed separately on each partition.
  • If the GROUP BY clause does not include the partition key, only partial aggregation can be executed on a per-partition basis. Finalization must be performed later.
  • Since partition-oriented grouping or aggregation consumes significant CPU time and memory during planning, the default value is off.
Data Type Default Value Setting Category
boolean off master; session; reload

enable_partitionwise_join


Enable or disable the Postgres query optimizer (planner) to use partition-oriented joins.

  • This allows joins between partitioned tables to be executed by matching the corresponding partitions. Partition-oriented joins are currently only applicable when the join conditions include all partition keys, the join conditions must be of the same data type, and the subpartition sets must match exactly.
  • Since partitioned join planning consumes significant CPU time and memory during planning, the default value is off.
Data Type Default Value Setting Category
boolean off master; session; reload

enable_seqscan

Enable or disable the Postgres query optimizer (planner) from using sequential scan plan types.

  • It is not possible to completely suppress sequential scans, but if other methods are available, disabling this variable will prevent the Postgres optimizer from using them.
Data Type Default Value Setting Category
boolean on master;session;reload

enable_sort


Enable or disable the Postgres query optimizer (planner) from using explicit sort steps.

  • It is not possible to completely disable explicit sorting, but if other methods are available, turning off this variable will prevent the Postgres optimizer from using it.
Data Type Default Value Setting Category
boolean on master;session;reload

enable_tidscan


Enable or disable the Postgres query optimizer (planner) to use tuple identifier (TID) scan plan types.

Data type Default value Setting category
boolean on master; session; reload

gp_enable_agg_distinct


Enable or disable two-stage aggregation to calculate a single distinct aggregation.

  • This only applies to subqueries that contain a single distinct aggregation function.
Data type Default value Setting category
boolean on master;session;reload

gp_enable_agg_distinct_pruning


Enable or disable three-stage aggregation and joining to calculate different qualified aggregates.

  • This only applies to subqueries that contain one or more different qualified aggregate functions.
Data type Default value Setting category
boolean on master;session;reload

gp_enable_direct_dispatch


Enable or disable query scheduling for queries targeting data on a single Segment.

  • When enabled, queries targeting rows on a single Segment will only have their query plans assigned to that Segment (rather than all Segments). This significantly reduces the response time for qualifying queries, as no interconnect settings are involved.
  • Direct distribution does require higher CPU utilization on the Master server.
Data Type Default Value Setting Category
boolean on master;system;restart

gp_enable_fast_sri


When set to on, the Postgres query optimizer (planner) plans single-row inserts so that they are sent directly to the correct Segment instance (without motion operations).

  • This significantly improves the performance of single-row insert statements.
Data type Default value Setting classification
boolean on master;session;reload

gp_enable_multiphase_agg


Enable or disable the use of two-phase or three-phase parallel aggregation plans in the Postgres query optimizer.

  • This method applies to any subquery with aggregation. If gp_enable_multiphase_agg is disabled, then gp_enable_agg_distinct and gp_enable_agg_distinct_pruning are also disabled.
Data type Default value Setting category
boolean on master;session;reload

gp_enable_preunique


Enable two-phase deduplication for SELECT DISTINCT queries (not SELECT COUNT(DISTINCT)).

  • When enabled, it adds an additional set of SORT DISTINCT plan nodes before moving.
  • In cases where the number of rows is significantly reduced by different operations, this additional SORT DISTINCT is much less costly than sending rows through interconnects.
Data type Default value Setting category
boolean on master;session;reload

gp_enable_relsize_collection


If the table does not have statistics, Greenplum GPORCA optimizer and Postgres query optimizer (planner) are allowed to use the estimated size of the table (pg_relation_size function).

  • By default, if statistics are unavailable, the Greenplum GPORCA optimizer and planner will use default values to estimate the number of rows.
  • The default behavior can improve query optimization time and reduce resource queue usage in heavy workloads, but may result in suboptimal plans.
  • For the root partition of a partitioned table, this parameter is ignored.
  • When this optimizer is enabled and the root partition has no statistics, the default value will always be used. You can use ANALYZE ROOTPARTITION to collect statistics for the root partition. See ANALYZE for details.
Data Type Default Value Setting Category
boolean off master;session;reload

gp_enable_sort_distinct


Enable removal of duplicates when sorting.

Data type Default value Setting category
boolean on master;session;reload

gp_enable_sort_limit


Enable the LIMIT operation when sorting. Sorting is more efficient when the plan requires at most the first limit_number rows.

Data type Default value Setting category
boolean on master;session;reload

gp_selectivity_damping_for_joins


Suppress selectivity in the Join clause.

Data type Default value Setting category
boolean off master;session;reload

gp_selectivity_damping_for_scans


Suppress selectivity in clauses in the same base table.

Data type Default value Setting category
boolean on master;session;reload

min_parallel_index_scan_size


Set a minimum value (BLOCKS) for the amount of index data that must be scanned. Parallel scanning will only be considered when the amount of index data scanned exceeds this value.

  • Note that parallel index scanning does not typically cover the entire index; it is the number of relevant pages that the optimizer believes will actually be used in the scan.
Data Type Default Value Value Range Setting Category
int 512 0 ~ (INT_MAX/3) master;session;reload

min_parallel_table_scan_size


Set a minimum value (BLOCKS) for the amount of table data that must be scanned. Parallel scanning will only be considered if the amount of table data scanned exceeds this value.

  • For parallel sequential scans, the amount of table data scanned is always equal to the size of the table, but when using indexes, the amount of table data scanned is usually smaller.
Data type Default value Value range Setting category
int 8192 0 ~ (INT_MAX/3) master;session;reload

parallel_leader_participation


Allow the leader process to execute the query plan under the Gather and Gather Merge nodes instead of waiting for the worker processes.

  • Setting this value to off reduces the likelihood of worker processes being blocked due to the leader not reading tuples fast enough, but requires the leader to wait for worker processes to start before producing the first tuple.
  • The extent to which the leader can help or hinder performance depends on the type of plan, the number of workers, and the duration of the query.
Data type Default value Setting category
boolean on master;system;restart


Postgres query optimizer cost calculation parameters

Caution!
Do not adjust the parameters in this section. Changes to these parameters will affect the YMatrix database hardware configuration and typical workloads. All of these parameters are related, and changing the value of one without considering its associated effects will result in a loss of query performance.

cpu_index_tuple_cost

Set the cost estimate for the Postgres optimizer (planner) to process each index item in a single index scan. This is measured as part of the sequential page replacement cost.

Data type Default value Value range Setting classification
floating point 0.005 0 ~ DBL_MAX master;session;reload

cpu_operator_cost


Set the Postgres optimizer (planner) to estimate the cost of each operator in a WHERE clause.

Data type Default value Value range Setting category
floating point 0.0025 0 ~ DBL_MAX master;session;reload

cpu_tuple_cost


Set the Postgres optimizer (planner) to estimate the cost of processing each row (tuple) in a query. This is measured as part of the sequential page replacement cost.

Data type Default value Value range Setting category
floating point 0.01 0 ~ DBL_MAX master;session;reload

cursor_tuple_fraction


Set the Postgres query optimizer (planner) estimate of the proportion of rows in a cursor that will be retrieved.

  • Smaller values cause the planner to favor a “quick start” plan for the cursor, which will quickly retrieve the first few rows but may take a long time to retrieve all rows. Larger values emphasize the total estimated time.
  • The maximum setting is 1.0, at which the cursor will be planned exactly like a regular query, considering only the total estimated time and not how quickly the first few rows are returned.
Data type Default value Value range Setting category
floating point 0.1 0.0 ~ 1.0 master;session;reload

effective_cache_size


Set the Postgres optimizer's (planner's) assumption about the effective disk buffer size (BLOCKS) available for a single query.

  • This parameter is considered in the cost estimation for using an index; higher values make index scans more likely to be used, while lower values make sequential scans more likely to be used.
  • When setting this parameter, you should also consider the shared buffer for YMatrix and the kernel disk buffer that will be used for YMatrix data files, although some data may exist in both locations.
  • Additionally, consider the expected number of concurrent queries across different tables, as they must share the available space.
  • This parameter does not affect the size of shared memory allocated to YMatrix, nor does it reserve kernel disk buffers; it is used solely for estimation purposes. The system does not assume that data will be retained in disk buffers between queries.
  • The gpconfig tool and SHOW command display valid cache size values in “MB” or “kB” units.
Data type Default value Value range Setting category
int 524288 1 ~ INT_MAX master;session;reload

gp_appendonly_random_page_cost


Set the optimizer's estimate of the cost of non-sequential disk page fetches from the AO table.

Data type Default value Value range Setting category
floating point 40.0 0 ~ DBL_MAX master;session;reload

gp_compressed_random_page_cost


Set the optimizer's estimate of the cost of non-sequential access to disk pages from compressed AO tables.

Data type Default value Value range Setting category
floating point 400.0 0 ~ DBL_MAX master;session;reload

gp_motion_cost_per_row


Set the Postgres query optimizer (planner) cost estimate for the Motion operator to transfer rows from one segment to another, measured in decimal parts of the cost of a contiguous page extraction.

  • If set to 0, the value used is twice the cpu_tuple_cost value.
Data Type Default Value Value Range Setting Category
floating point 0.0 0 ~ DBL_MAX master;session;reload

gp_segments_for_planner


Set the number of Primary instances assumed by the Postgres query optimizer (planner) in its cost and size estimates.

  • If set to 0, the actual number of Primary instances is used.
  • This variable affects the optimizer's estimate of the number of rows processed by each send and receive operation in the Motion operator.
Data Type Default Value Value Range Setting Category
int 0 0 ~ INT_MAX master;session;reload

parallel_setup_cost


Set the cost estimate for the Postgres optimizer (planner) when starting parallel worker processes.

Data type Default value Value range Setting category
floating point 1000.0 0 ~ INT_MAX master;session;reload

parallel_tuple_cost


Set the Postgres optimizer (planner) cost estimate for passing a tuple from one parallel worker process to another.

Data type Default value Value range Setting category
floating point 0.1 0 ~ DBL_MAX master;session;reload

random_page_cost


Set the cost estimate for non-sequential disk page reads by the Postgres query optimizer (planner).

  • You can override this value by placing tables and indexes in a special tablespace (set the parameter of the same name for that tablespace).
  • Higher values (relative to seq_page_cost) will cause the system to favor sequential scans, while lower values will cause the system to favor index scans. You can adjust both parameters together to change the relative importance of disk I/O cost compared to CPU cost.
  • Random access to disk storage is typically more than four times more expensive than sequential access. However, since most random access to disk (e.g., indexed reads) is assumed to occur in the cache, a lower default value (4) is used. The default value can be thought of as modeling random access as 40 times slower than sequential access, with the expectation that 90% of random reads will be cached.
  • If you believe that a 90% cache hit rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads. Conversely, if your data can be entirely cached (e.g., when the database is smaller than the server's total memory), reducing random_page_cost may be appropriate. For storage with very low random read costs (e.g., solid-state drives), using a lower random_page_cost value may be more appropriate.

Note!
Although you can set random_page_cost to be smaller than seq_page_cost, this does not affect the physical reality. When all databases are in memory, setting both values to be equal is reasonable because, in this case, out-of-order fetching does not incur greater overhead than sequential fetching. Similarly, on databases with high cache hit rates, you should reduce both values relative to CPU overhead because fetching pages from memory incurs significantly less overhead than under typical conditions. If the entire database is cached in RAM, setting them to equal values makes sense, as out-of-order page access incurs no additional overhead in this scenario. Similarly, on a highly cached database, you should reduce both values relative to CPU parameters, as retrieving a page already in RAM incurs significantly lower overhead than under typical conditions.

Data Type Default Value Value Range Setting Category
floating point 4.0 0 ~ DBL_MAX master;session;reload

seq_page_cost


Set the cost of a sequential disk page fetch calculated by the Postgres query optimizer (planner).

  • This value can be overridden for a specific tablespace by setting the similarly named tablespace parameter.
Data Type Default Value Value Range Set Classification
floating point 1.0 0 ~ DBL_MAX master; session; reload


Postgres Genetic Query Optimization Configuration Parameters

GEQO is an algorithm that uses an exploratory search algorithm for query planning. It can reduce the planning time for complex queries (queries that require multi-table joins), but at the cost of the plans it generates will sometimes be worse than those found using exhaustive search algorithms.

geqo


Genetic query optimization is allowed or prohibited.

  • It is usually better not to close it in a production environment.
  • The geqo_threshold variable provides a finer grained null value for GEQO.
Data Type Default Set Category
boolean off master; session; reload

geqo_effort


Controls the balance between planning time and query planning effectiveness in GEQO.

  • The increase in parameter value means that the query planning time increases, but it will also increase the chance of selecting more effective query planning.
  • geqo_efffort does not actually do anything directly; it is simply used to calculate the default values ​​of other variables that affect GEQO behavior.
Data Type Default Value Value Range Set Classification
int 0 0 ~ INT_MAX master; session; reload

geqo_generations


Controls the number of child borne used by GEQO.

  • The child means the number of iterations of the algorithm. It must be at least 1, with the useful range of values ​​and pool sizes the same.
  • If set to 0 (default value), YMatrix will select the appropriate value based on geqo_pool_size.
Data Type Default Value Value Range Set Classification
int 0 0 ~ INT_MAX master; session; reload

geqo_pool_size


Controls the pool size used by GEQO, which is the number of individuals in the genetic population.

  • If it is set to 0 (the default value) a suitable value is selected based on geqo_efffort and the number of tables in the query.
Data Type Default Value Value Range Set Classification
int 0 0 ~ INT_MAX master; session; reload

geqo_seed


Controls the initial value of the random number generator used by GEQO, which is used to select a random path in the connection order search space.

  • This value can range from 0 (default) to 1. Changing this value changes the set of connection paths being explored and can result in finding a better or worse path.
Data Type Default Value Value Range Set Classification
floating point 0.0 0.0 ~ 1.0 master; session; reload

geqo_selection_bias


Controls the selection preferences for GEQO usage.

  • Selection preference is the selection pressure within the population. The value can be between 1.5 and 2.0, which is the default value.
Data Type Default Value Value Range Set Classification
floating point 1.0 0.0 ~ 100.0 master; session; reload

geqo_threshold


Genetic query optimization is only used when the number of FROM items involved has at least this parameter value (note that a FULL OUTER JOIN is counted as only one FROM item).

  • For simpler queries, the ordinary exhaustive search planner is usually used, but for queries with many tables it takes a long time to search, which is usually more expensive than the cost of executing a suboptimal plan. Therefore, a threshold on query size is a convenient way to manage GEQO usage.
Data Type Default Value Value Range Set Classification
int 12 2 ~ INT_MAX master; session; reload


Database statistics sampling parameters

This part of the parameters is used to adjust the amount of data sampled by the ANALYZE operation. Adjusting these parameters will affect the collection of statistical data in the entire system. You can use the ALTER TABLE SET STATISTICS clause to configure statistics collection for certain specific tables and columns.

default_statistics_target


Set default statistical targets for table columns that do not set columns with ALTER TABLE SET STATISTICS.

  • Larger values ​​increase the time of ANALYZE but improve the estimated quality of the Postgres query optimizer.
Data Type Default Value Value Range Set Classification
int 100 0 ~ 10000 master; session; reload


Sort operator (Sort) configuration parameters

gp_enable_sort_limit


Enable the LIMIT operation when sorting. Sort is more efficient when the first limit_number of the line is required at most, sorting is more efficient.

Data Type Default Set Category
boolean on master; session; reload


Aggregation Operator (Agg) Configuration Parameters

gp_enable_agg_distinct


Enable or disable two-stage aggregation to compute a single, distinctly qualified aggregation.

  • This is only applicable to subqueries containing a single different qualified aggregate function.
Data Type Default Set Category
boolean on master; session; reload

gp_enable_agg_distinct_pruning


Enable or disable three-stage aggregation and connection to calculate differently qualified aggregations.

  • This is only applicable to subqueries containing one or more different qualifying aggregate functions.
Data Type Default Set Category
boolean on master; session; reload

gp_enable_multiphase_agg


Enable or disable the Postgres query optimizer using the two- or three-stage parallel aggregation plan.

  • This method is suitable for any subquery with aggregates. If gp_enable_multiphase_agg is off, gp_enable_agg_distinct and gp_enable_agg_distinct_pruning are disabled.
Data Type Default Set Category
boolean on master; session; reload

gp_enable_preunique


Enable two-stage duplicate deletion for the SELECT DISTINCT query (not SELECT COUNT(DISTINCT)).

  • When enabled, it adds an additional set of SORT DISTINCT scheduled nodes before moving.
  • This additional SORT DISTINCT is much cheaper than sending rows over interconnects, where different operations greatly reduce the number of rows.
Data Type Default Set Category
boolean on master; session; reload

gp_workfile_compression


Specifies whether temporary files created when a hash aggregate or hash connection operation overflows to disk.

  • If your database installation uses a serial ATA (SATA) disk drive, enabling compression may help avoid overloading the disk subsystem using I/O operations.
Data Type Default Set Category
boolean off master; session; reload

mx_pullup_agg_proj_and_qual


Extract the projected fields and evaluate agg to a separate result node.

Data Type Default Set Category
boolean off segments; session; reload


Connection operator (Join) configuration parameters

join_collapse_limit

If the resulting list does not exceed so many items, the Postgres optimizer (planner) will rewrite the explicit JOIN (except FULL JOIN) structure into the FROM item list.

  • Smaller values ​​​​​reduce planning time, but may generate poor query plans.
  • By default, this variable is set to the same as from_collapse_limit and is suitable for most cases.
  • Setting it to 1 prevents any reordering of the internal JOINs. Therefore, the explicit join order specified in the query is the actual order in which the relationship is connected.
  • Setting this variable to a value between 1 and from_collapse_limit may help trade off planning time with the quality of the selected plan (a higher value will result in a better plan).
  • Setting this value to geqo_threshold or greater may trigger the use of the GEQO planner, resulting in a non-optimal plan.
Data Type Default Value Value Range Set Classification
int 20 1 ~ INT_MAX master; session; reload

gp_adjust_selectivity_for_outerjoins


Enable the selection of NULL testing on external connections.

Data Type Default Set Category
boolean on master; session; reload

gp_hashjoin_tuples_per_bucket


Sets the target density of the hash table used by the HashJoin operation.

  • Smaller values ​​​​​tend to produce larger hash tables, which can improve join performance.
Data Type Default Value Value Range Set Classification
int 5 1 ~ 25 master; session; reload

gp_workfile_compression


Specifies whether temporary files created when a hash aggregate or hash connection operation overflows to disk.

  • If your database installation uses a serial ATA (SATA) disk drive, enabling compression may help avoid overloading the disk subsystem using I/O operations.
Data Type Default Set Category
boolean off master; session; reload


Other Postgres query optimizer configuration parameters

constraint_exclusion


Controls query The use of table constraints by the Postgres optimizer (planner) to optimize queries.

  • Supported options for this parameter are: on: on (check constraints on all tables), off (never check constraints) and partition (check constraints on inherited subtables and UNION ALL subquery check constraints). partition is the default setting. It is often used with traditional inheritance trees to improve performance.
  • When this parameter is allowed for a specific table, the optimizer compares the query conditions and the CHECK constraints of the table and ignores those scanned tables whose conditions contract the constraints. For example:
    =# CREATE TABLE parent(key integer, ...);
    =# CREATE TABLE child1000(check (key between 1000 and 1999)) INHERITS(parent);
    =# CREATE TABLE child2000(check (key between 2000 and 2999)) INHERITS(parent);
    ...
    =# SELECT * FROM parent WHERE key = 2400;

    When constraint exclusion is enabled, this SELECT will not scan child1000 at all, thereby improving performance.

  • Currently, constraint exclusion is only enabled by default when table partitioning is implemented via an inheritance tree. Enabling it for all tables increases additional planning overhead, especially for simple queries, and does not provide any benefits. It is best to completely disable it when there are no tables partitioned using traditional inheritance trees. (Note: The equivalent feature for partitioned tables is controlled by a separate parameter (enable_partition_pruning))
Data type Default value Value range Setting category
enum on true / false / yes / no / 1 / 0 / on / off / partition master;session;reload

force_parallel_mode


Parallel queries are permitted for testing purposes, even if performance benefits are not expected.

  • Supported options include: off (use parallel mode only when performance improvements are expected), on (force parallel queries whenever the query is deemed safe), and regress (similar to on, but with additional behavior changes as explained below).
  • More specifically, setting this value to on adds a Gather node at the top of any query plan that is safe for parallel queries, so that the query runs within a parallel worker. Even if a parallel worker is unavailable or cannot be used, operations such as starting a transaction that are prohibited in a parallel query context will also be prohibited, unless the optimizer believes that doing so would cause the query to fail (in which case they will not be prohibited). When this option is set, if failures or unexpected results occur, certain functions used by the query may need to be marked as PARALLEL UNSAFE (or PARALLEL RESTRICTED).
  • Setting this value to regress has the same effect as setting it to on, plus additional effects that aid in automated regression testing. Generally, messages from a parallel worker will include a prompt line indicating this, but setting it to regress removes this line, making the output identical to non-parallel execution. Similarly, Gather nodes added to the plan by this setting will be hidden in the EXPLAIN output, resulting in output that matches that produced when set to off.
Data Type Default Value Value Range Setting Category
enum off on / off / regress master;session;reload

from_collapse_limit


If the generated FROM list does not exceed this parameter value, the Postgres query optimizer (planner) will merge the subquery into the parent query.

  • Smaller values reduce planning time but may result in suboptimal query plans.
  • Setting this value to geqo_threshold or higher may trigger the use of the GEQO planner, potentially leading to non-optimal plans.
Data type Default value Valid range Setting category
int 20 1 ~ INT_MAX master;session;reload

gp_enable_predicate_propagation


When enabled, the Postgres query optimizer (planner) applies query predicates to two table expressions when the tables are joined on their distribution key columns.

  • Filtering both tables before joining (if possible) is more efficient.
Data type Default value Setting category
boolean on master;session;reload

gp_max_plan_size


Specify the maximum uncompressed total size of the query execution plan multiplied by the number of Motion operators (slices) in the plan.

  • If the size of the query plan exceeds this value, the query is canceled and an error is returned.
  • A value of 0 indicates that the plan size is not monitored.
  • You can specify values in kB, MB, or GB units. The default unit is kB. For example, the value 200 is 200kB. The value 1GB is equivalent to 1024MB or 1048576kB.
Data type Default value Value range Setting category
int 0 0 ~ (INT_MAX/1024) master;session;superuser

gp_statistics_pullup_from_child_partition


When the Postgres query optimizer (planner) plans a query on the parent table, it allows the use of statistics from the child table.

Data type Default value Setting category
boolean off master;session;reload

gp_statistics_use_fkeys


Once enabled, the Postgres query optimizer (planner) can use foreign key information stored in the system catalog to optimize joins between foreign keys and primary keys.

Data type Default value Setting category
boolean off master;session;reload


查询计划执行

gp_max_slices


Specifies the maximum number of slices that can be generated by a query (a part of the query plan executed on a Segment instance).

  • If the number of slices generated by the query exceeds the specified number, the YMatrix database will return an error and not execute the query.
  • The default value is 0, with no maximum value.
  • Executing queries that generate a large number of slices may impact database performance. For example, queries containing UNION or UNION ALL operators across multiple complex views may generate a large number of slices. You can run EXPLAIN ANALYZE on the query to view the query's slice statistics.
Data type Default value Value range Setting category
int 0 0 ~ INT_MAX master;session;reload

plan_cache_mode


Prepared statements (explicitly prepared or implicitly generated, such as PL/pgSQL) can be executed using custom or generic plans.

  • Custom plans are regenerated for each execution using their specific parameter value sets, while generic plans do not depend on parameter values and can be reused across executions. Therefore, using generic plans can save planning time, but if the optimal plan heavily depends on parameter values, generic plans may be inefficient.
  • This setting is considered when executing cached plans, not when preparing plans.
Data type Default value Value range Setting category
enum auto auto / force_custom_plan / force_generic_plan master;session;reload


JIT (Just in Time) configuration parameters

jit


Does YMatrix allow the use of JIT compilation?

Data type Default value Setting category
boolean on master;session;reload

jit_above_cost


Set the query cost threshold for enabling JIT (Just-in-Time) compilation. If the query cost exceeds this value, JIT compilation will be enabled [https://www.postgresql.org/docs/12/jit.html].

  • Executing JIT consumes some planning time but can accelerate query execution.
  • Setting this value to -1 disables JIT compilation.
Data type Default value Value range Setting category
floating point 100000 -1 ~ DBL_MAX master;session;reload

jit_debugging_support


If LLVM has the required functionality, register the generated functions with GDB. This makes debugging easier.

  • This parameter can only be set when the server starts or when the client connection starts.
Data type Default value Setting category
boolean off master;session;reload;superuser

jit_dump_bitcode


Write the generated LLVM IR to the file system, in the data_directory.

  • This is only useful for internal processing of JIT implementations.
  • Only superusers and users with the appropriate SET permissions can change this setting.
Data type Default value Setting category
boolean off master;session;reload;superuser

jit_expressions

When JIT compilation is enabled, expressions can be JIT compiled.

Data type Default value Setting category
boolean on master;session;reload

jit_inline_above_cost


Set the query cost threshold for JIT compilation attempts to inline functions and operators.

  • If the query cost exceeds this value, JIT compilation will attempt to inline. Inlining increases planning time but can improve execution speed.
  • Setting this parameter to a value less than jit_above_cost is meaningless.
  • Setting this parameter to -1 disables inlining.
Data type Default value Value range Setting category
floating point 500000 -1 ~ DBL_MAX master;session;reload

jit_optimize_above_cost


Set the query cost threshold for applying JIT compilation optimizations.

  • If the query cost exceeds this value, JIT compilation will apply more expensive optimizations. These optimizations increase planning time but can improve execution speed.
  • Setting this parameter to a value less than jit_above_cost is meaningless, and setting it to a value greater than jit_inline_above_cost may not be beneficial.
  • Setting this parameter to -1 disables more expensive optimizations.
Data type Default value Value range Setting category
floating point 500000 -1 ~ DBL_MAX master;session;reload

jit_profiling_support


If LLVM has the required functionality, it emits the data needed for the perf command to analyze JIT-generated functions.

  • This will write the file to ~/.debug/jit/. If you have set and loaded the environment variable JITDUMPDIR, it will instead write to JITDUMPDIR/debug/jit.
  • You are responsible for performing cleanup when necessary.
  • This parameter can only be set at server startup or when a client connection is initiated.
Data type Default value Setting category
boolean off master;session;reload;superuser

jit_tuple_deforming


When JIT compilation is enabled, allow JIT to compile tuple transformations.

Data type Default value Setting category
boolean on master;session;reload


Greenplum ORCA Optimizer Configuration Parameters

gp_enable_relsize_collection


If the table does not have statistics, the ORCA optimizer and Postgres query optimizer are allowed to use the estimated size of the table (pg_relation_size function).

  • By default, if statistics are unavailable, ORCA and the planner will use default values to estimate the number of rows.
  • The default behavior can improve query optimization time and reduce resource queue usage in heavy workloads, but may result in suboptimal plans.
  • For the root partition of a partitioned table, this parameter is ignored.
  • When ORCA is enabled and the root partition lacks statistics, ORCA always uses the default value. You can use ANALYZE ROOTPARTITION to collect statistics for the root partition. See ANALYZE for details.
Data Type Default Value Setting Category
boolean off master;session;reload

optimizer


Enable or disable GPORCA when running SQL queries.

  • Default on. If GPORCA is disabled, the YMatrix database will only use the Postgres query optimizer.
  • GPORCA coexists with the Postgres query optimizer. When GPORCA is enabled, the YMatrix database will use GPORCA to generate execution plans for queries whenever possible. If GPORCA cannot be used, the Postgres query optimizer will be used instead.
  • The optimizer parameter can be set for the database system, a single database, a session, or a query.
Data type Default value Setting category
boolean on master;session;reload

optimizer_analyze_root_partition


For partitioned tables, controls whether the ROOTPARTITION keyword is required when running the ANALYZE command on the table to collect root partition statistics.

  • The GPORCA optimizer in Greenplum uses root partition statistics when generating query plans. The Postgres query optimizer does not use these statistics.
  • The default setting for this parameter is on, meaning that the ANALYZE command can collect root partition statistics without using the ROOTPARTITION keyword. Root partition statistics are collected when running ANALYZE on the root partition, or when running ANALYZE on a leaf partition of a partitioned table and other leaf partitions have statistics.
  • When this value is set to off, you must run ANALYZE ROOTPARTITION to collect root partition statistics.
  • When the server configuration parameter optimizer is set to on (default), this parameter should also be set to on.
  • For information on collecting table statistics for partitioned tables, see ANALYZE.
Data type Default value Setting category
boolean on master;system;restart

optimizer_array_expansion_threshold


When enabling GPORCA (default) and processing queries containing predicates with constant arrays, the optimizer_array_expansion_threshold parameter limits the optimization process based on the number of constants in the array.

  • If the array in the query predicate contains more than the number of elements specified by the parameter, GPORCA disables the conversion of the predicate to its disjunctive normal form during query optimization. For example, when GPORCA executes a query containing an IN clause with more than 100 elements, GPORCA will not convert the predicate to its disjunctive normal form during query optimization to reduce optimization time and memory consumption. The difference in query processing can be observed in the filter conditions of the IN clause in the query EXPLAIN plan.
  • Changing the value of this parameter alters the trade-off between shorter optimization time and lower memory consumption, as well as the potential benefits of constraint derivation during query optimization, such as conflict detection and partition elimination.
  • This parameter can be set for the database system, a single database, or a session or query.
Data type Default value Value range Setting category
int 100 0 ~ INT_MAX master;session;reload

optimizer_control


Control whether the SET, RESET commands or the YMatrix database utility gpconfig can be used to change the server configuration parameter optimizer.

  • If the optimizer_control parameter value is enabled, users can set optimizer parameters.
  • If the optimizer_control parameter value is off, optimizer parameters cannot be changed.
Data Type Default Value Setting Category
boolean on master;system;restart;superuser

optimizer_cost_model


When GPORCA is enabled (default), this parameter controls the cost model that GPORCA selects for bitmap scans, which are used in conjunction with bitmap indexes or B-tree indexes on AO tables.

  • legacy - Retains the calibrated cost model used by GPORCA in Greenplum Database versions 6.13 and earlier
  • calibrated - Improves cost estimation for indexes
  • experimental - Reserved for future experimental cost models; currently equivalent to the calibrated model
  • The default cost model calibrated is more likely to choose faster bitmap indexes with nested loop joins rather than hash joins.
Data type Default value Value range Setting category
enum calibrated legacy / calibrated / experimental master;session;reload

optimizer_cte_inlining_bound

When GPORCA (default) is enabled, this parameter controls the amount of inlining performed for common table expression (CTE) queries (queries containing a WHERE clause). - The default value 0 disables inlining.

  • This parameter can be set for the database system, a single database, or a session or query.
Data type Default value Value range Setting category
int 0 0 ~ INT_MAX master;session;reload

optimizer_dpe_stats


When the Greenplum ORCA optimizer is enabled (default) and this parameter is set to on (default), GPORCA derives statistics that enable it to more accurately estimate the number of rows to scan during dynamic partition elimination.

  • You can set this parameter for the database system, a single database, or a session or query.
Data type Default value Setting category
boolean on master;session;reload

optimizer_enable_associativity


When GPORCA (default setting) is enabled, this parameter controls whether to enable join association transformation during query optimization.

  • Analyze the join order. For the default value off, only the GPORCA dynamic programming algorithm used to analyze the join order is enabled.
  • The join association transformation largely replicates the functionality of newer dynamic programming algorithms.
  • If the value is on, GPORCA can use association transformations during query optimization.
  • This parameter can be set for the database system, a single database, a session, or a query.
Data type Default value Setting category
boolean off master;session;reload

optimizer_enable_dml


When GPORCA is enabled (default) and this parameter is set to on (default), GPORCA attempts to execute DML commands such as INSERT, UPDATE, and DELETE.

  • If GPORCA cannot execute the command, the YMatrix database falls back to the Postgres optimizer.
  • When set to off, the Greenplum database always falls back to the Postgres optimizer when executing DML commands.
  • The parameter can be set for the database system, a single database, or a session or query.
Data Type Default Value Setting Category
boolean on master;session;reload

optimizer_enable_indexonlyscan


When GPORCA is enabled (default) and this parameter is set to on (default), GPORCA can generate an index-only scan plan type for B-tree indexes.

  • GPORCA only accesses index values and does not access the data blocks of the relation.
  • This provides improved query execution performance, especially when the table has been emptied, has wide columns, and GPORCA does not need to retrieve any data blocks (e.g., they are visible).
  • When disabled (off), GPORCA does not generate index-only scan plan types.
  • The parameter can be set for the database system, a single database, or a session or query.
Data Type Default Value Setting Category
boolean on master;session;reload

optimizer_enable_multiple_distinct_aggs


When GPORCA is enabled (default), this parameter allows GPORCA to support multiple different qualified aggregates, such as SELECT count(DISTINCT a),sum(DISTINCT b) FROM foo.

  • By default, this parameter is disabled because its plans are typically not optimal compared to those generated by the Postgres optimizer.
  • The parameter can be set for the database system, a single database, or a session or query.
Data type Default value Setting category
boolean off master;session;reload

optimizer_force_agg_skew_avoidance


When GPORCA is enabled (default), this parameter affects the query plan alternatives that GPORCA considers when generating a 3-stage aggregation plan.

  • When the value is on, by default, GPORCA only considers 3-stage aggregation plans, where intermediate aggregation uses GROUP BY and DISTINCT columns for distribution to reduce the impact of processing skew.
  • If the value is off, GPORCA can also consider plans that distribute using GROUP BY columns.
  • These plans may perform poorly when processing skew is present.
Data type Default value Setting category
boolean on master;session;reload

optimizer_force_multistage_agg


For the default settings, when GPORCA is enabled and this parameter is set to on, GPORCA selects a 3-stage aggregation plan for scalar-different qualified aggregates when generating such plan alternatives.

  • When the value is off, GPORCA selects based on cost rather than heuristics.
  • This parameter can be set for the database system, a single database, a session, or a query.
Data type Default value Setting category
boolean off master;session;reload

optimizer_force_three_stage_scalar_dqa


For the default setting, GPORCA is enabled and this parameter is set to on, GPORCA will select plans with multi-level aggregation when generating such plan alternatives.

  • When the value is off, GPORCA will select based on cost rather than heuristics.
  • This parameter can be set for the database system, individual database, session, or query.
Data Type Default Value Setting Category
boolean on master;session;reload

optimizer_join_arity_for_associativity_commutativity


This value is an optimization hint used to limit the number of join associativity and join commutativity transformations explored during query optimization.

  • This limit controls the alternative plans considered by GPORCA during query optimization. For example, the default value 18 is an optimization hint that GPORCA stops exploring join associativity and join commutativity transformations when the n-ary join operator has more than 18 child nodes during optimization.
  • For queries with a large number of joins, specifying a lower value can improve query performance by limiting the number of alternative query plans evaluated by GPORCA.
  • However, setting the value too low may result in GPORCA generating suboptimal query plans.
  • This parameter is ineffective when the optimizer_join_order parameter is set to query or greedy.
  • This parameter can be set for the database system or session.
Data type Default value Value range Setting category
int 18 0 ~ INT_MAX segments;system;reload

optimizer_join_order


When GPORCA is enabled, this parameter sets the optimization level for join ordering during query optimization by specifying the type of join ordering alternatives to evaluate.

  • query - Uses the join order specified in the query.
  • greedy - Calculates the join order specified in the query and alternative schemes based on the minimum cardinality of the joins.
  • exhaustive - Applies transformation rules to find and evaluate all alternative join orders.
  • The default value is exhaustive. Setting this parameter to query or greedy can generate suboptimal query plans. However, if the administrator is confident that using the query or greedy setting generates satisfactory plans, query optimization time can be improved by setting the parameter to a lower optimization level.
  • Setting this parameter to query or greedy overrides the optimizer_join_order_threshold and optimizer_join_arity_for_associativity_commutativity parameters.
  • This parameter can be set for a single database, session, or query.
Data type Default value Value range Setting category
enum exhaustive exhaustive / greedy / query master;session;reload

optimizer_join_order_threshold


When GPORCA is enabled (default setting), this parameter sets the maximum number of join subgraphs that GPORCA will use based on a dynamically programmed join sorting algorithm.

  • You can set this value for a single query or for the entire session.
  • This parameter is invalid when the optimizer_join_query parameter is set to query or greedy.
Data Type Default Value Value Range Set Classification
int 10 0 ~ 12 master; session; reload

optimizer_mdcache_size

Sets the maximum amount of memory (KB) used by GPORCA on the YMatrix database master to cache query metadata (optimized data) during query optimization.

  • Memory-limited session based.
  • GPORCA uses default settings to cache query metadata during query optimization: Enable GPORCA and enable optimizer_metadata_caching.
  • The default value is 16384 (16MB). This is the best value determined by performance analysis.
  • You can specify values ​​​​​for KB, MB or GB. The default unit is KB. For example, the value 16384 is 16384KB. The value 1GB is the same as 1024MB or 1048576KB.
  • If the value is 0, the cache size is not limited.
  • This parameter can be set for a database system, a single database or a session or query.
Data Type Default Value Value Range Set Classification
int 16384 0 ~ INT_MAX master; session; reload

optimizer_metadata_caching


When GPORCA is enabled (default value), this parameter specifies whether GPORCA caches query metadata (optimized data) in memory on the YMatrix database master during query optimization.

  • The cache is session-based. At the end of the session, the cache will be released. If the number of query metadata exceeds the cache size, the old unused metadata is evicted from the cache.
  • If the value is off, GPORCA does not cache metadata during query optimization.
  • This parameter can be set for a database system, a single database or a session or query.
  • Server configuration parameter optimizer_mdcache_size controls the size of the query metadata cache.
Data Type Default Set Category
boolean on master; session; reload

optimizer_parallel_union


When GPORCA is enabled (default value), the optimizer_parallel_union controls the number of parallelizations of queries containing the UNION or UNION ALL clause.

  • When the value is off, the default GPORCA generates a query plan where each child of the APPEND (UNION) operator is in the same slice as theAPPEND` operator.
  • During query execution, children execute sequentially.
  • When this value is turned on, GPORCA generates a query plan where the reassigned Motion node is under the APPEND (UNION) operator. During query execution, the child and the parent APPEND operator are on different slices, allowing children of the `APPEND(UNION) operator to be executed in parallel on the Segment instance.
  • This parameter can be set for a database system, a single database or a session or query.
Data Type Default Set Category
boolean off master; session; reload

optimizer_penalize_broadcast_threshold


When GPORCA is enabled (default), GPORCA punishes the cost of attempting to broadcast plans that exceed the value specified by this parameter during query optimization.

  • For example, if this parameter is set to 100K row (default), any broadcasts over 100K rows will be severely penalized.
  • When this parameter is set to 0, GPORCA sets this broadcast threshold to unlimited and never punishes broadcast actions.
Data Type Default Value Value Range Set Classification
int 100000 0 ~ INT_MAX master; session; reload

optimizer_penalize_skew


When GPORCA is enabled (default), this parameter allows GPORCA to penalize the Segment cost of HashJoin, using the skewed Redisttribute Motion as a child node to support Broadcast Motion during query optimization.

  • When the NDV (number of different values) is less than the number of segments, GPORCA determines that there is an tilt in the redistribution motion.
  • You can set parameters for a database system, a single database, or a session or query.
Data Type Default Set Category
boolean on master; session; reload

optimizer_print_optimization_stats***

When GPORCA is enabled (default setting), this parameter can record GPORCA query optimization statistics for various optimization phases of the query.

  • The default value is off, please do not record optimization statistics.
  • To record optimization statistics, this parameter must be set to on and the parameter client_min_messages must be set to log.
    • set optimizer_print_optimization_stats = on;
    • set client_min_messages = 'log';
  • Record information during query execution or using the EXPLAIN or EXPLAIN ANALYZE command.
  • This parameter can be set for a database system, a single database or a session or query.
Data Type Default Set Category
boolean off master; session; reload

optimizer_sort_factor


When GPORCA is enabled (default value), optimizer_sort_factor controls the cost factor applied to the sorting operation during query optimization.

  • Default value 1 Specifies the default sorting cost factor.
  • This value is the ratio of the increase or decrease of the default factor. For example, the value 2.0 sets the cost factor to twice the default value and the value 0.5 sets the factor to half the default value.
  • This parameter can be set for a database system, a single database or a session or query.

| Data Type | Default Value | Value Range | Set Classification | | --- | --- | | floating point | 1.0 | 0.0 ~ DBL_MAX | master; session; reload |


Runtime Filter optimization technical configuration parameters

mx_enable_runtime_filter


Whether to enable Runtime Filter.

Data Type Default Set Category
boolean on master; session; reload

mx_runtime_join_ratio


When the ratio of the number of external table rows to internal table rows is less than this value, the Runtime Filter is not triggered.

Data Type Default Value Value Range Set Classification
int 10 0 ~ (INT_MAX/2) master; session; reload

mx_runtime_max_filter_per_query


The maximum number of times Runtime Filter is used in each query.

Data Type Default Value Value Range Set Classification
int 4 0 ~ (INT_MAX/2) master; session; reload

mx_runtime_max_inner_rows


When the number of rows in the join table is greater than this value, the Runtime Filter is not triggered.

Data Type Default Value Value Range Set Classification
int 2000000 0 ~ (INT_MAX/2) master; session; reload

mx_runtime_max_intent_level


Runtime Filter is not triggered when the number of rows of the joining table is less than this value.

Data Type Default Value Value Range Set Classification
int 3 0 ~ (INT_MAX/2) master; session; reload

mx_runtime_min_outer_rows


Runtime Filter is not triggered when the number of rows of the joining table is less than this value.

Data Type Default Value Value Range Set Classification
int 500000 0 ~ (INT_MAX/2) master; session; reload

mx_runtime_scan_ratio


Runtime Filter is not triggered when the ratio of the number of scanned rows in the join table to the number of rows that meet the join conditions is less than this value. | Data Type | Default Value | Value Range | Set Classification | | --- | --- | --- | --- | | int | 25 | 0 ~ (INT_MAX/2) | master; session; reload |


Vectorized execution engine configuration parameters

matrix.enable_mxvector


Whether to enable vectorized execution engine.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_aggregate


Whether to enable vectorized execution engines to use group aggregation plan.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_hash_aggregate


Whether to enable the vectorized execution engine to use hash aggregation plan.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_append


Whether to enable vectorized execution engines to use append plan types.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_motion


Whether to enable the vectorized execution engine to use the Motion plan type.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_presort_aggregate


Whether to enable vectorized execution engines to use aggregation plan type with explicit pre-sorting.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_root


Whether to enable the root executor of the vectorized execution engine.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_sort


Whether to enable vectorized execution engines to use sorting plan types.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_strict_mode


Whether to enable the vectorized execution engine to use strict mode.

  • If an expression that cannot be processed by vectorization is encountered, an exception will be directly thrown and the server will no longer process the current query.
Data Type Default Set Category
boolean false master; session; reload

matrix.enable_mxv_fallback_expression


Whether to enable vectorized execution engines to use fallback expressions.

  • If this parameter is turned off, an expression that the vectored executor cannot process is encountered, and the entire query returns to the scalar's plan (unveiled node).
Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_limit


Whether to enable vectorized execution engines to use restricted plan types.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_window


Whether to enable vectorized execution engines to use window aggregation plan type.

Data Type Default Set Category
boolean false master; session; reload

matrix.enable_mxv_subqueryscan


Whether to enable the vectorized execution engine to use subquery to scan plan types.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_hash_join


Whether to enable vectorized execution engines to use hash connection plan types.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_hash_join_advanced_hashmap


Whether to enable advanced hash graphs for vectorized hash joins.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_aggscan


Whether to enable vectorized execution engines to use the aggregated scan plan type.

Data Type Default Set Category
boolean true master; session; reload

matrix.enable_mxv_indexscan


Whether to enable vectorized execution engines to use index scanning.

Data Type Default Set Category
boolean true master; session; reload