Query Tuning Category Parameters

This document describes the parameters related to query optimization in the system configuration parameters.

Note!
To ensure system stability and security, please be cautious when manually modifying the relevant parameters.

The query optimization parameters are classified as follows:

Postgres query optimizer operator control parameters
Postgres query optimizer cost calculation parameters
Postgres Genetic Query Optimization Configuration Parameters
Database Statistics Sampling Parameters
Sort Operator Configuration Parameters
Aggregate Operator Configuration Parameters
Join Operator Configuration Parameters
Other Postgres Query Optimizer Configuration Parameters
Query Plan Execution
JIT (Just in Time) Configuration Parameters
Greenplum ORCA Optimizer Configuration Parameters
Runtime Filter Optimization Technology Configuration Parameters
Vectorized Execution Engine Configuration Parameters

Postgres query optimizer operator control parameters

This section of configuration parameters provides the original method that influences the query optimizer's selection of query plans.

If the default plan selected by the optimizer for a specific query is not optimal, we can temporarily resolve this issue by using these configuration parameters to force the optimizer to select a better plan.

Better ways to improve the quality of the plan selected by the optimizer include: adjusting the cost parameters of the planner (see below), manually running ANALYZE, increasing the value of the default_statistics_target configuration parameter, and using the ALTER TABLE SET STATISTICS statement to increase the amount of statistics collected for specific columns.

enable_bitmapscan

Enable or disable the Postgres query optimizer (planner) to use bitmap scan plan types.

Note: This is different from bitmap index scans. Bitmap scans mean that the index will be dynamically converted to a bitmap in memory when appropriate, providing faster index performance for complex queries on very large tables. It is used when there are multiple predicates on different index columns. Each bitmap for each column can be compared to create the final list of selected tuples.

Data Type	Default	Setting Category
boolean	on	master; session; reload

enable_groupagg

Enable or disable the Postgres query optimizer (planner) to use group aggregation plan types.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_hashagg

Enable or disable the Postgres query optimizer (planner) to use hash aggregation plan types.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_hashjoin

Enable or disable the Postgres query optimizer (planner) to use hash join plan types.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_indexonlyscan

Enable or disable the Postgres query optimizer (planner) only use index scan plan types.

Data Type	Default Value	Setting Category
boolean	on	master; session; reload

enable_indexscan

Enable or disable the Postgres query optimizer (planner) to use index scan plan types.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_mars2aggscan

Whether to enable MARS2 aggscan.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_material

Enable or disable Postgres query optimizer (planner) materialization.

It is not possible to completely disable materialization, but turning off this variable will prevent the planner from inserting materialization nodes, unless there is a need to ensure correctness.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_mergejoin

Enable or disable the Postgres query optimizer (planner) to use merge join plan types.

Merge joins are based on the idea of sorting the left and right tables in order and then scanning them in parallel. Therefore, both data types must be fully sortable, and the join operator must be one that can only successfully join pairs of values at the “same position” in the sorted order. In practice, this means that the join operator must behave like equality.

Data Type	Default	Setting Category
boolean	on	master; session; reload

enable_nestloop

Enable or disable the Postgres query optimizer (planner) from using nested loop join plans.

It is not possible to completely suppress nested loop joins, but if other methods are available, disabling this variable will prevent the Postgres optimizer from using it.

Data Type	Default Value	Setting Category
boolean	on	master; session; reload

enable_parallel_append

Enable or disable the Postgres query optimizer (planner) to use parallel append plan types.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_parallel_hash

Enable or disable the Postgres query optimizer (planner) to use hash join plan types for parallel hashes.

If hash join plans are not enabled, this parameter has no effect.

Data type	Default value	Setting category
boolean	on	master; session; reload

enable_partition_pruning

Enable or disable the Postgres query optimizer (planner) to reduce the ability to remove partitions from partitioned tables in the query plan.

This also controls the Postgres optimizer's ability to create query plans that allow the execution engine to remove (ignore) partitions during query execution. For more details, see Partition Pruning.

Data Type	Default Value	Setting Category
boolean	on	master; session; reload

enable_partitionwise_aggregate

Enable or disable the Postgres query optimizer (planner) to use partition-oriented grouping or aggregation.

This allows grouping or aggregation on partitioned tables to be executed separately on each partition.
If the GROUP BY clause does not include the partition key, only partial aggregation can be executed on a per-partition basis. Finalization must be performed later.
Since partition-oriented grouping or aggregation consumes significant CPU time and memory during planning, the default value is off.

Data Type	Default Value	Setting Category
boolean	off	master; session; reload

enable_partitionwise_join

Enable or disable the Postgres query optimizer (planner) to use partition-oriented joins.

This allows joins between partitioned tables to be executed by matching the corresponding partitions. Partition-oriented joins are currently only applicable when the join conditions include all partition keys, the join conditions must be of the same data type, and the subpartition sets must match exactly.
Since partitioned join planning consumes significant CPU time and memory during planning, the default value is off.

Data Type	Default Value	Setting Category
boolean	off	master; session; reload

enable_seqscan

Enable or disable the Postgres query optimizer (planner) from using sequential scan plan types.

It is not possible to completely suppress sequential scans, but if other methods are available, disabling this variable will prevent the Postgres optimizer from using them.

Data Type	Default Value	Setting Category
boolean	on	master；session；reload

enable_sort

Enable or disable the Postgres query optimizer (planner) from using explicit sort steps.

It is not possible to completely disable explicit sorting, but if other methods are available, turning off this variable will prevent the Postgres optimizer from using it.

Data Type	Default Value	Setting Category
boolean	on	master；session；reload

enable_tidscan

Enable or disable the Postgres query optimizer (planner) to use tuple identifier (TID) scan plan types.

Data type	Default value	Setting category
boolean	on	master; session; reload

gp_enable_agg_distinct

Enable or disable two-stage aggregation to calculate a single distinct aggregation.

This only applies to subqueries that contain a single distinct aggregation function.

Data type	Default value	Setting category
boolean	on	master；session；reload

gp_enable_agg_distinct_pruning

Enable or disable three-stage aggregation and joining to calculate different qualified aggregates.

This only applies to subqueries that contain one or more different qualified aggregate functions.

Data type	Default value	Setting category
boolean	on	master；session；reload

gp_enable_direct_dispatch

Enable or disable query scheduling for queries targeting data on a single Segment.

When enabled, queries targeting rows on a single Segment will only have their query plans assigned to that Segment (rather than all Segments). This significantly reduces the response time for qualifying queries, as no interconnect settings are involved.
Direct distribution does require higher CPU utilization on the Master server.

Data Type	Default Value	Setting Category
boolean	on	master；system；restart

gp_enable_fast_sri

When set to on, the Postgres query optimizer (planner) plans single-row inserts so that they are sent directly to the correct Segment instance (without motion operations).

This significantly improves the performance of single-row insert statements.

Data type	Default value	Setting classification
boolean	on	master；session；reload

gp_enable_multiphase_agg

Enable or disable the use of two-phase or three-phase parallel aggregation plans in the Postgres query optimizer.

This method applies to any subquery with aggregation. If gp_enable_multiphase_agg is disabled, then gp_enable_agg_distinct and gp_enable_agg_distinct_pruning are also disabled.

Data type	Default value	Setting category
boolean	on	master；session；reload

gp_enable_preunique

Enable two-phase deduplication for SELECT DISTINCT queries (not SELECT COUNT(DISTINCT)).

When enabled, it adds an additional set of SORT DISTINCT plan nodes before moving.
In cases where the number of rows is significantly reduced by different operations, this additional SORT DISTINCT is much less costly than sending rows through interconnects.

Data type	Default value	Setting category
boolean	on	master；session；reload

gp_enable_relsize_collection

If the table does not have statistics, Greenplum GPORCA optimizer and Postgres query optimizer (planner) are allowed to use the estimated size of the table (pg_relation_size function).

By default, if statistics are unavailable, the Greenplum GPORCA optimizer and planner will use default values to estimate the number of rows.
The default behavior can improve query optimization time and reduce resource queue usage in heavy workloads, but may result in suboptimal plans.
For the root partition of a partitioned table, this parameter is ignored.
When this optimizer is enabled and the root partition has no statistics, the default value will always be used. You can use ANALYZE ROOTPARTITION to collect statistics for the root partition. See ANALYZE for details.

Data Type	Default Value	Setting Category
boolean	off	master；session；reload

gp_enable_sort_distinct

Enable removal of duplicates when sorting.

Data type	Default value	Setting category
boolean	on	master；session；reload

gp_enable_sort_limit

Enable the LIMIT operation when sorting. Sorting is more efficient when the plan requires at most the first limit_number rows.

Data type	Default value	Setting category
boolean	on	master；session；reload

gp_selectivity_damping_for_joins

Suppress selectivity in the Join clause.

Data type	Default value	Setting category
boolean	off	master；session；reload

gp_selectivity_damping_for_scans

Suppress selectivity in clauses in the same base table.

Data type	Default value	Setting category
boolean	on	master；session；reload

min_parallel_index_scan_size

Set a minimum value (BLOCKS) for the amount of index data that must be scanned. Parallel scanning will only be considered when the amount of index data scanned exceeds this value.

Note that parallel index scanning does not typically cover the entire index; it is the number of relevant pages that the optimizer believes will actually be used in the scan.

Data Type	Default Value	Value Range	Setting Category
int	512	0 ～（INT_MAX/3）	master；session；reload

min_parallel_table_scan_size

Set a minimum value (BLOCKS) for the amount of table data that must be scanned. Parallel scanning will only be considered if the amount of table data scanned exceeds this value.

For parallel sequential scans, the amount of table data scanned is always equal to the size of the table, but when using indexes, the amount of table data scanned is usually smaller.

Data type	Default value	Value range	Setting category
int	8192	0 ～（INT_MAX/3）	master；session；reload

parallel_leader_participation

Allow the leader process to execute the query plan under the Gather and Gather Merge nodes instead of waiting for the worker processes.

Setting this value to off reduces the likelihood of worker processes being blocked due to the leader not reading tuples fast enough, but requires the leader to wait for worker processes to start before producing the first tuple.
The extent to which the leader can help or hinder performance depends on the type of plan, the number of workers, and the duration of the query.

Data type	Default value	Setting category
boolean	on	master；system；restart

Postgres query optimizer cost calculation parameters

Caution!
Do not adjust the parameters in this section. Changes to these parameters will affect the YMatrix database hardware configuration and typical workloads. All of these parameters are related, and changing the value of one without considering its associated effects will result in a loss of query performance.

cpu_index_tuple_cost

Set the cost estimate for the Postgres optimizer (planner) to process each index item in a single index scan. This is measured as part of the sequential page replacement cost.

Data type	Default value	Value range	Setting classification
floating point	0.005	0 ～ DBL_MAX	master；session；reload

cpu_operator_cost

Set the Postgres optimizer (planner) to estimate the cost of each operator in a WHERE clause.

Data type	Default value	Value range	Setting category
floating point	0.0025	0 ～ DBL_MAX	master；session；reload

cpu_tuple_cost

Set the Postgres optimizer (planner) to estimate the cost of processing each row (tuple) in a query. This is measured as part of the sequential page replacement cost.

Data type	Default value	Value range	Setting category
floating point	0.01	0 ～ DBL_MAX	master；session；reload

cursor_tuple_fraction

Set the Postgres query optimizer (planner) estimate of the proportion of rows in a cursor that will be retrieved.

Smaller values cause the planner to favor a “quick start” plan for the cursor, which will quickly retrieve the first few rows but may take a long time to retrieve all rows. Larger values emphasize the total estimated time.
The maximum setting is 1.0, at which the cursor will be planned exactly like a regular query, considering only the total estimated time and not how quickly the first few rows are returned.

Data type	Default value	Value range	Setting category
floating point	0.1	0.0 ~ 1.0	master；session；reload

effective_cache_size

Set the Postgres optimizer's (planner's) assumption about the effective disk buffer size (BLOCKS) available for a single query.

This parameter is considered in the cost estimation for using an index; higher values make index scans more likely to be used, while lower values make sequential scans more likely to be used.
When setting this parameter, you should also consider the shared buffer for YMatrix and the kernel disk buffer that will be used for YMatrix data files, although some data may exist in both locations.
Additionally, consider the expected number of concurrent queries across different tables, as they must share the available space.
This parameter does not affect the size of shared memory allocated to YMatrix, nor does it reserve kernel disk buffers; it is used solely for estimation purposes. The system does not assume that data will be retained in disk buffers between queries.
The gpconfig tool and SHOW command display valid cache size values in “MB” or “kB” units.

Data type	Default value	Value range	Setting category
int	524288	1 ～ INT_MAX	master；session；reload

gp_appendonly_random_page_cost

Set the optimizer's estimate of the cost of non-sequential disk page fetches from the AO table.

Data type	Default value	Value range	Setting category
floating point	40.0	0 ~ DBL_MAX	master；session；reload

gp_compressed_random_page_cost

Set the optimizer's estimate of the cost of non-sequential access to disk pages from compressed AO tables.

Data type	Default value	Value range	Setting category
floating point	400.0	0 ~ DBL_MAX	master；session；reload

gp_motion_cost_per_row

Set the Postgres query optimizer (planner) cost estimate for the Motion operator to transfer rows from one segment to another, measured in decimal parts of the cost of a contiguous page extraction.

If set to 0, the value used is twice the cpu_tuple_cost value.

Data Type	Default Value	Value Range	Setting Category
floating point	0.0	0 ～ DBL_MAX	master；session；reload

gp_segments_for_planner

Set the number of Primary instances assumed by the Postgres query optimizer (planner) in its cost and size estimates.

If set to 0, the actual number of Primary instances is used.
This variable affects the optimizer's estimate of the number of rows processed by each send and receive operation in the Motion operator.

Data Type	Default Value	Value Range	Setting Category
int	0	0 ~ INT_MAX	master；session；reload

parallel_setup_cost

Set the cost estimate for the Postgres optimizer (planner) when starting parallel worker processes.

Data type	Default value	Value range	Setting category
floating point	1000.0	0 ～ INT_MAX	master；session；reload

parallel_tuple_cost

Set the Postgres optimizer (planner) cost estimate for passing a tuple from one parallel worker process to another.

Data type	Default value	Value range	Setting category
floating point	0.1	0 ～ DBL_MAX	master；session；reload

random_page_cost

Set the cost estimate for non-sequential disk page reads by the Postgres query optimizer (planner).

You can override this value by placing tables and indexes in a special tablespace (set the parameter of the same name for that tablespace).
Higher values (relative to seq_page_cost) will cause the system to favor sequential scans, while lower values will cause the system to favor index scans. You can adjust both parameters together to change the relative importance of disk I/O cost compared to CPU cost.
Random access to disk storage is typically more than four times more expensive than sequential access. However, since most random access to disk (e.g., indexed reads) is assumed to occur in the cache, a lower default value (4) is used. The default value can be thought of as modeling random access as 40 times slower than sequential access, with the expectation that 90% of random reads will be cached.
If you believe that a 90% cache hit rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads. Conversely, if your data can be entirely cached (e.g., when the database is smaller than the server's total memory), reducing random_page_cost may be appropriate. For storage with very low random read costs (e.g., solid-state drives), using a lower random_page_cost value may be more appropriate.

Note!
Although you can set random_page_cost to be smaller than seq_page_cost, this does not affect the physical reality. When all databases are in memory, setting both values to be equal is reasonable because, in this case, out-of-order fetching does not incur greater overhead than sequential fetching. Similarly, on databases with high cache hit rates, you should reduce both values relative to CPU overhead because fetching pages from memory incurs significantly less overhead than under typical conditions. If the entire database is cached in RAM, setting them to equal values makes sense, as out-of-order page access incurs no additional overhead in this scenario. Similarly, on a highly cached database, you should reduce both values relative to CPU parameters, as retrieving a page already in RAM incurs significantly lower overhead than under typical conditions.

Data Type	Default Value	Value Range	Setting Category
floating point	4.0	0 ～ DBL_MAX	master；session；reload

seq_page_cost

Set the cost of a sequential disk page fetch calculated by the Postgres query optimizer (planner).

This value can be overridden for a specific tablespace by setting the similarly named tablespace parameter.

Data Type	Default Value	Value Range	Set Classification
floating point	1.0	0 ~ DBL_MAX	master; session; reload

Postgres Genetic Query Optimization Configuration Parameters

GEQO is an algorithm that uses an exploratory search algorithm for query planning. It can reduce the planning time for complex queries (queries that require multi-table joins), but at the cost of the plans it generates will sometimes be worse than those found using exhaustive search algorithms.

geqo

Genetic query optimization is allowed or prohibited.

It is usually better not to close it in a production environment.
The geqo_threshold variable provides a finer grained null value for GEQO.

Data Type	Default	Set Category
boolean	off	master; session; reload

geqo_effort

Controls the balance between planning time and query planning effectiveness in GEQO.

The increase in parameter value means that the query planning time increases, but it will also increase the chance of selecting more effective query planning.
geqo_efffort does not actually do anything directly; it is simply used to calculate the default values of other variables that affect GEQO behavior.

Data Type	Default Value	Value Range	Set Classification
int	0	0 ～ INT_MAX	master; session; reload

geqo_generations

Controls the number of child borne used by GEQO.

The child means the number of iterations of the algorithm. It must be at least 1, with the useful range of values and pool sizes the same.
If set to 0 (default value), YMatrix will select the appropriate value based on geqo_pool_size.

Data Type	Default Value	Value Range	Set Classification
int	0	0 ～ INT_MAX	master; session; reload

geqo_pool_size

Controls the pool size used by GEQO, which is the number of individuals in the genetic population.

If it is set to 0 (the default value) a suitable value is selected based on geqo_efffort and the number of tables in the query.

Data Type	Default Value	Value Range	Set Classification
int	0	0 ~ INT_MAX	master; session; reload

geqo_seed

Controls the initial value of the random number generator used by GEQO, which is used to select a random path in the connection order search space.

This value can range from 0 (default) to 1. Changing this value changes the set of connection paths being explored and can result in finding a better or worse path.

Data Type	Default Value	Value Range	Set Classification
floating point	0.0	0.0 ~ 1.0	master; session; reload

geqo_selection_bias

Controls the selection preferences for GEQO usage.

Selection preference is the selection pressure within the population. The value can be between 1.5 and 2.0, which is the default value.

Data Type	Default Value	Value Range	Set Classification
floating point	1.0	0.0 ~ 100.0	master; session; reload

geqo_threshold

Genetic query optimization is only used when the number of FROM items involved has at least this parameter value (note that a FULL OUTER JOIN is counted as only one FROM item).

For simpler queries, the ordinary exhaustive search planner is usually used, but for queries with many tables it takes a long time to search, which is usually more expensive than the cost of executing a suboptimal plan. Therefore, a threshold on query size is a convenient way to manage GEQO usage.

Data Type	Default Value	Value Range	Set Classification
int	12	2 ~ INT_MAX	master; session; reload

Database statistics sampling parameters

This part of the parameters is used to adjust the amount of data sampled by the ANALYZE operation. Adjusting these parameters will affect the collection of statistical data in the entire system. You can use the ALTER TABLE SET STATISTICS clause to configure statistics collection for certain specific tables and columns.

default_statistics_target

Set default statistical targets for table columns that do not set columns with ALTER TABLE SET STATISTICS.

Larger values increase the time of ANALYZE but improve the estimated quality of the Postgres query optimizer.

Data Type	Default Value	Value Range	Set Classification
int	100	0 ~ 10000	master; session; reload

Sort operator (Sort) configuration parameters

gp_enable_sort_limit

Enable the LIMIT operation when sorting. Sort is more efficient when the first limit_number of the line is required at most, sorting is more efficient.

Data Type	Default	Set Category
boolean	on	master; session; reload

Aggregation Operator (Agg) Configuration Parameters

gp_enable_agg_distinct

Enable or disable two-stage aggregation to compute a single, distinctly qualified aggregation.

This is only applicable to subqueries containing a single different qualified aggregate function.

Data Type	Default	Set Category
boolean	on	master; session; reload

gp_enable_agg_distinct_pruning

Enable or disable three-stage aggregation and connection to calculate differently qualified aggregations.

This is only applicable to subqueries containing one or more different qualifying aggregate functions.

Data Type	Default	Set Category
boolean	on	master; session; reload

gp_enable_multiphase_agg

Enable or disable the Postgres query optimizer using the two- or three-stage parallel aggregation plan.

This method is suitable for any subquery with aggregates. If gp_enable_multiphase_agg is off, gp_enable_agg_distinct and gp_enable_agg_distinct_pruning are disabled.

Data Type	Default	Set Category
boolean	on	master; session; reload

gp_enable_preunique

Enable two-stage duplicate deletion for the SELECT DISTINCT query (not SELECT COUNT(DISTINCT)).

When enabled, it adds an additional set of SORT DISTINCT scheduled nodes before moving.
This additional SORT DISTINCT is much cheaper than sending rows over interconnects, where different operations greatly reduce the number of rows.

Data Type	Default	Set Category
boolean	on	master; session; reload

gp_workfile_compression

Specifies whether temporary files created when a hash aggregate or hash connection operation overflows to disk.

If your database installation uses a serial ATA (SATA) disk drive, enabling compression may help avoid overloading the disk subsystem using I/O operations.

Data Type	Default	Set Category
boolean	off	master; session; reload

mx_pullup_agg_proj_and_qual

Extract the projected fields and evaluate agg to a separate result node.

Data Type	Default	Set Category
boolean	off	segments; session; reload

Connection operator (Join) configuration parameters

join_collapse_limit

If the resulting list does not exceed so many items, the Postgres optimizer (planner) will rewrite the explicit JOIN (except FULL JOIN) structure into the FROM item list.

Smaller values reduce planning time, but may generate poor query plans.
By default, this variable is set to the same as from_collapse_limit and is suitable for most cases.
Setting it to 1 prevents any reordering of the internal JOINs. Therefore, the explicit join order specified in the query is the actual order in which the relationship is connected.
Setting this variable to a value between 1 and from_collapse_limit may help trade off planning time with the quality of the selected plan (a higher value will result in a better plan).
Setting this value to geqo_threshold or greater may trigger the use of the GEQO planner, resulting in a non-optimal plan.

Data Type	Default Value	Value Range	Set Classification
int	20	1 ~ INT_MAX	master; session; reload

gp_adjust_selectivity_for_outerjoins

Enable the selection of NULL testing on external connections.

Data Type	Default	Set Category
boolean	on	master; session; reload

gp_hashjoin_tuples_per_bucket

Sets the target density of the hash table used by the HashJoin operation.

Smaller values tend to produce larger hash tables, which can improve join performance.

Data Type	Default Value	Value Range	Set Classification
int	5	1 ~ 25	master; session; reload

gp_workfile_compression

Specifies whether temporary files created when a hash aggregate or hash connection operation overflows to disk.

If your database installation uses a serial ATA (SATA) disk drive, enabling compression may help avoid overloading the disk subsystem using I/O operations.

Data Type	Default	Set Category
boolean	off	master; session; reload

查询计划执行

gp_max_slices

Specifies the maximum number of slices that can be generated by a query (a part of the query plan executed on a Segment instance).

If the number of slices generated by the query exceeds the specified number, the YMatrix database will return an error and not execute the query.
The default value is 0, with no maximum value.
Executing queries that generate a large number of slices may impact database performance. For example, queries containing UNION or UNION ALL operators across multiple complex views may generate a large number of slices. You can run EXPLAIN ANALYZE on the query to view the query's slice statistics.

Data type	Default value	Value range	Setting category
int	0	0 ~ INT_MAX	master；session；reload

plan_cache_mode

Prepared statements (explicitly prepared or implicitly generated, such as PL/pgSQL) can be executed using custom or generic plans.

Custom plans are regenerated for each execution using their specific parameter value sets, while generic plans do not depend on parameter values and can be reused across executions. Therefore, using generic plans can save planning time, but if the optimal plan heavily depends on parameter values, generic plans may be inefficient.
This setting is considered when executing cached plans, not when preparing plans.

Data type	Default value	Value range	Setting category
enum	auto	auto / force_custom_plan / force_generic_plan	master；session；reload

JIT (Just in Time) configuration parameters

jit

Does YMatrix allow the use of JIT compilation?

Data type	Default value	Setting category
boolean	on	master；session；reload

jit_above_cost

Set the query cost threshold for enabling JIT (Just-in-Time) compilation. If the query cost exceeds this value, JIT compilation will be enabled [https://www.postgresql.org/docs/12/jit.html].

Executing JIT consumes some planning time but can accelerate query execution.
Setting this value to -1 disables JIT compilation.

Data type	Default value	Value range	Setting category
floating point	100000	-1 ～ DBL_MAX	master；session；reload

jit_debugging_support

If LLVM has the required functionality, register the generated functions with GDB. This makes debugging easier.

This parameter can only be set when the server starts or when the client connection starts.

Data type	Default value	Setting category
boolean	off	master；session；reload；superuser

jit_dump_bitcode

Write the generated LLVM IR to the file system, in the data_directory.

This is only useful for internal processing of JIT implementations.
Only superusers and users with the appropriate SET permissions can change this setting.

Data type	Default value	Setting category
boolean	off	master；session；reload；superuser

jit_expressions

When JIT compilation is enabled, expressions can be JIT compiled.

Data type	Default value	Setting category
boolean	on	master；session；reload

jit_inline_above_cost

Set the query cost threshold for JIT compilation attempts to inline functions and operators.

If the query cost exceeds this value, JIT compilation will attempt to inline. Inlining increases planning time but can improve execution speed.
Setting this parameter to a value less than jit_above_cost is meaningless.
Setting this parameter to -1 disables inlining.

Data type	Default value	Value range	Setting category
floating point	500000	-1 ～ DBL_MAX	master；session；reload

jit_optimize_above_cost

Set the query cost threshold for applying JIT compilation optimizations.

If the query cost exceeds this value, JIT compilation will apply more expensive optimizations. These optimizations increase planning time but can improve execution speed.
Setting this parameter to a value less than jit_above_cost is meaningless, and setting it to a value greater than jit_inline_above_cost may not be beneficial.
Setting this parameter to -1 disables more expensive optimizations.

Data type	Default value	Value range	Setting category
floating point	500000	-1 ～ DBL_MAX	master；session；reload

jit_profiling_support

If LLVM has the required functionality, it emits the data needed for the perf command to analyze JIT-generated functions.

This will write the file to ~/.debug/jit/. If you have set and loaded the environment variable JITDUMPDIR, it will instead write to JITDUMPDIR/debug/jit.
You are responsible for performing cleanup when necessary.
This parameter can only be set at server startup or when a client connection is initiated.

Data type	Default value	Setting category
boolean	off	master；session；reload；superuser

jit_tuple_deforming

When JIT compilation is enabled, allow JIT to compile tuple transformations.

Data type	Default value	Setting category
boolean	on	master；session；reload

Greenplum ORCA Optimizer Configuration Parameters

gp_enable_relsize_collection

If the table does not have statistics, the ORCA optimizer and Postgres query optimizer are allowed to use the estimated size of the table (pg_relation_size function).

By default, if statistics are unavailable, ORCA and the planner will use default values to estimate the number of rows.
The default behavior can improve query optimization time and reduce resource queue usage in heavy workloads, but may result in suboptimal plans.
For the root partition of a partitioned table, this parameter is ignored.
When ORCA is enabled and the root partition lacks statistics, ORCA always uses the default value. You can use ANALYZE ROOTPARTITION to collect statistics for the root partition. See ANALYZE for details.

Data Type	Default Value	Setting Category
boolean	off	master；session；reload

optimizer

Enable or disable GPORCA when running SQL queries.

Default on. If GPORCA is disabled, the YMatrix database will only use the Postgres query optimizer.
GPORCA coexists with the Postgres query optimizer. When GPORCA is enabled, the YMatrix database will use GPORCA to generate execution plans for queries whenever possible. If GPORCA cannot be used, the Postgres query optimizer will be used instead.
The optimizer parameter can be set for the database system, a single database, a session, or a query.

Data type	Default value	Setting category
boolean	on	master；session；reload

optimizer_analyze_root_partition

For partitioned tables, controls whether the ROOTPARTITION keyword is required when running the ANALYZE command on the table to collect root partition statistics.

The GPORCA optimizer in Greenplum uses root partition statistics when generating query plans. The Postgres query optimizer does not use these statistics.
The default setting for this parameter is on, meaning that the ANALYZE command can collect root partition statistics without using the ROOTPARTITION keyword. Root partition statistics are collected when running ANALYZE on the root partition, or when running ANALYZE on a leaf partition of a partitioned table and other leaf partitions have statistics.
When this value is set to off, you must run ANALYZE ROOTPARTITION to collect root partition statistics.
When the server configuration parameter optimizer is set to on (default), this parameter should also be set to on.
For information on collecting table statistics for partitioned tables, see ANALYZE.

Data type	Default value	Setting category
boolean	on	master；system；restart

optimizer_array_expansion_threshold

When enabling GPORCA (default) and processing queries containing predicates with constant arrays, the optimizer_array_expansion_threshold parameter limits the optimization process based on the number of constants in the array.

If the array in the query predicate contains more than the number of elements specified by the parameter, GPORCA disables the conversion of the predicate to its disjunctive normal form during query optimization. For example, when GPORCA executes a query containing an IN clause with more than 100 elements, GPORCA will not convert the predicate to its disjunctive normal form during query optimization to reduce optimization time and memory consumption. The difference in query processing can be observed in the filter conditions of the IN clause in the query EXPLAIN plan.
Changing the value of this parameter alters the trade-off between shorter optimization time and lower memory consumption, as well as the potential benefits of constraint derivation during query optimization, such as conflict detection and partition elimination.
This parameter can be set for the database system, a single database, or a session or query.

Data type	Default value	Value range	Setting category
int	100	0 ~ INT_MAX	master；session；reload

optimizer_control

Control whether the SET, RESET commands or the YMatrix database utility gpconfig can be used to change the server configuration parameter optimizer.

If the optimizer_control parameter value is enabled, users can set optimizer parameters.
If the optimizer_control parameter value is off, optimizer parameters cannot be changed.

Data Type	Default Value	Setting Category
boolean	on	master；system；restart；superuser

optimizer_cost_model

When GPORCA is enabled (default), this parameter controls the cost model that GPORCA selects for bitmap scans, which are used in conjunction with bitmap indexes or B-tree indexes on AO tables.

legacy - Retains the calibrated cost model used by GPORCA in Greenplum Database versions 6.13 and earlier
calibrated - Improves cost estimation for indexes
experimental - Reserved for future experimental cost models; currently equivalent to the calibrated model
The default cost model calibrated is more likely to choose faster bitmap indexes with nested loop joins rather than hash joins.

Data type	Default value	Value range	Setting category
enum	calibrated	legacy / calibrated / experimental	master；session；reload

optimizer_cte_inlining_bound

When GPORCA (default) is enabled, this parameter controls the amount of inlining performed for common table expression (CTE) queries (queries containing a WHERE clause). - The default value 0 disables inlining.

This parameter can be set for the database system, a single database, or a session or query.

Data type	Default value	Value range	Setting category
int	0	0 ~ INT_MAX	master；session；reload

optimizer_dpe_stats

When the Greenplum ORCA optimizer is enabled (default) and this parameter is set to on (default), GPORCA derives statistics that enable it to more accurately estimate the number of rows to scan during dynamic partition elimination.

You can set this parameter for the database system, a single database, or a session or query.

Data type	Default value	Setting category
boolean	on	master；session；reload

optimizer_enable_associativity

When GPORCA (default setting) is enabled, this parameter controls whether to enable join association transformation during query optimization.

Analyze the join order. For the default value off, only the GPORCA dynamic programming algorithm used to analyze the join order is enabled.
The join association transformation largely replicates the functionality of newer dynamic programming algorithms.
If the value is on, GPORCA can use association transformations during query optimization.
This parameter can be set for the database system, a single database, a session, or a query.

Data type	Default value	Setting category
boolean	off	master；session；reload

optimizer_enable_dml

When GPORCA is enabled (default) and this parameter is set to on (default), GPORCA attempts to execute DML commands such as INSERT, UPDATE, and DELETE.

If GPORCA cannot execute the command, the YMatrix database falls back to the Postgres optimizer.
When set to off, the Greenplum database always falls back to the Postgres optimizer when executing DML commands.
The parameter can be set for the database system, a single database, or a session or query.

Data Type	Default Value	Setting Category
boolean	on	master；session；reload

optimizer_enable_indexonlyscan

When GPORCA is enabled (default) and this parameter is set to on (default), GPORCA can generate an index-only scan plan type for B-tree indexes.

GPORCA only accesses index values and does not access the data blocks of the relation.
This provides improved query execution performance, especially when the table has been emptied, has wide columns, and GPORCA does not need to retrieve any data blocks (e.g., they are visible).
When disabled (off), GPORCA does not generate index-only scan plan types.
The parameter can be set for the database system, a single database, or a session or query.

Data Type	Default Value	Setting Category
boolean	on	master；session；reload

optimizer_enable_multiple_distinct_aggs

When GPORCA is enabled (default), this parameter allows GPORCA to support multiple different qualified aggregates, such as SELECT count(DISTINCT a),sum(DISTINCT b) FROM foo.

By default, this parameter is disabled because its plans are typically not optimal compared to those generated by the Postgres optimizer.
The parameter can be set for the database system, a single database, or a session or query.

Data type	Default value	Setting category
boolean	off	master；session；reload

optimizer_force_agg_skew_avoidance

When GPORCA is enabled (default), this parameter affects the query plan alternatives that GPORCA considers when generating a 3-stage aggregation plan.

When the value is on, by default, GPORCA only considers 3-stage aggregation plans, where intermediate aggregation uses GROUP BY and DISTINCT columns for distribution to reduce the impact of processing skew.
If the value is off, GPORCA can also consider plans that distribute using GROUP BY columns.
These plans may perform poorly when processing skew is present.

Data type	Default value	Setting category
boolean	on	master；session；reload

optimizer_force_multistage_agg

For the default settings, when GPORCA is enabled and this parameter is set to on, GPORCA selects a 3-stage aggregation plan for scalar-different qualified aggregates when generating such plan alternatives.

When the value is off, GPORCA selects based on cost rather than heuristics.
This parameter can be set for the database system, a single database, a session, or a query.

Data type	Default value	Setting category
boolean	off	master；session；reload

optimizer_force_three_stage_scalar_dqa

For the default setting, GPORCA is enabled and this parameter is set to on, GPORCA will select plans with multi-level aggregation when generating such plan alternatives.

When the value is off, GPORCA will select based on cost rather than heuristics.
This parameter can be set for the database system, individual database, session, or query.

Data Type	Default Value	Setting Category
boolean	on	master；session；reload

optimizer_join_arity_for_associativity_commutativity

This value is an optimization hint used to limit the number of join associativity and join commutativity transformations explored during query optimization.

This limit controls the alternative plans considered by GPORCA during query optimization. For example, the default value 18 is an optimization hint that GPORCA stops exploring join associativity and join commutativity transformations when the n-ary join operator has more than 18 child nodes during optimization.
For queries with a large number of joins, specifying a lower value can improve query performance by limiting the number of alternative query plans evaluated by GPORCA.
However, setting the value too low may result in GPORCA generating suboptimal query plans.
This parameter is ineffective when the optimizer_join_order parameter is set to query or greedy.
This parameter can be set for the database system or session.

Data type	Default value	Value range	Setting category
int	18	0 ～ INT_MAX	segments；system；reload

optimizer_join_order

When GPORCA is enabled, this parameter sets the optimization level for join ordering during query optimization by specifying the type of join ordering alternatives to evaluate.

query - Uses the join order specified in the query.
greedy - Calculates the join order specified in the query and alternative schemes based on the minimum cardinality of the joins.
exhaustive - Applies transformation rules to find and evaluate all alternative join orders.
The default value is exhaustive. Setting this parameter to query or greedy can generate suboptimal query plans. However, if the administrator is confident that using the query or greedy setting generates satisfactory plans, query optimization time can be improved by setting the parameter to a lower optimization level.
Setting this parameter to query or greedy overrides the optimizer_join_order_threshold and optimizer_join_arity_for_associativity_commutativity parameters.
This parameter can be set for a single database, session, or query.

Data type	Default value	Value range	Setting category
enum	exhaustive	exhaustive / greedy / query	master；session；reload

optimizer_join_order_threshold

When GPORCA is enabled (default setting), this parameter sets the maximum number of join subgraphs that GPORCA will use based on a dynamically programmed join sorting algorithm.

You can set this value for a single query or for the entire session.
This parameter is invalid when the optimizer_join_query parameter is set to query or greedy.

Data Type	Default Value	Value Range	Set Classification
int	10	0 ~ 12	master; session; reload

optimizer_mdcache_size

Sets the maximum amount of memory (KB) used by GPORCA on the YMatrix database master to cache query metadata (optimized data) during query optimization.

Memory-limited session based.
GPORCA uses default settings to cache query metadata during query optimization: Enable GPORCA and enable optimizer_metadata_caching.
The default value is 16384 (16MB). This is the best value determined by performance analysis.
You can specify values for KB, MB or GB. The default unit is KB. For example, the value 16384 is 16384KB. The value 1GB is the same as 1024MB or 1048576KB.
If the value is 0, the cache size is not limited.
This parameter can be set for a database system, a single database or a session or query.

Data Type	Default Value	Value Range	Set Classification
int	16384	0 ~ INT_MAX	master; session; reload

optimizer_metadata_caching

When GPORCA is enabled (default value), this parameter specifies whether GPORCA caches query metadata (optimized data) in memory on the YMatrix database master during query optimization.

The cache is session-based. At the end of the session, the cache will be released. If the number of query metadata exceeds the cache size, the old unused metadata is evicted from the cache.
If the value is off, GPORCA does not cache metadata during query optimization.
This parameter can be set for a database system, a single database or a session or query.
Server configuration parameter optimizer_mdcache_size controls the size of the query metadata cache.

Data Type	Default	Set Category
boolean	on	master; session; reload

optimizer_parallel_union

When GPORCA is enabled (default value), the optimizer_parallel_union controls the number of parallelizations of queries containing the UNION or UNION ALL clause.

When the value is off, the default GPORCA generates a query plan where each child of the APPEND (UNION) operator is in the same slice as theAPPEND` operator.
During query execution, children execute sequentially.
When this value is turned on, GPORCA generates a query plan where the reassigned Motion node is under the APPEND (UNION) operator. During query execution, the child and the parent APPEND operator are on different slices, allowing children of the `APPEND(UNION) operator to be executed in parallel on the Segment instance.
This parameter can be set for a database system, a single database or a session or query.

Data Type	Default	Set Category
boolean	off	master; session; reload

optimizer_penalize_broadcast_threshold

When GPORCA is enabled (default), GPORCA punishes the cost of attempting to broadcast plans that exceed the value specified by this parameter during query optimization.

For example, if this parameter is set to 100K row (default), any broadcasts over 100K rows will be severely penalized.
When this parameter is set to 0, GPORCA sets this broadcast threshold to unlimited and never punishes broadcast actions.

Data Type	Default Value	Value Range	Set Classification
int	100000	0 ～ INT_MAX	master; session; reload

optimizer_penalize_skew

When GPORCA is enabled (default), this parameter allows GPORCA to penalize the Segment cost of HashJoin, using the skewed Redisttribute Motion as a child node to support Broadcast Motion during query optimization.

When the NDV (number of different values) is less than the number of segments, GPORCA determines that there is an tilt in the redistribution motion.
You can set parameters for a database system, a single database, or a session or query.

Data Type	Default	Set Category
boolean	on	master; session; reload

optimizer_print_optimization_stats***

When GPORCA is enabled (default setting), this parameter can record GPORCA query optimization statistics for various optimization phases of the query.

The default value is off, please do not record optimization statistics.
To record optimization statistics, this parameter must be set to on and the parameter client_min_messages must be set to log.
- set optimizer_print_optimization_stats = on;
- set client_min_messages = 'log';
Record information during query execution or using the EXPLAIN or EXPLAIN ANALYZE command.
This parameter can be set for a database system, a single database or a session or query.

Data Type	Default	Set Category
boolean	off	master; session; reload

optimizer_sort_factor

When GPORCA is enabled (default value), optimizer_sort_factor controls the cost factor applied to the sorting operation during query optimization.

Default value 1 Specifies the default sorting cost factor.
This value is the ratio of the increase or decrease of the default factor. For example, the value 2.0 sets the cost factor to twice the default value and the value 0.5 sets the factor to half the default value.
This parameter can be set for a database system, a single database or a session or query.

| Data Type | Default Value | Value Range | Set Classification | | --- | --- | | floating point | 1.0 | 0.0 ~ DBL_MAX | master; session; reload |

Runtime Filter optimization technical configuration parameters

mx_enable_runtime_filter

Whether to enable Runtime Filter.

Data Type	Default	Set Category
boolean	on	master; session; reload

mx_runtime_join_ratio

When the ratio of the number of external table rows to internal table rows is less than this value, the Runtime Filter is not triggered.

Data Type	Default Value	Value Range	Set Classification
int	10	0 ~ (INT_MAX/2)	master; session; reload

mx_runtime_max_filter_per_query

The maximum number of times Runtime Filter is used in each query.

Data Type	Default Value	Value Range	Set Classification
int	4	0 ~ (INT_MAX/2)	master; session; reload

mx_runtime_max_inner_rows

When the number of rows in the join table is greater than this value, the Runtime Filter is not triggered.

Data Type	Default Value	Value Range	Set Classification
int	2000000	0 ~ (INT_MAX/2)	master; session; reload

mx_runtime_max_intent_level

Runtime Filter is not triggered when the number of rows of the joining table is less than this value.

Data Type	Default Value	Value Range	Set Classification
int	3	0 ~ (INT_MAX/2)	master; session; reload

mx_runtime_min_outer_rows

Runtime Filter is not triggered when the number of rows of the joining table is less than this value.

Data Type	Default Value	Value Range	Set Classification
int	500000	0 ~ (INT_MAX/2)	master; session; reload

mx_runtime_scan_ratio

Runtime Filter is not triggered when the ratio of the number of scanned rows in the join table to the number of rows that meet the join conditions is less than this value. | Data Type | Default Value | Value Range | Set Classification | | --- | --- | --- | --- | | int | 25 | 0 ~ (INT_MAX/2) | master; session; reload |

Vectorized execution engine configuration parameters

matrix.enable_mxvector

Whether to enable vectorized execution engine.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_aggregate

Whether to enable vectorized execution engines to use group aggregation plan.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_hash_aggregate

Whether to enable the vectorized execution engine to use hash aggregation plan.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_append

Whether to enable vectorized execution engines to use append plan types.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_motion

Whether to enable the vectorized execution engine to use the Motion plan type.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_presort_aggregate

Whether to enable vectorized execution engines to use aggregation plan type with explicit pre-sorting.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_root

Whether to enable the root executor of the vectorized execution engine.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_sort

Whether to enable vectorized execution engines to use sorting plan types.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_strict_mode

Whether to enable the vectorized execution engine to use strict mode.

If an expression that cannot be processed by vectorization is encountered, an exception will be directly thrown and the server will no longer process the current query.

Data Type	Default	Set Category
boolean	false	master; session; reload

matrix.enable_mxv_fallback_expression

Whether to enable vectorized execution engines to use fallback expressions.

If this parameter is turned off, an expression that the vectored executor cannot process is encountered, and the entire query returns to the scalar's plan (unveiled node).

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_limit

Whether to enable vectorized execution engines to use restricted plan types.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_window

Whether to enable vectorized execution engines to use window aggregation plan type.

Data Type	Default	Set Category
boolean	false	master; session; reload

matrix.enable_mxv_subqueryscan

Whether to enable the vectorized execution engine to use subquery to scan plan types.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_hash_join

Whether to enable vectorized execution engines to use hash connection plan types.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_hash_join_advanced_hashmap

Whether to enable advanced hash graphs for vectorized hash joins.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_aggscan

Whether to enable vectorized execution engines to use the aggregated scan plan type.

Data Type	Default	Set Category
boolean	true	master; session; reload

matrix.enable_mxv_indexscan

Whether to enable vectorized execution engines to use index scanning.

Data Type	Default	Set Category
boolean	true	master; session; reload

简体中文

Query Tuning Category Parameters

Postgres query optimizer operator control parameters

enable_bitmapscan

enable_groupagg

enable_hashagg

enable_hashjoin

enable_indexonlyscan

enable_indexscan

enable_mars2aggscan

enable_material

enable_mergejoin

enable_nestloop

enable_parallel_append

enable_parallel_hash

enable_partition_pruning

enable_partitionwise_aggregate

enable_partitionwise_join

enable_seqscan

enable_sort

enable_tidscan

gp_enable_agg_distinct

gp_enable_agg_distinct_pruning

gp_enable_direct_dispatch

gp_enable_fast_sri

gp_enable_multiphase_agg

gp_enable_preunique

gp_enable_relsize_collection

gp_enable_sort_distinct

gp_enable_sort_limit

gp_selectivity_damping_for_joins

gp_selectivity_damping_for_scans

min_parallel_index_scan_size

min_parallel_table_scan_size

parallel_leader_participation

Postgres query optimizer cost calculation parameters

cpu_index_tuple_cost

cpu_operator_cost

cpu_tuple_cost

cursor_tuple_fraction

effective_cache_size

gp_appendonly_random_page_cost

gp_compressed_random_page_cost

gp_motion_cost_per_row

gp_segments_for_planner

parallel_setup_cost

parallel_tuple_cost

random_page_cost

seq_page_cost

Postgres Genetic Query Optimization Configuration Parameters

geqo

geqo_effort

geqo_generations

geqo_pool_size

geqo_seed

geqo_selection_bias

geqo_threshold

Database statistics sampling parameters

default_statistics_target

Sort operator (Sort) configuration parameters

gp_enable_sort_limit

Aggregation Operator (Agg) Configuration Parameters

gp_enable_agg_distinct

gp_enable_agg_distinct_pruning

gp_enable_multiphase_agg

gp_enable_preunique

gp_workfile_compression

mx_pullup_agg_proj_and_qual

Connection operator (Join) configuration parameters

join_collapse_limit

gp_adjust_selectivity_for_outerjoins

gp_hashjoin_tuples_per_bucket

gp_workfile_compression

Other Postgres query optimizer configuration parameters

constraint_exclusion

force_parallel_mode

from_collapse_limit

gp_enable_predicate_propagation

gp_max_plan_size

gp_statistics_pullup_from_child_partition