Vectorized Execution Best Practices

This section discusses how to efficiently use the vectorized execution engine.

1. Scenarios

The performance improvement of the vectorized execution engine over traditional row-oriented execution varies across different scenarios. Refer to the overview section for a basic understanding.

Operators that process data strictly sequentially, such as general expression evaluation, selection, and aggregation, achieve the greatest performance gains under vectorization. In contrast, operators with inherently random memory-access patterns, such as Sort and Hash-based operators, have less room for improvement.

2. Data Types

2.1 Use Primitive Types Whenever Possible

Use primitive types, such as int32, int64, etc., to maximize vectorization performance.

Enumeration-like fields (e.g., country or Chinese province) are often stored as strings such as "China", "USA", "Beijing", or "Xinjiang". Preprocessing them into a compact integer type such as smallint can significantly improve filter performance.
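For example, a text-valued region column can be replaced by a smallint code plus a small lookup table (all table and column names here are illustrative):

=# CREATE TABLE region_dim (region_id smallint, region_name text);
=# CREATE TABLE events (ts timestamp, region_id smallint, value float8);
=# SELECT count(*) FROM events WHERE region_id = 2;

The filter then scans a compact, fixed-width smallint column instead of comparing variable-length strings.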

2.2 Use the Smallest-Sized Type That Covers the Value Range

If a column like "age" can be represented using smallint, avoid using int32 or int64.

In traditional execution engines, all numeric types are typically stored as 64-bit Datum values in tuples. A Datum represents any SQL data type value but carries no type information. Thus, the choice of numeric type has little impact on processing speed.

However, in the vectorized execution engine, data is stored in tightly packed, contiguous memory layouts. Shorter numeric types are more cache-friendly and allow SIMD registers to hold more elements. Filtering or aggregating a column using int64 can be up to four times slower than using int16.
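As a sketch of this guideline (column names are illustrative), declare the narrowest type that safely covers the value range:

=# CREATE TABLE person (
    id bigint,
    age smallint,        -- 0 to 150 fits comfortably in smallint
    height_cm smallint
   );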

3. Performance Analysis and Tuning

3.1 Enforce Pure Vectorized Execution

Note!
Due to time and resource constraints, not all standard SQL data types, expressions, and queries are supported by the vectorized engine. To ensure usability, unsupported operations fall back to the traditional execution engine. This may result in little or no performance gain.

To check whether a query is falling back to the traditional engine, disable the matrix.enable_mxv_fallback_expression parameter. This parameter is enabled by default.

Parameter Description
matrix.enable_mxv_fallback_expression Enables fallback to the traditional execution engine for expressions not supported by the vectorized engine. Default: on
=# SET matrix.enable_mxv_fallback_expression TO off;

If a query fails with the following error while matrix.enable_mxv_fallback_expression is disabled:

ERROR:  not implemented yet

the query contains data types, expressions, or operations not yet supported by the current version of the vectorized engine.
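A simple way to check a specific query, then, is to disable fallback for the session, rerun the query, and restore the setting afterwards (the SELECT below is a placeholder for your own query):

=# SET matrix.enable_mxv_fallback_expression TO off;
=# SELECT ... ;   -- placeholder: fails with "not implemented yet" if unsupported
=# SET matrix.enable_mxv_fallback_expression TO on;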

3.2 Hash Agg vs. Group Agg

Hash Agg and Group Agg are two implementations of the aggregation operator, each suited to different scenarios. Hash Agg uses a hash table for aggregation, while Group Agg performs aggregation on pre-sorted groups.

Typically, the optimizer selects the appropriate method based on statistics. However, inaccurate statistics or suboptimal planning may lead to non-optimal choices. The following GUCs help analyze and tune aggregation behavior.

Parameter Description
matrix.enable_mxv_aggregate Controls whether Group Agg is enabled. Default: on
matrix.enable_mxv_hash_aggregate Controls whether Hash Agg is enabled. Default: on

These parameters yield four combinations:

  • Both disabled: Use traditional execution engine for aggregation.
  • Both enabled: Optimizer chooses the best path.
  • matrix.enable_mxv_aggregate on, matrix.enable_mxv_hash_aggregate off: Force use of Group Agg.
  • matrix.enable_mxv_hash_aggregate on, matrix.enable_mxv_aggregate off: Force use of Hash Agg.
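For example, to force Group Agg for the current session and inspect the resulting plan (the table and column are illustrative), temporarily disable Hash Agg:

=# SET matrix.enable_mxv_hash_aggregate TO off;
=# EXPLAIN SELECT sum(c1) FROM t GROUP BY c1;
=# RESET matrix.enable_mxv_hash_aggregate;

RESET restores the parameter to its default (on) afterwards.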

3.3 Combining Vectorization with MARS2 and MARS3 Storage Engines

3.3.1 Choosing Sort Keys for MARS2 and MARS3

In MARS2 and MARS3, data is stored in sorted order. To maximize the performance benefits of data ordering, choose frequently filtered, highly selective columns as sort keys (the columns that define the table's sort order). For example, in a device monitoring table, use the event timestamp and device ID as sort keys.

Sort keys in MARS2 and MARS3 can only be defined once at table creation. They cannot be modified, added, or removed.

Differences:

  • MARS2: Sort order is defined by creating an index.
  • MARS3: Sort order is defined by specifying sort columns (multiple allowed) at table creation.
    • For text-type sort keys, if byte-order sorting is acceptable, using COLLATE C can accelerate sorting.
    • SQL keyword for sort key specification: ORDER BY.

Example DDL for MARS2 with sort keys:

=# CREATE TABLE t (
    tag int,
    ts timestamp,
    name text,
    region int
   ) USING MARS2;
=# CREATE INDEX ON t USING mars2_btree (ts, tag);

Example DDL for MARS3 with sort keys:

=# CREATE TABLE t (
    tag int,
    ts timestamp,
    name text,
    region int
    ) USING MARS3
    ORDER BY (ts, tag);

Sort key selection is critical for query performance in both MARS2 and MARS3. Ordered data accelerates scanning, block-level filtering in the storage engine, and improves performance for sorting and aggregation.

For example, if a table’s sort key is (c1,c2), its internal data is ordered by (c1,c2). Consider these two queries involving only sorting:

=# SELECT * FROM t ORDER BY c1, c2;
=# SELECT * FROM t ORDER BY c1, c3;

The first query requires no sorting at all: the vectorized engine eliminates the Sort operator entirely. For the second, because the data is already ordered by c1, the engine applies a partial-sort optimization and only sorts the c3 values within each run of equal c1 values.
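To confirm this behavior in your environment, compare the two plans with EXPLAIN; the first should contain no Sort operator:

=# EXPLAIN SELECT * FROM t ORDER BY c1, c2;
=# EXPLAIN SELECT * FROM t ORDER BY c1, c3;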

For aggregation-only queries:

=# SELECT sum(c1), sum(c2) FROM t GROUP BY c1, c2;
=# SELECT sum(c1), sum(c3) FROM t GROUP BY c1, c3;

If Group Agg is used, performance improves significantly—similar to sorting—because aggregation can proceed during a sequential scan. For Hash Agg, data order has little effect. As sequential operators benefit most from vectorization, Group Agg typically outperforms Hash Agg, especially for the first query.

In summary, analyze your workload and select sort keys based on query patterns (columns used in sorting, aggregation, and filtering).

3.3.2 MARS2 MinMax Index and MARS3 BRIN Index

MARS2 and MARS3 store sorted data in contiguous segments called Runs. Each Run’s metadata includes min/max values, used for filtering during queries. Since filtering uses metadata instead of loading actual data, I/O overhead is reduced compared to full sequential scans.

  • MARS2: Minmax filtering must be enabled explicitly via minmax encoding at table creation; min/max values are not recorded by default.
  • MARS3: Uses BRIN indexes, created separately, for block-level filtering.

Examples:

MARS2 table with minmax encoding on all columns (in practice you can enable it only on the columns you filter by):

=# CREATE TABLE t (
    tag int encoding (minmax),
    ts timestamp encoding (minmax),
    name text encoding (minmax),
    region int encoding (minmax)
    ) USING MARS2;
=# CREATE INDEX ON t USING mars2_btree(tag);

MARS3 table with BRIN index:

=# CREATE INDEX idx_mars3 ON t USING mars3_brin(ts, tag);

Query retrieving data for a specific region since the start of 2022:

=# SELECT * FROM t WHERE ts > '2022-01-01' AND region = 2;

MinMax (or BRIN) indexes not only enable block-level filtering in the storage engine but also accelerate vectorized computation.

From the storage engine perspective: if the maximum value of column ts in a block is less than 2022-01-01, the block is skipped, saving I/O and reducing computation.

The vectorized engine uses minmax/BRIN metadata to optimize computation. For example, if a block’s region values all equal 2, the engine applies the predicate ts > '2022-01-01' directly without performing any additional region-related operations, improving performance.
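To observe both effects on a concrete query, the standard EXPLAIN (ANALYZE) facility can be used to inspect actual row counts and timings for the scan:

=# EXPLAIN (ANALYZE) SELECT * FROM t WHERE ts > '2022-01-01' AND region = 2;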

MinMax/BRIN indexes and sort keys are independently configured. The former can be applied to any column; the latter typically applies to a few. Their impacts on filtering and sorting overlap but differ. Use both strategically to maximize benefits.

3.3.3 Disabling Index Scans

MARS2 and MARS3 support index scans. However, the current vectorized execution engine only integrates with minmax/BRIN indexes and sequential scans—not index scans. As a result, some queries may not use vectorized execution even when enabled. The execution plan will show Custom Scan (sortheapscan).

To force vectorized execution, disable index scans:

=# SET enable_indexscan TO off;

For more details on MARS2 and MARS3, see Storage Engine.

3.4 Vectorization with ORCA Optimizer

Note!
The ORCA optimizer currently supports only HEAP and AO tables; MARS2 and MARS3 tables are not supported.
When combined with vectorization, only AOCO (column-oriented AO) tables are supported.

The ORCA optimizer must be enabled manually. The optimizer parameter is session-scoped, so you must re-enable it after reconnecting or opening a new session.

=# SET optimizer TO on;

After enabling optimizer and collecting accurate table statistics, vectorization automatically integrates with ORCA to generate optimized query plans. Example:

=# EXPLAIN SELECT * FROM t;
                QUERY PLAN
-------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=20)
   One-Time Filter: false
 Optimizer: Pivotal Optimizer (GPORCA)
(3 rows)

Note!
For more ORCA parameters, see Greenplum ORCA Optimizer Configuration Parameters.