Vectorized Execution Quick Start

1 Before You Begin

Before using the vectorized execution engine, ensure that your hardware and software environment meet the following requirements:

Hardware/Software Requirement
CPU Intel CPU with Haswell architecture or later
YMatrix Deployed cluster environment

2 Enabling Vectorized Execution

Vectorized execution is enabled by default in YMatrix.

3 Vectorized Execution Examples

First, create a column-oriented table. The following example creates a MARS2 table:

CREATE TABLE test(
  tag int, 
  i4 int4, 
  i8 int8, 
  f4 float4, 
  f8 float8
  ) USING mars2;
CREATE INDEX ON test USING mars2_btree(tag);

3.1 Expression Evaluation

The vectorized execution engine supports common expressions. For example:

EXPLAIN (costs off, verbose) SELECT tag + 10, f4 * -1, f8 / 10 + i8 * i4 FROM test;

This produces the following output:


                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   Output: ((tag + 10)), ((f4 * '-1'::double precision)), (((f8 / '10'::double precision) + ((i8 * i4))::double precision))
   ->  Custom Scan (MxVScan) on public.test
         Output: (tag + 10), (f4 * '-1'::double precision), ((f8 / '10'::double precision) + ((i8 * i4))::double precision)
 Optimizer: Postgres query optimizer
(5 rows)

MxVScan contains the core computation module. It uses SIMD instructions and cache-friendly algorithms to efficiently process various expressions.

3.2 WHERE Filter Optimization

For filtering queries on column-store tables using WHERE conditions, the vectorized execution engine provides order-of-magnitude performance improvements.

As shown in the following plan, MxVScan also supports vectorized filtering:

EXPLAIN (costs off, verbose) SELECT * FROM test where i4 > 10 and i8 < 8;

Output:

                     QUERY PLAN                     
----------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   Output: tag, i4, i8, f4, f8
   ->  Custom Scan (MxVScan) on public.test
         Output: tag, i4, i8, f4, f8
         Filter: ((test.i4 > 10) AND (test.i8 < 8))
 Optimizer: Postgres query optimizer
(6 rows)

3.3 Sorted Query Optimization

The vectorized execution engine implements columnar-optimized sorting algorithms. It also includes special optimizations for the LIMIT operator, which restricts the number of output rows. With vectorization enabled, the LIMIT operator consumes less memory and runs faster.

Example:

EXPLAIN (costs off, verbose) SELECT * FROM test ORDER BY i8;

Output:

                    QUERY PLAN                    
--------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   Output: tag, i4, i8, f4, f8
   Merge Key: i8
   ->  Custom Scan (MxVSort) on public.test
         Output: tag, i4, i8, f4, f8
         Sort Key: i8
         ->  Custom Scan (MxVScan) on public.test
               Output: tag, i4, i8, f4, f8
 Optimizer: Postgres query optimizer
(9 rows)

Another example with ORDER BY and LIMIT:

EXPLAIN (costs off, verbose) SELECT * FROM test ORDER BY i4, i8 DESC LIMIT 10;

Output:

                          QUERY PLAN                          
--------------------------------------------------------------
 Limit
   Output: tag, i4, i8, f4, f8
   ->  Gather Motion 1:1  (slice1; segments: 1)
         Output: tag, i4, i8, f4, f8
         Merge Key: i4, i8
         ->  Limit
               Output: tag, i4, i8, f4, f8
               ->  Custom Scan (MxVSort) on public.test
                     Output: tag, i4, i8, f4, f8
                     Sort Key: test.i4, test.i8
                     ->  Custom Scan (MxVScan) on public.test
                           Output: tag, i4, i8, f4, f8
 Optimizer: Postgres query optimizer
(13 rows)

3.4 Aggregation Query Optimization

The vectorized execution engine supports aggregation operators (such as sum, min, max, count) for common data types including int2, int4, int8, float4, and float8.

Example:

EXPLAIN (costs off, verbose) SELECT min(tag + 2), max(i4 + i8), count(f8) FROM test;

Output:

                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Finalize Aggregate
   Output: min((tag + 2)), max((i4 + i8)), count(f8)
   ->  Gather Motion 1:1  (slice1; segments: 1)
         Output: (PARTIAL min((tag + 2))), (PARTIAL max((i4 + i8))), (PARTIAL count(f8))
         ->  Custom Scan (MxVAgg)
               Output: (PARTIAL min((tag + 2))), (PARTIAL max((i4 + i8))), (PARTIAL count(f8))
               ->  Custom Scan (MxVScan) on public.test
                     Output: tag, i4, i8, f8
 Optimizer: Postgres query optimizer
(9 rows)

Grouped aggregation example:

EXPLAIN (costs off, verbose) SELECT count(tag), sum(f4) FROM test GROUP BY i4;

Output:

                            QUERY PLAN                             
-------------------------------------------------------------------
 Finalize GroupAggregate
   Output: count(tag), sum(f4), i4
   Group Key: test.i4
   ->  Gather Motion 1:1  (slice1; segments: 1)
         Output: i4, (PARTIAL count(tag)), (PARTIAL sum(f4))
         Merge Key: i4
         ->  Custom Scan (MxVAgg)
               Output: i4, (PARTIAL count(tag)), (PARTIAL sum(f4))
               ->  Custom Scan (MxVSort) on public.test
                     Output: i4, tag, f4
                     Sort Key: i4
                     ->  Custom Scan (MxVScan) on public.test
                           Output: i4, tag, f4
 Optimizer: Postgres query optimizer
(14 rows)

3.5 Optimization for Complex Queries

The vectorized execution engine also significantly improves performance for more complex queries. For example:

select sum(lo_extendedprice * lo_discount) as revenue
  from :tname
 where lo_orderdate >= '1993-01-01'
   and lo_orderdate <  '1994-01-01'
   and lo_discount between 1 and 3
   and lo_quantity < 25
;