Performance Tuning

This document outlines the methodology for identifying performance bottlenecks and presents common performance tuning scenarios both inside and outside the YMatrix database.

1 How to Identify Performance Bottlenecks?

When a business SQL query slows down or you aim to improve its performance, first determine the nature of the issue based on observed symptoms:

Is the entire server system slow:

  • Analyze server resource usage

Or is it only a specific SQL query that has slowed:

  • Use the "Query Monitoring" feature in the YMatrix GUI
  • Or observe real-time slow queries via the Overview dashboard in Grafana/Prometheus

Once a slow SQL query is identified, follow this approach for troubleshooting and analysis:

  • First, confirm whether the bottleneck lies outside the database (e.g., server resources, business changes)
  • If no external cause is found, proceed with internal database analysis (statistics update, data skew detection, query plan analysis, etc.)

1.1 How to Diagnose External Bottlenecks

1.1.1 Server Resource Analysis

Note!
There are various ways to collect information. We describe only 1–2 methods here in detail. You may use other familiar commands for collection and analysis.

CategoryInformationCollectionAnalysis
Server ResourcesCPU UtilizationYMatrix GUI - "Cluster Management" - "Metrics View":
∙ CPU utilization per node
or use the top command:
∙ User CPU (us)
∙ System CPU (sy)
∙ Idle CPU (id)
∙ When CPU utilization is high, determine if user CPU or system CPU is high
∙ High user CPU indicates a process consuming excessive CPU; investigate code efficiency
∙ High system CPU suggests possible shortages in other resources (disk I/O, memory, network)
Memory and Virtual MemoryYMatrix GUI - "Cluster Management" - "Metrics View":
∙ MEM utilization per node
or use vmstat to view memory details:
∙ si (swap-in: KB/s from swap to RAM)
∙ so (swap-out: KB/s from RAM to swap)
If si and so are consistently non-zero, memory is insufficient and heavy swapping degrades performance
Disk I/O (Read/Write Rate)YMatrix GUI - "Cluster Management" - "Metrics View":
∙ Disk I/O per node
or use iostat -x:
∙ %util (percentage of time with I/O activity)
∙ %iowait (percentage of CPU waiting for I/O completion)
∙ Performance is affected by disk I/O speed, not capacity. If %util approaches 100%, the I/O system is saturated
∙ High %iowait indicates an I/O bottleneck; consider upgrading or replacing the disk array
NetworkYMatrix GUI - "Cluster Management" - "Metrics View":
∙ Network receive/send rate per node
or use sar -n DEV 1 2:
∙ rxkB/s (receive rate in KB/s)
∙ txkB/s (transmit rate in KB/s)
Compare rxkB/s against total network bandwidth. If close to maximum, a network bottleneck exists
Kernel ParametersFor Linux systems, check values under /proc/sys, e.g.: cat overcommit_memoryOS parameter tuning mainly involves adjusting memory usage policies and increasing swap space to relieve memory pressure. Modify kernel parameters by editing corresponding files under /proc/sys (changes take effect immediately but do not persist after reboot)

1.1.2 Confirm Business Changes

  • Check if a new business service was recently deployed
  • Review DDLs of tables in the cluster to see if online DDL operations caused issues. For example, adding indexes on large tables consumes significant cluster resources and may interfere with normal access requests

1.2 How to Troubleshoot Internally in the Database

Note!
The options listed in the table are not all mandatory.

Software EnvironmentOperating System VersionUse uname -aDetermine if the bottleneck is related to the OS version
YMatrix VersionRun SELECT version();Determine if the bottleneck is related to the YMatrix version
Cluster InformationCluster Topology∙ YMatrix GUI - "Cluster Management"
∙ Or run SELECT * FROM gp_segment_configuration;
If failover occurred, a single physical node may host more Segments, potentially causing performance degradation
Database InformationTable Structure∙ YMatrix GUI - "Data Tables"
∙ Or use \d+ command
Check if improper distribution key leads to severe data skew
Relevant LogsDefault log directory for YMatrix is $HOME/gpAdminLogs
Database logs reside in respective data directories
Analyze logs if necessary
Slow QueriesYMatrix GUI - "Query Monitoring": check for blocking sessionsIf slow queries exist, identify and analyze them
Query PlanUse EXPLAIN SELECT... to view query execution planIf plan cost is too high, analyze the path and root cause
Preserve Environment SnapshotUse YMatrix's minirepro diagnostic tool

2 Common Internal Database Tuning Scenarios

Internal database tuning refers to optimizing individual SQL queries.

2.1 Outdated Statistics

Symptom:
After significant data changes (inserts, deletes), statistics on the table may become outdated. This can lead the optimizer to choose suboptimal query plans due to inaccurate estimates, resulting in degraded query performance.

Diagnosis Method:
Use the ANALYZE command output to verify statistics accuracy. If the EXPLAIN ANALYZE value in the query plan deviates significantly, statistics are stale. Re-run:

=# ANALYZE <tablename>;

2.2 Data Skew

Symptom:
In some cases, an inappropriate distribution key causes data to be unevenly distributed across a few MXSegment instances, leading to data skew. This triggers the "weakest link" effect in YMatrix’s distributed architecture: query execution time depends on the slowest MXSegment plus MXMaster processing time.

Diagnosis Method:
Run the following command to check data distribution. Large disparities between MXSegments indicate skew:

=# SELECT gp_segment_id, count(*) FROM <tablename> GROUP BY gp_segment_id;

Resolution:
If data skew is detected, consider changing the distribution key to achieve even distribution. Use these commands:

=# ALTER TABLE <tablename> SET DISTRIBUTED BY(<newcolumn>);
=# ALTER TABLE <tablename> SET WITH (REORGANIZE=true);
=# ANALYZE <tablename>;

Note!
Distribution key selection during schema design directly affects data distribution and query performance. Changing it after deployment carries higher performance risks. We recommend careful selection during design. Refer to YMatrix DDL Best Practices.

2.3 Table Bloat

Symptom:
Frequent UPDATE or DELETE operations without proper data cleanup scheduling result in table bloat.

Resolution:

  1. Drop partition subtables containing only historical (cold) data:
=# DROP TABLE <partition_tablename>;
  1. Establish a regular cleanup schedule:
=# VACUUM <tablename>;

See Routine Vacuuming for details.

  1. For severely bloated tables, perform data reorganization:
=# ALTER TABLE <tablename> SET WITH (REORGANIZE=true);
=# ANALYZE <tablename>;

2.4 Query Plan Analysis

If none of the above causes explain the SQL performance issue, analyze the query plan and ask:

  • Does any operation in the plan take unusually long?
    • Identify operations consuming most execution time. For example, if an index scan takes longer than expected, the index might be outdated and require rebuild. Alternatively, adjust the `` parameter to encourage the optimizer to choose a different plan.
  • Are selective predicates applied early in the plan?
    • Apply the most selective filters early to reduce the number of rows passed up the plan tree. If selectivity is misestimated, collect additional statistics on relevant columns. You may also reorder enable_\<operator> clauses in the SQL statement.
  • Did the optimizer choose the optimal join order?
    • For multi-table joins, ensure the most selective joins occur first to minimize intermediate row counts. If the plan uses a suboptimal order, set join_collapse_limit to WHERE and use explicit 1 syntax to enforce join order. Also consider collecting more statistics on join columns.
  • Does the optimizer selectively scan partitioned tables?
    • When using partitioning, does the optimizer scan only required child tables? Scans on the parent table should return JOIN rows, as the parent holds no data.
  • Does the optimizer use hash aggregation and hash joins?
    • Hash operations are typically faster than alternatives because comparisons and grouping occur in memory, avoiding disk I/O.
  • Does enabling/disabling vectorization significantly affect performance?
    • The vectorized execution engine offers varying performance gains depending on the scenario. Operators with sequential access patterns—such as expression evaluation, filter, and aggregate (Agg)—benefit the most. Operators introducing randomness—like Sort and Hash—see limited improvement.

For more information on query plans, see Understanding Query Plans.