This document outlines the methodology for identifying performance bottlenecks and presents common performance tuning scenarios both inside and outside the YMatrix database.
When a business SQL query slows down or you aim to improve its performance, first determine the nature of the issue based on the observed symptoms:
- Is the entire server system slow?
- Or is it only a specific SQL query that has slowed down?
Once a slow SQL query is identified, follow this approach for troubleshooting and analysis:
Note!
There are various ways to collect information. We describe only 1–2 methods here in detail. You may use other familiar commands for collection and analysis.
| Category | Information | Collection | Analysis |
| --- | --- | --- | --- |
| Server Resources | CPU Utilization | YMatrix GUI - "Cluster Management" - "Metrics View": ∙ CPU utilization per node. Or use the `top` command: ∙ user CPU (`us`) ∙ system CPU (`sy`) ∙ idle CPU (`id`) | ∙ When CPU utilization is high, determine whether user CPU or system CPU is high ∙ High user CPU indicates a process consuming excessive CPU; investigate code efficiency ∙ High system CPU suggests possible shortages in other resources (disk I/O, memory, network) |
| Server Resources | Memory and Virtual Memory | YMatrix GUI - "Cluster Management" - "Metrics View": ∙ Memory utilization per node. Or use `vmstat` to view memory details: ∙ `si` (swap-in: KB/s moved from swap to RAM) ∙ `so` (swap-out: KB/s moved from RAM to swap) | If `si` and `so` are consistently non-zero, memory is insufficient and heavy swapping degrades performance |
| Server Resources | Disk I/O (Read/Write Rate) | YMatrix GUI - "Cluster Management" - "Metrics View": ∙ Disk I/O per node. Or use `iostat -x`: ∙ `%util` (percentage of time with I/O activity) ∙ `%iowait` (percentage of CPU time spent waiting for I/O completion) | ∙ Performance is affected by disk I/O speed, not capacity. If `%util` approaches 100%, the I/O system is saturated ∙ High `%iowait` indicates an I/O bottleneck; consider upgrading or replacing the disk array |
| Server Resources | Network | YMatrix GUI - "Cluster Management" - "Metrics View": ∙ Network receive/send rate per node. Or use `sar -n DEV 1 2`: ∙ `rxkB/s` (receive rate in KB/s) ∙ `txkB/s` (transmit rate in KB/s) | Compare `rxkB/s` and `txkB/s` against the total network bandwidth. If either is close to the maximum, a network bottleneck exists |
| Server Resources | Kernel Parameters | For Linux systems, check values under `/proc/sys`, e.g.: `cat /proc/sys/vm/overcommit_memory` | OS parameter tuning mainly involves adjusting memory usage policies and increasing swap space to relieve memory pressure. Modify kernel parameters by editing the corresponding files under `/proc/sys` (changes take effect immediately but do not persist after reboot) |
Note!
Not all of the items listed in these tables are mandatory to collect.
| Category | Information | Collection | Analysis |
| --- | --- | --- | --- |
| Software Environment | Operating System Version | Use `uname -a` | Determine whether the bottleneck is related to the OS version |
| Software Environment | YMatrix Version | Run `SELECT version();` | Determine whether the bottleneck is related to the YMatrix version |
| Cluster Information | Cluster Topology | ∙ YMatrix GUI - "Cluster Management" ∙ Or run `SELECT * FROM gp_segment_configuration;` | If a failover occurred, a single physical node may host more Segments than planned, potentially causing performance degradation (see the example query after this table) |
| Database Information | Table Structure | ∙ YMatrix GUI - "Data Tables" ∙ Or use the `\d+` command | Check whether an improper distribution key leads to severe data skew |
| Database Information | Relevant Logs | The default log directory for YMatrix is `$HOME/gpAdminLogs`; database logs reside in the respective data directories | Analyze the logs if necessary |
| Database Information | Slow Queries | YMatrix GUI - "Query Monitoring": check for blocking sessions | If slow queries exist, identify and analyze them |
| Database Information | Query Plan | Use `EXPLAIN SELECT ...` to view the query execution plan | If the plan cost is too high, analyze the path and root cause |
| Database Information | Preserve Environment Snapshot | Use YMatrix's `minirepro` diagnostic tool | |
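For example, a quick way to confirm whether a failover has occurred is to look for Segments running in a role other than their preferred one. A minimal sketch against the `gp_segment_configuration` catalog (columns as in the Greenplum-compatible catalog):

```sql
-- Segments whose current role differs from their preferred role
-- indicate a past failover; their hosts may carry extra load.
SELECT dbid, content, role, preferred_role, hostname
FROM gp_segment_configuration
WHERE role <> preferred_role;
```

An empty result means every Segment is running in its preferred role.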
Internal database tuning refers to optimizing individual SQL queries.
Symptom:
After significant data changes (inserts, deletes), statistics on the table may become outdated. This can lead the optimizer to choose suboptimal query plans due to inaccurate estimates, resulting in degraded query performance.
Diagnosis Method:
Compare the planner's estimated row counts in the query plan with the actual row counts reported by `EXPLAIN ANALYZE`. If the estimates deviate significantly from the actual values, the statistics are stale. Re-run:

```sql
=# ANALYZE <tablename>;
```
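For example, a stale-statistics check on a hypothetical table `t_orders` might look like this:

```sql
-- Compare the planner's estimate with reality: in the output, the
-- estimated "rows=..." on each plan node should be close to the
-- "actual ... rows=..." value printed next to it.
EXPLAIN ANALYZE
SELECT count(*) FROM t_orders WHERE order_date > '2024-01-01';

-- If the estimates are off by orders of magnitude, refresh
-- statistics and re-check the plan.
ANALYZE t_orders;
```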
Symptom:
In some cases, an inappropriate distribution key causes data to be unevenly distributed across a few MXSegment instances, leading to data skew. This triggers the "weakest link" effect in YMatrix’s distributed architecture: query execution time depends on the slowest MXSegment plus MXMaster processing time.
Diagnosis Method:
Run the following command to check data distribution. Large disparities between MXSegments indicate skew:
```sql
=# SELECT gp_segment_id, count(*) FROM <tablename> GROUP BY gp_segment_id;
```
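To condense the per-segment counts into a single number, a sketch like the following (again using a hypothetical table `t_orders`) reports how far the busiest MXSegment deviates from the average:

```sql
-- Ratio of the largest per-segment row count to the average:
-- a value near 1.0 means even distribution, while values well
-- above 1.0 indicate skew toward one or a few segments.
SELECT max(cnt)::float8 / avg(cnt)::float8 AS skew_ratio
FROM (
    SELECT gp_segment_id, count(*) AS cnt
    FROM t_orders
    GROUP BY gp_segment_id
) AS per_segment;
```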
Resolution:
If data skew is detected, consider changing the distribution key to achieve even distribution. Use these commands:
```sql
=# ALTER TABLE <tablename> SET DISTRIBUTED BY (<newcolumn>);
=# ALTER TABLE <tablename> SET WITH (REORGANIZE=true);
=# ANALYZE <tablename>;
```
Note!
Distribution key selection during schema design directly affects data distribution and query performance. Changing it after deployment carries higher performance risks. We recommend careful selection during design. Refer to YMatrix DDL Best Practices.
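As an illustration of design-time selection, the sketch below (a hypothetical `t_orders` table) chooses a high-cardinality column as the distribution key so that rows hash evenly across MXSegments:

```sql
CREATE TABLE t_orders (
    order_id   bigint,   -- high cardinality: hashes evenly across segments
    region     text,     -- low cardinality: would concentrate rows
    order_date date
) DISTRIBUTED BY (order_id);
```

Distributing by `region` instead would send all rows for each of the few region values to the same MXSegments, producing exactly the skew described above.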
Symptom:
Frequent UPDATE or DELETE operations without proper data cleanup scheduling result in table bloat.
Resolution:
Depending on the situation, reclaim the space in one of the following ways.
For partitioned tables, drop partitions whose data is no longer needed:
```sql
=# DROP TABLE <partition_tablename>;
```
Reclaim the space held by dead rows (see Routine Vacuuming for details):
```sql
=# VACUUM <tablename>;
```
For severe bloat, rewrite the table to fully reclaim the space, then refresh its statistics:
```sql
=# ALTER TABLE <tablename> SET WITH (REORGANIZE=true);
=# ANALYZE <tablename>;
```
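To identify which tables need this treatment, you can consult the bloat diagnostics view. A sketch, assuming YMatrix inherits Greenplum's `gp_toolkit` schema:

```sql
-- Tables the system flags as significantly bloated, with a short
-- diagnostic message for each.
SELECT bdirelid::regclass AS table_name, bdidiag AS diagnosis
FROM gp_toolkit.gp_bloat_diag;
```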
If none of the above causes explains the SQL performance issue, analyze the query plan and ask:
- Is one operator in the plan taking unusually long? You can test alternative plans by toggling the corresponding `enable_<operator>` server parameters (see the sketch after this list).
- Is the join order optimal? Check the `WHERE` clauses in the SQL statement, set `join_collapse_limit` to 1 and use explicit `JOIN` syntax to enforce a join order, and consider collecting more statistics on the join columns.
- For partitioned tables, does the plan scan only the required child partitions? The parent table itself should contribute no rows, as the parent holds no data.

For more information on query plans, see Understanding Query Plans.
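As an illustration of the first question, the sketch below disables the nested-loop join operator for the current session to test whether the planner's operator choice is the bottleneck (the tables `t_orders` and `t_customers` are hypothetical):

```sql
-- Steer the planner away from nested-loop joins for this session only.
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT o.order_id, c.customer_name
FROM t_orders o
JOIN t_customers c ON o.customer_id = c.customer_id;

-- Restore the default once the comparison is done.
RESET enable_nestloop;
```

If the query runs faster with the operator disabled, the root cause is often inaccurate or missing statistics on the join columns rather than the operator itself.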