Routine Cleanup

This document explains how to properly and effectively perform periodic cleanup and maintenance of tables in YMatrix databases, along with the underlying principles.

1 What Is Table Cleanup and Maintenance in YMatrix?

YMatrix strongly recommends performing periodic table cleanup and maintenance on production systems.

Periodic cleanup is achieved using the VACUUM command, primarily to reclaim storage space occupied by dead tuples. In normal YMatrix operations, rows that are updated or deleted are not physically removed from the table immediately. These obsolete rows remain until a VACUUM operation completes. Therefore, it is essential to run VACUUM regularly, especially for tables that experience frequent updates.

In most cases, relying on the autovacuum daemon to perform cleanup is sufficient. You may need to tune the autovacuum parameters described below for optimal performance. However, in certain scenarios, manual execution of commands such as VACUUM may be required to supplement or replace background process activity. This is typically done using cron scripts or task scheduler scripts.

2 Why Perform Periodic Cleanup and Maintenance?

Regular cleanup and maintenance of every table in a YMatrix database is necessary for the following reasons:

  • Reclaim disk space occupied by updated or deleted rows. In YMatrix, an UPDATE or DELETE operation does not immediately remove the old version of a row. This behavior is required for Multi-Version Concurrency Control (MVCC): old versions must persist as long as they might still be visible to other transactions. Eventually, however, no transaction will require an obsolete or deleted row version. The space it occupies must be reclaimed for reuse by new rows to prevent unbounded growth in disk space usage.
  • Update optimizer statistics, which are used by the query planner.
  • Update the visibility map, which enables faster index-only scans.
  • Protect against data loss due to transaction ID wraparound or multitransaction ID wraparound.

3 How to Perform Effective Cleanup and Maintenance?

3.1 Manual Cleanup

Manual cleanup is performed by executing the VACUUM SQL command.

For example, to clean a specific table:

=# VACUUM <tablename>;

To clean all tables:

=# VACUUM;

In addition to the standard VACUUM command, you can use VACUUM FULL. While VACUUM FULL reclaims more disk space, it runs significantly slower. The standard VACUUM can operate concurrently with regular database operations — commands such as INSERT, UPDATE, DELETE, and SELECT continue to function normally. However, during a standard VACUUM, you cannot use commands like ALTER TABLE to modify the table's definition. In contrast, VACUUM FULL requires an exclusive lock on the target table, preventing any concurrent access. Therefore, administrators should generally prefer standard VACUUM and avoid using VACUUM FULL.

The degree of blocking on other SQL statements during VACUUM varies by storage engine:

Storage Engine INSERT UPDATE DELETE ALTER SELECT Physical File Size
HEAP Not blocked Not blocked Not blocked Blocked Not blocked Unchanged
AO Not blocked Not blocked Not blocked Blocked Not blocked Shrinks
MARS2/MARS3 Not blocked Not blocked Not blocked Blocked Not blocked Shrinks

Note!
VACUUM generates significant I/O load, which may degrade performance for other active sessions. You can reduce this impact by tuning cost-based vacuum delay settings. For details, see Cost-Based Vacuum Delay.

3.2 Automatic Cleanup

Automatic cleanup is handled by the Autovacuum daemon, whose purpose is to automatically execute VACUUM and ANALYZE commands. When enabled:

  • Autovacuum monitors tables with heavy insert, update, or delete activity. The interval between checks is defined by the parameter autovacuum_naptime.
  • The autovacuum daemon consists of multiple processes. The number of concurrent worker processes is controlled by the autovacuum_max_workers parameter, which defaults to 3.
  • Autovacuum triggers VACUUM when:
    • The number of dead tuples in a table exceeds the effective threshold calculated by the following formula: autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number_of_rows.
  • Autovacuum triggers ANALYZE when:
    • The number of inserted, deleted, or updated rows since the last analyze exceeds the effective threshold calculated by: autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number_of_rows.

Enable autovacuum and reload the configuration to apply changes:

$ gpconfig -c autovacuum -v on
$ mxstop -u

Note!
Autovacuum requires track_counts to be enabled (track_counts = on). This parameter is enabled by default.

Note!
For a complete list of autovacuum-related configuration parameters, refer to Autovacuum Parameters.