This document describes how to correctly and effectively clean up and maintain YMatrix database tables and its basic principles.
YMatrix strongly recommends periodic cleaning and maintenance of database tables in production systems.
Periodic cleaning is implemented by the VACUUM
command, which is mainly used to recycle the storage space occupied by dead tuples. In normal YMatrix operations, deleted or discarded tuples by updating are not physically deleted from their tables, they remain until VACUUM
is finished. Therefore, it is necessary to execute VACUUM
periodically, especially for frequently updated tables.
In most cases, it is enough to have the automatic cleaning daemon (Autovacuum) perform the cleaning, and you may need to adjust the automatic cleaning parameters described in it to get the best results. However, in some cases, you also need to be able to supplement or replace the activities of the background process using manually managed commands such as VACUUM. This is usually performed using the [cron script] (https://en.wikipedia.org/wiki/Cron) or the task scheduler script.
Each table in the YMatrix database must be cleaned and maintained regularly for the following reasons:
UPDATE
or DELETE
of a row at once will not immediately remove the old version of the row. This approach is necessary to benefit from multi-version concurrency control (MVCC): it cannot be deleted when an older version may still be visible to other transactions. But in the end, no transaction will be interested in an outdated or deleted version of the row. The space it takes must be recycled for new rows, which avoids unlimited growth in disk space requirements.Manual cleaning is done by running the VACUUM
SQL statement.
For example, clean a table:
=# VACUUM <tablename>;
Clean all tables:
=# VACUUM;
In addition to the standard VACUUM
statement, you can also use the VACUUM FULL
statement for cleaning and maintenance. VACUUM FULL
can reclaim more disk space but runs more slowly. In addition, the standard form of VACUUM
can be run in parallel with production database operations (commands such as SELECT
, INSERT
, UPDATE
, and DELETE
will continue to work properly, but you cannot use ALTER TABLE
and other commands to update the table definition during the cleaning). But VACUUM FULL
requires an exclusive lock on the table where it works, so it cannot be paralleled with other uses of this table. Therefore, usually administrators should strive to use the standard VACUUM
and avoid VACUUM FULL
.
For different storage tables, the degree of blocking of other SQL statements varies as they execute the VACUUM
clean command:
| Storage Engine | INSERT | UPDATE | DELETE | ALTER | SELECT | Physical File Size | |--|--|--|--|--|--|--|--| | HEAP | No blocking | No blocking | No blocking | Blocking | No blocking | Not changing | | AO | No blocking | No blocking | No blocking | Blocking | No blocking | Shrinking | | MARS2/MARS3 | No blocking | No blocking | No blocking | Blocking | No blocking | Shrinking |
Notes!
VACUUM
generates a large amount of I/O traffic, which will cause poor performance of other active sessions. You can adjust some configuration parameters to perform the performance impact caused by background cleaning activities. See [Cost-based cleaning delay] (/doc/latest/reference/configuration_parameters/resource_consumption_parameters#vacuum).
Automatic cleaning is done through the Autovacuum daemon, with the purpose of automatically executing the VACUUM
and ANALYZE
commands. When enabled:
autovacuum_naptime
.autovacuum_max_workers
parameter, and 3 worker processes are called by default for work.VACUUM
:autovacuum_vacuum_scale_factor * The total number of tuples in the table + autovacuum_vacuum_threshold
.ANALYZE
:autovacuum_analyze_scale_factor * The total number of tuples of this table + autovacuum_analyze_threshold
.Turn on the automatic cleaning function and reload the configuration file to take effect:
$ gpconfig -c autovacuum -v on
$ mxstop -u
Notes!
It is necessary to ensure that the track_counts TO on (SET track_counts TO on;
) is automatically cleaned up the process. This parameter is enabled by default.
Notes!
For details on automatic cleaning related configuration parameters, please refer to [Automatic Cleanup Parameters] (/doc/latest/reference/configuration_parameters/automatic_vacuum_parameters).