Write-Ahead Log Category Parameters

This document describes the parameters related to the write-ahead log category in the system configuration parameters.

Notes!
To ensure system stability and security, please be cautious when manually modifying the relevant parameters.

Settings

commit_delay

Before a WAL flush is initiated, commit_delay adds a time delay (in microseconds).

If system load is high enough that additional transactions become ready to commit within the delay interval, letting more transactions commit through a single WAL flush improves group commit throughput. However, it also increases the latency of each WAL flush by up to commit_delay. Because the delay is wasted if no other transactions become ready to commit, it is only performed if at least commit_siblings (see below) other transactions are active when a flush is about to be initiated.
Additionally, if fsync is disabled, no delay will be executed.
Units are microseconds (μs).

Data type	Default value	Value range	Setting category
int	0	0 ~ 100000	master；session；reload；superuser
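
The conditions under which the delay is performed can be summarized in a few lines. The following is only a sketch of the rules described above, not the server's implementation; the function name and arguments are hypothetical.

```python
def commit_delay_applies(commit_delay_us: int, commit_siblings: int,
                         other_active_txns: int, fsync_enabled: bool) -> bool:
    """Return True if a WAL flush would first wait commit_delay microseconds."""
    if not fsync_enabled:
        # No delay is performed when fsync is disabled.
        return False
    if commit_delay_us <= 0:
        # The default value 0 means no delay.
        return False
    # The delay needs at least commit_siblings other active transactions.
    return other_active_txns >= commit_siblings


print(commit_delay_applies(1000, 5, 8, True))   # enough siblings, fsync on
print(commit_delay_applies(1000, 5, 2, True))   # too few siblings
```

With the defaults (commit_delay = 0), the function always returns False, which matches the behavior of never delaying a flush.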

commit_siblings

The minimum number of concurrent active transactions required before the commit_delay delay is performed.

Larger values make it more likely that at least one other transaction will be ready to commit during the delay interval.

Data type	Default value	Value range	Setting category
int	5	0 ~ 1000	master；session；reload

full_page_writes

When this parameter is set to on, the YMatrix server writes the entire contents of each page to the WAL during the first modification of the page after a checkpoint.

This is done because a page write in progress during an operating system crash may only be partially completed, resulting in a disk page containing a mix of new and old data. During recovery after a crash, the row-level change data stored in the WAL is typically insufficient to fully restore such a page.
Storing the full page image guarantees that the page can be correctly restored, at the cost of increasing the amount of data that must be written to the WAL. Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first modification of each page after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the checkpoint interval parameters.
Disabling this parameter speeds up normal operation but may lead to unrecoverable or silent data corruption after a system failure. The risks are similar to those of disabling fsync, though smaller, and it should be disabled only under the same circumstances in which disabling fsync is acceptable.
Disabling this option does not affect the use of WAL archiving for point-in-time recovery (PITR).

Data type	Default value	Setting category
boolean	on	segments；system；reload

fsync

If this parameter is enabled, the YMatrix server will attempt to ensure that updates are physically written to disk by issuing the fsync() system call or using one of several equivalent methods (see wal_sync_method below). This ensures that the database cluster can be restored to a consistent state after an operating system or hardware crash.

While disabling fsync often provides performance benefits, it may result in irrecoverable data corruption in the event of a power outage or system crash. Therefore, disabling fsync is only recommended when the entire database can be easily reconstructed from external data.
Examples of environments where it is safe to disable fsync include initially loading a new database cluster from a backup file, using a database cluster to process a batch of data after which the database will be discarded and recreated, or a read-only database clone that is rebuilt frequently and not used for failover. High-quality hardware alone is not a sufficient reason to disable fsync.
For reliable recovery when changing fsync from off to on, all modified buffers in the kernel must be forced to persistent storage. This can be done while the cluster is shut down, or while fsync is on, by running initdb --sync-only, running sync, unmounting the file system, or rebooting the server.
In many cases, disabling synchronous_commit for less critical transactions can provide significant performance gains without the risk of data corruption.
If you disable this parameter, also consider disabling full_page_writes (see above).

Data type	Default value	Setting category
boolean	on	segments；system；reload

synchronous_commit

Specify the synchronization level. That is, whether a transaction needs to wait for WAL records to be written to disk before the command returns success to the client.

The default and safe setting is on. When set to off, there will be a delay between reporting success to the client and truly ensuring that the transaction is not threatened by a server crash (the maximum delay is three times wal_writer_delay). Unlike fsync, setting this parameter to off does not introduce the risk of database inconsistency: an operating system or database crash may cause some recently committed transactions to be lost, but the database state is the same as if those transactions had been completely aborted. Therefore, when performance is more important than fully ensuring transaction durability, disabling synchronous_commit can serve as an effective alternative.
If synchronous_standby_names is not empty, this parameter also controls transaction commit behavior: whether to wait for their WAL records to be replicated to the Standby server before committing.
When this parameter is set to on, a commit waits until replies from the current synchronous Standby servers indicate that they have received the transaction's commit record and flushed it to disk. This ensures that the transaction will not be lost unless both the primary server and all synchronous Standby servers suffer corruption of their database storage.
When set to remote_apply, the commit will wait until the response from the current synchronized Standby indicates that it has received the transaction commit record and applied the transaction, at which point the transaction will become visible to queries on the Standby.
When this parameter is set to remote_write, the commit will wait until the response from the current synchronous standby indicates that it has received the transaction's commit record and has written it to its operating system. This setting is sufficient to ensure that data is preserved if the Standby server's YMatrix instance crashes, but it does not guarantee that data will be preserved if the Standby server experiences an operating system-level crash, as the data may not necessarily reach stable storage on the Standby machine.
Finally, setting local causes the commit to wait for local flushing to disk rather than replication completion. This is typically not the desired effect when using synchronous replication, but this option is provided for completeness.
If synchronous_standby_names is empty, setting on, remote_apply, remote_write, and local all provide the same synchronization level: transaction commits only wait for local disk flushing.

Parameter Value	Local Durable Commit	Standby Durable Commit After YMatrix Crash	Standby Durable Commit After OS Crash	Standby Query Consistency
remote_apply	Y	Y	Y	Y
on	Y	Y	Y	N
remote_write	Y	Y	N	N
local	Y	N	N	N
off	N	N	N	N

Data type	Default value	Value range	Setting category
enum	on	on / off / true / false / yes / no / 1 / 0 / remote_apply / remote_write / local	segments；session；reload
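
The guarantee table above forms a strict ladder: each stronger level adds exactly one guarantee. The sketch below encodes that ladder so the weakest level satisfying a requirement can be looked up. It applies to the case where synchronous_standby_names is not empty; the guarantee labels and function names are illustrative, not YMatrix identifiers.

```python
# Levels ordered from weakest to strongest, as in the table above.
LEVELS = ["off", "local", "remote_write", "on", "remote_apply"]

# Guarantees in the order in which the levels add them (labels are illustrative).
GUARANTEES = [
    "local_durable",
    "standby_durable_after_db_crash",
    "standby_durable_after_os_crash",
    "standby_query_consistency",
]


def guarantees(level: str) -> list:
    """Guarantees provided by a synchronous_commit level."""
    # Level number i in LEVELS provides the first i guarantees.
    return GUARANTEES[:LEVELS.index(level)]


def weakest_level_for(guarantee: str) -> str:
    """Weakest synchronous_commit level that provides the given guarantee."""
    return LEVELS[GUARANTEES.index(guarantee) + 1]


print(guarantees("remote_write"))
print(weakest_level_for("standby_durable_after_os_crash"))
```

Choosing the weakest sufficient level per transaction is one way to trade durability for commit latency, since this parameter can be set at the session level.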

wal_buffers

The amount of shared memory used for WAL data that has not yet been written to disk (measured in WAL blocks, i.e., XLOG_BLCKSZ bytes). The default value of -1 selects a size equal to 1/32 of shared_buffers (approximately 3%), but not less than 64KB and not greater than the size of one WAL segment (typically 16MB).

If the automatic selection is too large or too small, you can manually set this value, but any positive value less than 32KB will be treated as 32KB.
At each transaction commit, the contents of the WAL buffer are written to disk, so extremely large wal_buffers values are unlikely to provide significant benefits. However, setting this value to several megabytes can improve write performance on a busy server where many clients commit simultaneously.
The automatic adjustment selected by the default setting -1 yields reasonable results in most cases.

Data type	Default value	Value range	Setting category
int	-1	-1 ～ INT_MAX/XLOG_BLCKSZ	segments；system；restart
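
As a worked example of the automatic sizing selected by -1, the sketch below applies the rule stated above (1/32 of shared_buffers, clamped between 64KB and one WAL segment). It assumes a 16MB WAL segment; the function name is hypothetical and the server's exact rounding may differ.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers_bytes(shared_buffers_bytes: int,
                           wal_segment_bytes: int = 16 * MB) -> int:
    """Approximate the size chosen when wal_buffers = -1."""
    size = shared_buffers_bytes // 32          # 1/32 of shared_buffers
    size = min(size, wal_segment_bytes)        # not more than one WAL segment
    return max(size, 64 * KB)                  # not less than 64KB


for shared in (1 * MB, 128 * MB, 8 * 1024 * MB):
    print(shared // MB, "MB shared_buffers ->",
          auto_wal_buffers_bytes(shared) // KB, "KB of WAL buffers")
```

Under these assumptions, any shared_buffers setting of 512MB or more already reaches the 16MB ceiling, so raising wal_buffers further requires setting it manually.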

wal_compression

Enable compression operations for full-page writes.

When this parameter is set to on, if full_page_writes is also set to on or during a base backup, the YMatrix server will compress the full page images written to the WAL. The compressed page images will be decompressed during WAL replay.
Enabling this parameter reduces the space occupied by the WAL without incurring the risk of irrecoverable data corruption. However, this comes at the cost of additional CPU overhead for compression during WAL recording and decompression during WAL replay.

Data Type	Default Value	Setting Category
boolean	off	segments；session；reload；superuser

wal_init_zero

When this parameter is set to on, new WAL files are padded with zeros.

In some file systems, this ensures that file space is allocated before we need to write WAL records.
However, Copy-On-Write (COW) file systems may not benefit from this technique, so this parameter makes it possible to skip the unnecessary work.
If set to off, only the final byte is written when the file is created so that it has the expected size.

Data type	Default value	Setting category
boolean	on	segments；session；reload；superuser

wal_level

This parameter determines how much information is written to the WAL.

The default value is replica, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server. minimal removes all records except those necessary for recovery from a crash or immediate shutdown. Finally, logical adds information required to support logical decoding. Each level includes all information from lower levels.
At the minimal level, WAL logging of certain bulk operations can be safely skipped, which makes those operations faster. Operations that benefit from this optimization include CREATE TABLE AS, CREATE INDEX, CLUSTER, and COPY into a table that was created or truncated in the same transaction.
However, minimal WAL does not contain enough information to reconstruct data from a base backup and the WAL. Therefore, to enable WAL archiving (archive_mode) and streaming replication, the replica level or higher must be used.
At the logical level, the same information as the replica level is recorded, plus the additional information required to extract logical change sets from the WAL. Using the logical level increases WAL volume, especially if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are executed.

Data Type	Default Value	Value Range	Setting Category
enum	replica	replica / minimal / logical	segments；system；restart

wal_log_hints

Perform full-page write operations for non-critical updates as well.

When this parameter is set to on, the YMatrix server writes the entire contents of each disk page to the WAL during the first modification of that page after a checkpoint, even for non-critical modifications of the so-called hint bits.
If data checksums are enabled, hint bit updates are always recorded in the WAL, and this setting is ignored. You can use this setting to test how much additional WAL would be generated if your database had data checksums enabled.

Data type	Default value	Setting category
boolean	off	segments；system；restart

wal_recycle

Recycle WAL files.

When this parameter is set to on, WAL files are recycled by renaming them to avoid creating new files.
On COW file systems, creating new files may be faster, so this parameter can be set to off to disable recycling.

Data type	Default value	Setting category
boolean	on	segments；session；reload；superuser

wal_sync_method

The method used to force WAL updates to disk.

If fsync is disabled, this setting is irrelevant because WAL file updates will not be forced out at all. Possible values are: open_datasync (write WAL files with the open() option O_DSYNC), fdatasync (call fdatasync() at each commit), fsync (call fsync() at each commit), fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache), and open_sync (write WAL files with the open() option O_SYNC).
The open_* options can also use O_DIRECT (if available).
Not all of these options are available on all platforms. The default value is the first option supported by the platform in the list. The default value is not necessarily the most ideal; it may be necessary to modify this setting or other aspects of the system configuration to create a crash-safe configuration or achieve optimal performance.

Data type	Default value	Value range	Setting category
enum	fsync / fdatasync（default on Linux）	open_datasync / fdatasync / fsync / fsync_writethrough / open_sync	segments；system；reload

wal_writer_delay

Specifies the frequency with which the WAL writer flushes the WAL, measured in milliseconds (ms).

After flushing the WAL, the writer will sleep for the duration specified by wal_writer_delay (unless awakened prematurely by an asynchronous commit transaction).
If the most recent flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been generated since then, the WAL is only written to the operating system and not flushed to disk.
Note: On many systems, the effective sleep delay granularity is 10 milliseconds. Setting wal_writer_delay to a value that is not a multiple of 10 has the same effect as setting it to the next higher multiple of 10.

Data type	Default value	Value range	Setting classification
int	200	1 ～ 10000	segments；system；reload
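
Two pieces of arithmetic follow from the description above: the rounding caused by a 10 ms sleep granularity, and the upper bound on the asynchronous-commit risk window (three times wal_writer_delay, as stated under synchronous_commit). The sketch below assumes the 10 ms granularity mentioned in the note; the function names are hypothetical.

```python
def effective_sleep_ms(wal_writer_delay_ms: int, granularity_ms: int = 10) -> int:
    """Round the configured delay up to the next multiple of the granularity."""
    # Ceiling division without floating point.
    return -(-wal_writer_delay_ms // granularity_ms) * granularity_ms


def max_async_commit_window_ms(wal_writer_delay_ms: int) -> int:
    """Upper bound on the delay before an asynchronous commit becomes crash-safe."""
    return 3 * wal_writer_delay_ms


print(effective_sleep_ms(205))            # behaves like 210
print(max_async_commit_window_ms(200))    # 600 ms with the default delay
```

With the default of 200 ms, a transaction committed with synchronous_commit = off could therefore be lost if the server crashes within roughly 600 ms of the commit being reported.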

wal_writer_flush_after

Specifies how often the WAL writer flushes the WAL, in terms of volume (in units of WAL blocks, i.e., XLOG_BLCKSZ bytes).

If the most recent flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been generated since then, the WAL is only written to the operating system and not flushed to disk.
If wal_writer_flush_after is set to 0, WAL data is always flushed immediately.
If no unit is specified, the unit is WAL blocks, i.e., XLOG_BLCKSZ bytes, typically 8kB.

Data type	Default value	Range	Setting category
int	1048576/XLOG_BLCKSZ	0 ~ INT_MAX	segments；system；reload
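
The interaction between wal_writer_delay and wal_writer_flush_after can be expressed as a single condition. The sketch below is an illustration of the rule described above, assuming the typical XLOG_BLCKSZ of 8kB (which makes the default 128 blocks); the function name is hypothetical.

```python
XLOG_BLCKSZ = 8192                                 # assumed typical WAL block size
DEFAULT_FLUSH_AFTER = 1048576 // XLOG_BLCKSZ       # 128 blocks, i.e. 1MB


def writer_flushes_to_disk(ms_since_last_flush: int,
                           blocks_since_last_flush: int,
                           wal_writer_delay_ms: int = 200,
                           flush_after_blocks: int = DEFAULT_FLUSH_AFTER) -> bool:
    """Return True if the WAL writer flushes to disk, False if it only writes to the OS."""
    flushed_recently = ms_since_last_flush < wal_writer_delay_ms
    little_new_wal = blocks_since_last_flush < flush_after_blocks
    # Only when both hold is the flush skipped.
    return not (flushed_recently and little_new_wal)


print(writer_flushes_to_disk(50, 10))     # recent flush, little WAL: write only
print(writer_flushes_to_disk(50, 128))    # 1MB of new WAL: flush
```

Setting flush_after_blocks to 0 makes the second condition impossible to satisfy, so the function always returns True, matching the statement that WAL data is then always flushed immediately.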

Checkpoints

checkpoint_completion_target

Checkpoint target duration.

Specifies the target for checkpoint completion as a fraction of the total time between checkpoints.

Data type	Default value	Value range	Setting category
floating point	0.5	0.0 ～ 1.0	segments；system；reload
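
As a worked example, the fraction translates into a target write window once the checkpoint interval is known. The sketch below assumes checkpoints are triggered by checkpoint_timeout rather than by WAL volume; the function name is hypothetical.

```python
def checkpoint_write_window_s(checkpoint_timeout_s: int,
                              completion_target: float) -> float:
    """Target number of seconds over which a checkpoint's writes are spread."""
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be between 0.0 and 1.0")
    return checkpoint_timeout_s * completion_target


print(checkpoint_write_window_s(300, 0.5))   # defaults: 150.0 seconds
print(checkpoint_write_window_s(300, 0.9))
```

With the defaults (300 seconds and 0.5), each checkpoint aims to finish about halfway to the next one.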

checkpoint_flush_after

When the amount of data written during a checkpoint exceeds this parameter (in BLOCKS), an attempt is made to force the operating system (OS) to send these writes to the underlying storage.

This limits the amount of dirty data in the kernel page cache, reducing the likelihood of being stuck at the end of the checkpoint when issuing fsync or when the OS is writing back data in bulk in the background.
This often greatly reduces transaction latency, but in some cases (particularly when the workload is larger than shared_buffers but smaller than the OS page cache), performance may degrade.
This setting may not be effective on certain platforms.
When this value is 0, forced write-back is disabled.

Data Type	Default Value	Valid Range	Setting Category
int	0	0 ～ 256	segments；system；reload

checkpoint_timeout

The maximum time (in seconds) between automatic WAL checkpoints.

Increasing this parameter value increases the time required for crash recovery.
The valid range is 30 to 86400 seconds (i.e., 30 seconds to 1 day).

Data type	Default value	Valid range	Setting category
int	300	30 ～ 86400	segments；system；reload

checkpoint_warning

If the interval between checkpoints caused by filling WAL segment files is less than the time specified by this parameter (in seconds), a message is written to the server log (meaning that the value of max_wal_size should be increased).

Setting this to 0 disables the warning. If checkpoint_timeout is lower than the value of this parameter, no warning will be generated.

Data type	Default value	Valid range	Setting category
int	30	0 ～ INT_MAX	segments；system；reload
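
The rules above can be combined into one predicate. This is a sketch of the documented conditions only, with hypothetical names; it does not reproduce the server's log message or internal checks.

```python
def should_warn(seconds_since_previous_checkpoint: int,
                checkpoint_warning_s: int = 30,
                checkpoint_timeout_s: int = 300) -> bool:
    """Decide whether a WAL-triggered checkpoint would produce a log message."""
    if checkpoint_warning_s == 0:
        # 0 disables the warning.
        return False
    if checkpoint_timeout_s < checkpoint_warning_s:
        # No warning when checkpoint_timeout is lower than checkpoint_warning.
        return False
    return seconds_since_previous_checkpoint < checkpoint_warning_s


print(should_warn(12))   # WAL-triggered checkpoints only 12 s apart
print(should_warn(45))
```

Frequent True results in practice would point to max_wal_size being too small for the write load.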

max_wal_size

The maximum size (in MB) that the WAL is allowed to grow between automatic WAL checkpoints.

This is a soft limit; in special cases, the WAL size may exceed max_wal_size, such as under heavy load, when archive_command fails, or with a high wal_keep_segments setting.
Increasing this parameter may result in longer crash recovery times.

Data type	Default value	Valid range	Setting category
int	4096	2 ～ INT_MAX/1024	segments；system；reload

min_wal_size

The minimum size (in MB) of the WAL allowed between automatic WAL checkpoints.

As long as the WAL disk usage remains below this setting, old WAL files are always recycled for future use during checkpoints, rather than being deleted directly.
This can be used to ensure that sufficient WAL space is reserved to handle WAL usage peaks, such as when running large batch tasks.

Data type	Default value	Value range	Setting category
int	320	2 ~ INT_MAX/1024	segments；system；reload

Archiving

archive_command

The local shell command executed to archive a completed WAL file segment.

Any %p in the string is replaced with the path of the file to be archived, and any %f is replaced with the file name only (the path is relative to the cluster's data directory). To embed a literal % character in the command, write %%. For example: 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
Note: The command must return a zero exit status only if it succeeds.
Unless archive_mode is enabled when the server starts, it will be ignored. If archive_mode is enabled and archive_command is an empty string (default), WAL archiving will be temporarily disabled, but the server will continue to accumulate WAL segment files, awaiting a command to be provided.
Setting archive_command to a command that only returns true but does not perform any action (e.g., /bin/true or REM on Windows) effectively disables archiving and breaks the WAL file chain required for archive recovery, so it should only be used in extremely rare cases.

Data Type	Default Value	Setting Category
string		segments；system；reload
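
To make the placeholder rules concrete, the sketch below expands %p, %f, and %% in a command template. It is an illustration of the substitution described above, not the server's implementation; the function name and the sample WAL file name are hypothetical.

```python
import re


def expand_archive_command(template: str, wal_path: str, wal_file_name: str) -> str:
    """Expand %p, %f and %% in an archive_command template."""
    replacements = {"p": wal_path, "f": wal_file_name, "%": "%"}
    # A single left-to-right pass, so "%%p" yields the literal text "%p".
    return re.sub(r"%([pf%])", lambda m: replacements[m.group(1)], template)


template = "test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f"
segment = "000000010000000000000001"   # hypothetical WAL segment file name
print(expand_archive_command(template, "pg_wal/" + segment, segment))
```

Expanding a template this way before deploying it is a cheap check that the destination path and quoting come out as intended.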

archive_mode

When archive_mode is enabled, completed WAL segments can be sent to the archive storage by setting the archive_command command.

In addition to the off setting for disabling, there are two modes: on and always. During normal operation, there is no difference between these two modes, but when set to always, the WAL archiver is also enabled during archive recovery or standby mode. In always mode, all files recovered from the archive or received via streaming replication will be (re)archived.
archive_mode and archive_command are independent variables, allowing archive_command to be modified without affecting the archiving mode.
When wal_level (see above) is set to minimal, archive_mode cannot be enabled.

Data type	Default value	Valid range	Setting category
enum	off	off / on / always	segments；system；restart

archive_timeout

Forces a switch to a new WAL segment file once this many seconds have elapsed since the last switch.

archive_command is only called on completed WAL segments. Therefore, if your server generates very little WAL traffic (or the traffic generation cycle is very long), there will be a long delay between the completion of a transaction and its safe recording to the archive storage. To limit the time that unarchived data exists, you can set archive_timeout to force the server to periodically switch to a new WAL segment file.
When this parameter is set to a value greater than 0, the server switches to a new segment file whenever this much time has elapsed since the last segment file switch and there has been any database activity, including a single checkpoint (checkpoints are skipped if there has been no database activity).
Note: Archived files that are closed early because of a forced switch are still the same length as completely full files. Therefore, it is unwise to use a very short archive_timeout, because it will bloat the archive storage. An archive_timeout setting of around one minute is usually reasonable.
If you want data to be replicated from the primary server more quickly, you should consider using streaming replication instead of archiving.
The default value is 0, which disables this feature.

Data type	Default value	Valid range	Setting category
int	0	0 ～ INT_MAX/2	segments；system；reload
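
Because a forced switch still archives a full-length segment, the cost of a short timeout is easy to estimate. The sketch below computes an upper bound on the daily archive volume caused by forced switches, assuming a 16MB WAL segment and some database activity in every interval; the function name is hypothetical.

```python
MB = 1024 * 1024


def max_forced_archive_bytes_per_day(archive_timeout_s: int,
                                     wal_segment_bytes: int = 16 * MB) -> int:
    """Upper bound on bytes archived per day due to forced segment switches."""
    if archive_timeout_s <= 0:
        # 0 disables forced switching.
        return 0
    switches_per_day = 86400 // archive_timeout_s
    return switches_per_day * wal_segment_bytes


for timeout in (1, 60, 300):
    gib = max_forced_archive_bytes_per_day(timeout) / (1024 ** 3)
    print(f"archive_timeout={timeout}s -> up to {gib:.1f} GiB/day")
```

Under these assumptions a 60-second timeout costs up to about 22.5 GiB per day on an otherwise lightly loaded server, while a 1-second timeout would cost sixty times as much.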

Archive Recovery

These parameters apply only for the duration of a recovery. They must be reset for any subsequent recovery you wish to perform.

"Recovery" covers both using the server as a standby server and performing a targeted recovery. Typically, standby mode is used to provide high availability and/or read scalability, while targeted recovery is used to recover from data loss.
To start the server in standby mode, create a file named standby.signal in the data directory. The server enters recovery and does not stop when the end of the archived WAL is reached; instead, it keeps trying to continue recovery by connecting to the sending server specified by the primary_conninfo setting and/or by fetching new WAL segments using restore_command.
To start the server in targeted recovery mode, create a file named recovery.signal in the data directory. If both standby.signal and recovery.signal are created, standby mode takes precedence. Targeted recovery mode ends when the archived WAL has been fully replayed or when recovery_target is reached.

archive_cleanup_command

Specifies the shell commands executed at each restart point.

The purpose of this parameter is to provide a mechanism for cleaning up old archived WAL files that are no longer needed by the Standby server.
Any %r is replaced with the name of the file containing the last available restart point. This is the earliest file that must be kept to allow a recovery to be restartable, so all files earlier than %r can be safely removed. This information can be used to truncate the archive to the minimum required to support restarting from the current recovery. To embed a literal % character in the command, write %%. For single-standby configurations, the pg_archivecleanup module is often used in archive_cleanup_command, for example: archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'.
Note: If multiple Standby servers are recovering from the same archive directory, you must ensure that WAL files are not deleted until they are no longer needed by any of those servers.
If the command returns a non-zero exit status, a warning message is written to the log. An exception: if the command is terminated by a signal or a shell error (for example, the command is not found), a fatal error is raised.

Data Type	Default Value	Setting Category
string		segments; system; reload

recovery_end_command

Specifies the shell command that is executed once when the recovery is completed.

This parameter is optional. Its purpose is to provide a mechanism for cleanup after replication or recovery.
As with archive_cleanup_command, any %r is replaced with the name of the file containing the last available restart point.
If the command returns a non-zero exit status, a warning message is written to the log and the database continues to start up anyway. An exception: if the command is terminated by a signal or a shell error (for example, the command is not found), the database does not continue with startup.

Data Type	Default Value	Setting Category
string		segments; system; reload

restore_command

The local shell command executed to retrieve an archived segment of the WAL file series.

This parameter is required for archive recovery, but is optional for streaming replication.
Any %f in the string is replaced with the name of the file to retrieve from the archive, and any %p is replaced with the destination path on the server that the file is copied to (the path is relative to the cluster's data directory). Any %r is replaced with the name of the file containing the last available restart point. To embed a literal % character in the command, write %%.
The file named by %r is the earliest file that must be kept to allow a recovery to be restartable, so this information can be used to truncate the archive to the minimum required to support restarting from the current recovery.
Note: The command must return a zero exit status only if it succeeds. The command will also be asked for file names that are not present in the archive, and it must return a non-zero status when so asked. For example:
```
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
```
An exception: if the command is terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or by a shell error (such as the command not being found), recovery aborts and the server does not start.

Data Type	Default Value	Setting Category
string		segments; system; restart

Recovery Target

These parameters are only set when performing a targeted recovery. By default, recovery proceeds to the end of the WAL. The parameters in this section can be used to specify an earlier stopping point.

Notes!
At most one of recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, and recovery_target_xid can be used. Specifying more than one in the configuration file raises an error.
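
A configuration can be checked against this rule before the server is started. The sketch below is a hypothetical pre-flight check over settings already parsed into a dictionary; it is not a YMatrix tool, and treating an empty string as "not set" is an assumption.

```python
RECOVERY_TARGET_PARAMS = (
    "recovery_target",
    "recovery_target_lsn",
    "recovery_target_name",
    "recovery_target_time",
    "recovery_target_xid",
)


def check_recovery_targets(settings: dict) -> list:
    """Return the recovery target parameters in use; fail if more than one is set."""
    # Assumption: a missing key or an empty string means the parameter is unset.
    used = [name for name in RECOVERY_TARGET_PARAMS if settings.get(name)]
    if len(used) > 1:
        raise ValueError("multiple recovery targets specified: " + ", ".join(used))
    return used


print(check_recovery_targets({"recovery_target_time": "2024-01-01 12:00:00",
                              "recovery_target_inclusive": "on"}))
```

Parameters such as recovery_target_inclusive, recovery_target_action, and recovery_target_timeline are not part of the exclusive group and can be combined with whichever target is chosen.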

recovery_target

This parameter specifies that recovery should end as soon as a consistent state is reached, i.e., as early as possible.

When restoring from an online backup, this means the point at which the backup ended.
Note: Technically, this is a string parameter, but immediate is currently the only allowed value.

Data Type	Default Value	Value Range	Setting Category
string		immediate	segments; system; restart

recovery_target_action

Specifies the action the server should take once the recovery target is reached.

The default action is pause, which means recovery will be paused. promote means the recovery process will finish and the server will start accepting connections. shutdown means the server will be stopped after the recovery target is reached.
The purpose of the pause setting is to allow queries to be run against the database to check whether this recovery target is the most desirable point for recovery. The paused state can be resumed with pg_wal_replay_resume(), which then causes recovery to end. If this recovery target is not the desired stopping point, shut down the server, change the recovery target settings to a later target, and restart to continue recovery.
The shutdown setting is useful for having the instance ready at the exact replay point desired. The instance will still be able to replay more WAL records (and in fact will have to replay the WAL records since the last checkpoint the next time it is started).
Note: When recovery_target_action is set to shutdown, recovery.signal is not removed, so any subsequent startup will end with an immediate shutdown unless the configuration is changed or the recovery.signal file is removed manually.
If no recovery target is set, this setting has no effect.
If hot_standby is not enabled, a setting of pause acts the same as shutdown.

Data Type	Default Value	Value Range	Setting Category
enum	pause	pause / promote / shutdown	segments; system; restart

recovery_target_inclusive

Specifies whether to stop just after the specified recovery target (on) or just before it (off).

Applies when recovery_target_lsn, recovery_target_time, or recovery_target_xid is specified.
This setting controls whether the transaction having exactly the target WAL location (LSN), commit time, or transaction ID, respectively, will be included in the recovery.

Data Type	Default Value	Setting Category
boolean	on	segments; system; restart

recovery_target_lsn

This parameter specifies the LSN of the WAL location up to which recovery will proceed.

The exact stop point is affected by recovery_target_inclusive (see above).
The value is parsed using the system data type pg_lsn.

Data Type	Default Value	Setting Category
pg_lsn		segments; system; restart

recovery_target_name

Specifies the named restore point (created with pg_create_restore_point()) to which recovery will proceed.

Data Type	Default Value	Setting Category
string		segments; system; restart

recovery_target_time

This parameter specifies the timestamp up to which recovery will proceed.

The exact stop point is affected by recovery_target_inclusive (see above).

Data Type	Default Value	Setting Category
timestamp		segments; system; restart

recovery_target_timeline

Specifies recovering into a particular timeline.

The value can be a numeric timeline ID or a special value. The value current recovers along the same timeline that was current when the base backup was taken; the value latest recovers to the latest timeline found in the archive, which is useful on a Standby server.
You usually only need to set this parameter in complex re-recovery situations, where you need to return to a state that was itself reached after a point-in-time recovery.

Data Type	Default Value	Value Range	Setting Category
string	latest	current / latest / Timeline ID	segments; system; restart

recovery_target_xid

This parameter specifies the transaction ID up to which recovery will proceed.

Although transaction IDs are assigned sequentially at transaction start, transactions may complete in a different numeric order.
The transactions that will be recovered are those that committed before (and optionally including) the specified one.
The exact stop point is affected by recovery_target_inclusive (see above).

Data Type	Default Value	Setting Category
string		segments; system; restart
