Write-Ahead Log (WAL) Category Parameters

This document describes the system configuration parameters in the Write-Ahead Log (WAL) category.

Note!
To keep the system stable and secure, modify these parameters manually only with extreme caution.


Settings

commit_delay


Adds a time delay (in microseconds) before a WAL flush is initiated by COMMIT.

  • If system load is high enough that additional transactions become ready to commit within the given interval, commit_delay can increase group-commit throughput by letting more transactions commit with a single WAL flush. However, it also increases latency by up to commit_delay for each WAL flush. Because the delay is wasted if no other transaction becomes ready to commit, it is applied only if at least commit_siblings (see below) other transactions are active when the flush is about to be initiated.
  • Additionally, no delay is applied if synchronous_commit is disabled.
  • Unit: microseconds (μs).
Data Type Default Value Range Context
int 0 0 ~ 100000 master; session; reload; superuser

commit_siblings


Minimum number of concurrent active transactions required to apply commit_delay.

  • A higher value increases the likelihood that at least one other transaction will be ready to commit during the delay interval.
Data Type Default Value Range Context
int 5 0 ~ 1000 master; session; reload
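
The interaction between commit_delay and commit_siblings can be sketched as a small decision function. This is only an illustration of the rules stated above, not the server's actual code; the function and argument names are hypothetical.

```python
def commit_delay_applies(commit_delay_us: int,
                         commit_siblings: int,
                         other_active_transactions: int,
                         synchronous_commit_enabled: bool = True) -> bool:
    """Return True if a committing session would wait before its WAL flush.

    Hypothetical helper that mirrors the documented rules only.
    """
    if commit_delay_us <= 0:
        # The default of 0 means no delay at all.
        return False
    if not synchronous_commit_enabled:
        # Documented above: no delay when synchronous_commit is disabled.
        return False
    # The delay is only worthwhile if enough other transactions are active.
    return other_active_transactions >= commit_siblings


if __name__ == "__main__":
    print(commit_delay_applies(1000, 5, 7))   # enough siblings
    print(commit_delay_applies(1000, 5, 3))   # too few siblings
```

With the default commit_siblings of 5, a session waits only when at least five other transactions are active at flush time.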

full_page_writes


When this parameter is on, the YMatrix server writes the entire content of each page to the WAL during its first modification after a checkpoint.

  • This is necessary because during an operating system crash, a page write may be only partially completed, resulting in a mix of old and new data within a single disk page. During recovery, row-level change data stored in the WAL may not be sufficient to fully restore such a corrupted page.
  • Storing full page images ensures correct restoration of pages, at the cost of increased WAL volume. However, since WAL replay always starts from a checkpoint, it is sufficient to do this only during the first modification of each page after a checkpoint. One way to reduce the overhead of full-page writes is to increase the checkpoint interval.
  • Disabling this parameter speeds up normal operation but may lead to unrecoverable or silent data corruption after a system failure. The risk is similar to disabling fsync, though smaller. This setting should only be disabled when fsync can also be safely disabled.
  • Disabling this option does not affect WAL archiving used for Point-in-Time Recovery (PITR).
Data Type Default Value Context
boolean on segments; system; reload

fsync


If enabled, the YMatrix server ensures that updates are physically written to disk by issuing fsync() system calls or equivalent methods (see wal_sync_method). This guarantees the database cluster can recover to a consistent state after an OS or hardware crash.

  • Although disabling fsync often improves performance, it may result in unrecoverable data corruption in the event of power loss or system crash. Therefore, fsync should only be disabled when the entire database can be easily rebuilt from external data.
  • Safe scenarios for disabling fsync include: initial loading of a new database cluster from a backup, using a cluster to process batch data before dropping and recreating it, or maintaining a read-only clone that is frequently rebuilt and not used for failover. High-quality hardware alone is not sufficient justification for disabling fsync.
  • When changing fsync from off to on, all modified kernel buffers must be forced to persistent storage for reliable recovery. This can be done while the cluster is shut down, or while fsync is on, by running initdb --sync-only, running sync, unmounting the file system, or rebooting the server.
  • In many cases, turning off synchronous_commit for less critical transactions can yield significant performance gains without risking data corruption.
  • If you disable this parameter, consider also disabling full_page_writes (see above).
Data Type Default Value Context
boolean on segments; system; reload


synchronous_commit


Controls the level of synchronization. Determines whether a transaction must wait for its WAL record to be written to disk before the command returns a success indicator to the client.

  • The default and safe setting is on. When set to off, there is a delay between reporting success to the client and actually guaranteeing the transaction will survive a server crash (maximum delay is three times wal_writer_delay). Unlike fsync, setting this to off does not risk database inconsistency: a crash may lose some recently committed transactions, but the database state will be as if those transactions were fully aborted. Thus, when performance is more important than full transaction durability, disabling synchronous_commit can be an effective alternative.
  • If synchronous_standby_names is non-empty, this parameter also controls transaction commit behavior: whether to wait until the WAL record is replicated to the standby server before committing.
  • When set to remote_apply, the primary waits until the current synchronous standby acknowledges receipt of the commit record, flushes it to disk, and applies the transaction so it becomes visible to queries on the standby.
  • When set to on, the primary waits until the synchronous standby acknowledges receipt and has flushed the record to disk.
  • When set to remote_write, the primary waits until the synchronous standby acknowledges receipt and has written the record to its OS buffer (but not necessarily flushed to disk).
  • When set to local, the commit waits only for local disk flush, not for replication completion. This is typically not desired when using synchronous replication, but provided for completeness.
  • If synchronous_standby_names is empty, settings remote_apply, remote_write, on, and local all provide the same level: waiting only for local disk flush.
Setting        Local Durability   Standby Durability after YMatrix Crash   Standby Durability after OS Crash   Standby Query Visibility
remote_apply   Y                  Y                                        Y                                   Y
on             Y                  Y                                        Y                                   -
remote_write   Y                  Y                                        -                                   -
local          Y                  -                                        -                                   -
off            -                  -                                        -                                   -
Data Type Default Value Range Context
enum on on / off / true / false / yes / no / 1 / 0 / remote_apply / remote_write / local segments; session; reload
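
The durability levels above can be expressed as a lookup, which makes it easy to see what each setting guarantees with and without a synchronous standby. This is a sketch under the assumption that the Y marks in the table fill the columns from left to right, as the bullet descriptions imply; all names in the code are hypothetical.

```python
# Guarantee columns of the table, from weakest to strongest.
GUARANTEES = (
    "local_durability",
    "standby_durability_after_db_crash",
    "standby_durability_after_os_crash",
    "standby_query_visibility",
)

# Number of leading guarantee columns each setting provides.
LEVELS = {
    "remote_apply": 4,
    "on": 3,
    "remote_write": 2,
    "local": 1,
    "off": 0,
}


def guarantees(level: str, has_sync_standby: bool = True) -> set:
    """Return the set of guarantees a commit has once it is acknowledged."""
    count = LEVELS[level]
    if not has_sync_standby and level != "off":
        # With empty synchronous_standby_names, every level except off
        # waits only for the local disk flush.
        count = 1
    return set(GUARANTEES[:count])


if __name__ == "__main__":
    for name in LEVELS:
        print(name, sorted(guarantees(name)))
```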

wal_buffers


Amount of shared memory used for WAL data that has not yet been written to disk (in units of WAL blocks, i.e., XLOG_BLCKSZ bytes).

  • The default value -1 selects a size equal to 1/32 of shared_buffers (about 3%), but not less than 64kB and not more than the size of one WAL segment (typically 16MB).
  • This value can be manually adjusted if the automatic choice is too large or too small, but any positive value less than 32kB will be treated as 32kB.
  • Contents of the WAL buffer are written to disk on each transaction commit, so very large wal_buffers values offer little benefit. However, setting it to several megabytes may improve write performance on busy servers where many clients commit simultaneously.
  • The default setting -1 enables auto-tuning, which yields reasonable results in most cases.
Data Type Default Value Range Context
int -1 -1 ~ INT_MAX/XLOG_BLCKSZ segments; system; restart

wal_compression


Enables compression of full-page writes in WAL.

  • When on, and if full_page_writes is also on or during a base backup, the YMatrix server compresses full page images written to WAL. Compressed images are decompressed during WAL replay.
  • Enabling this reduces WAL size without risking unrecoverable data corruption. However, it incurs additional CPU overhead during WAL recording and replay.
Data Type Default Value Context
boolean off segments; session; reload; superuser

wal_init_zero


When set to on, new WAL files are initialized with zeros.

  • On some file systems, this ensures that file space is pre-allocated before WAL records need to be written.
  • However, Copy-On-Write (COW) file systems may not benefit from this technique, so this option is provided to skip the unnecessary work.
  • If set to off, only the last byte is written when the file is created, giving it the expected size.
Data Type Default Value Context
boolean on segments; session; reload; superuser


wal_level


Determines how much information is written to the WAL.

  • The default value replica writes enough data to support WAL archiving and replication, including read-only queries on standby servers. minimal omits all records except those required for crash or immediate shutdown recovery. logical adds information required for logical decoding. Each level includes all information from lower levels.
  • At minimal level, WAL logging for certain bulk operations can be safely skipped, making those operations faster. Applicable operations include: INSERT, UPDATE, DELETE, COPY into tables created or truncated within the same transaction.
  • However, minimal WAL does not contain enough information to reconstruct data from a base backup and WAL logs. Therefore, to enable WAL archiving (archive_mode) and streaming replication, replica or higher must be used.
  • At logical level, in addition to replica-level information, extra data is logged to allow extraction of logical change sets from WAL. Using logical increases WAL volume, especially when many tables are configured for REPLICA IDENTITY FULL and numerous UPDATE and DELETE statements are executed.
Data Type Default Value Range Context
enum replica replica / minimal / logical segments; system; restart

wal_log_hints


Perform full-page writes even for non-critical updates.

  • When on, the YMatrix server writes the entire content of a disk page to WAL during its first modification after a checkpoint, even for non-critical changes such as hint bit updates.
  • If data checksums are enabled, hint bit updates are always logged in WAL, and this setting is ignored. This setting can be used to test how much additional WAL would be generated if checksums were enabled.
Data Type Default Value Context
boolean off segments; system; restart

wal_recycle


Recycle WAL files.

  • When set to on, WAL files are reused by renaming, avoiding creation of new files.
  • On COW file systems, creating new files may be faster, so this option is provided to disable recycling.
Data Type Default Value Context
boolean on segments; session; reload; superuser

wal_sync_method


Method used to force WAL updates to disk.

  • If fsync is disabled, this setting is irrelevant because WAL file updates will not be forced to disk. Possible values: open_datasync (write WAL file with O_DSYNC), fdatasync (call fdatasync() on each commit), fsync (call fsync() on each commit), fsync_writethrough (call fsync() with write-through directive), open_sync (write WAL file with O_SYNC).
  • The open_* options may also use O_DIRECT (if available).
  • Not all choices are available on every platform. The default is the first method in the list above that the platform supports, except that fdatasync is the default on Linux. The default is not necessarily optimal; you may need to change this setting or other aspects of the system configuration to achieve a crash-safe configuration or the best performance.
Data Type Default Value Range Context
enum fsync / fdatasync (default on Linux) open_datasync / fdatasync / fsync / fsync_writethrough / open_sync segments; system; reload

wal_writer_delay


Specifies how often the WAL writer flushes WAL, in milliseconds.

  • After flushing, the writer sleeps for the time specified by wal_writer_delay, unless awakened early by an asynchronous commit.
  • If the last flush occurred less than wal_writer_delay ago and less than wal_writer_flush_after bytes of WAL have been generated, WAL is written to the OS buffer only, not flushed to disk.
  • Note: On many systems, the effective sleep delay granularity is 10ms. Setting wal_writer_delay to a value not divisible by 10 has the same effect as setting it to the next higher multiple of 10.
Data Type Default Value Range Context
int 200 1 ~ 10000 segments; system; reload

wal_writer_flush_after


Specifies how often the WAL writer flushes WAL, in units of WAL blocks (i.e., XLOG_BLCKSZ bytes).

  • If the last flush occurred less than wal_writer_delay ago and less than wal_writer_flush_after bytes of WAL have been generated, WAL is written to the OS buffer only, not flushed to disk.
  • If wal_writer_flush_after is set to 0, WAL data is always flushed immediately.
  • If no unit is specified, the value is in WAL blocks (XLOG_BLCKSZ bytes), typically 8kB.
Data Type Default Value Range Context
int 1048576/XLOG_BLCKSZ 0 ~ INT_MAX segments; system; reload
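
The rules for wal_writer_delay and wal_writer_flush_after can be combined into a short sketch. It assumes an 8kB XLOG_BLCKSZ (so the default of 1048576/XLOG_BLCKSZ is 128 blocks) and a 10ms sleep granularity; the function names are hypothetical and the logic only restates the bullets above.

```python
def effective_sleep_ms(wal_writer_delay_ms: int, granularity_ms: int = 10) -> int:
    """Round the delay up to the next multiple of the sleep granularity."""
    return -(-wal_writer_delay_ms // granularity_ms) * granularity_ms


def wal_writer_action(ms_since_last_flush: int,
                      bytes_since_last_flush: int,
                      wal_writer_delay_ms: int = 200,
                      flush_after_blocks: int = 128,
                      block_size: int = 8192) -> str:
    """Return "flush" or "write_only" for one WAL writer cycle."""
    if flush_after_blocks == 0:
        # A setting of 0 means WAL data is always flushed immediately.
        return "flush"
    flushed_recently = ms_since_last_flush < wal_writer_delay_ms
    little_wal = bytes_since_last_flush < flush_after_blocks * block_size
    # Only when both conditions hold is WAL handed to the OS without a flush.
    return "write_only" if (flushed_recently and little_wal) else "flush"


if __name__ == "__main__":
    print(effective_sleep_ms(205))
    print(wal_writer_action(50, 4096))
    print(wal_writer_action(250, 4096))
```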


Checkpoints

checkpoint_completion_target


Target duration for checkpoint completion.

  • Specifies the target for checkpoint completion as a fraction of the time between checkpoints.
Data Type Default Value Range Context
floating point 0.5 0.0 ~ 1.0 segments; system; reload
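
As a worked example of the fraction described above, the following sketch computes the time window over which checkpoint writes are spread. It assumes checkpoints are triggered by checkpoint_timeout, so the interval between checkpoints equals that setting; the function name is hypothetical.

```python
def checkpoint_write_window(interval_s: float, completion_target: float = 0.5) -> float:
    """Return the target number of seconds for completing a checkpoint."""
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be within 0.0 ~ 1.0")
    return interval_s * completion_target


if __name__ == "__main__":
    # Defaults from this document: checkpoint_timeout = 300, target = 0.5.
    print(checkpoint_write_window(300))
```

With the defaults, a checkpoint aims to finish its writes in about half of the 300-second interval.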

checkpoint_flush_after


When writing more than this amount of data during a checkpoint (in BLOCKS), attempt to force the OS to send these writes to underlying storage.

  • This limits the amount of dirty data in the kernel page cache, reducing the chance of being blocked at the end of a checkpoint when fsync is issued or when the OS performs large background writebacks.
  • This often results in significantly reduced transaction latency, though performance may degrade in some cases (especially for workloads that are larger than shared_buffers but smaller than the OS page cache).
  • This setting may have no effect on some platforms.
  • When set to 0, writeback forcing is disabled.
Data Type Default Value Range Context
int 0 0 ~ 256 segments; system; reload

checkpoint_timeout


Maximum time between automatic WAL checkpoints, in seconds.

  • Increasing this value can increase crash recovery time.
  • Range: 30 to 86400 seconds (i.e., up to 1 day).
Data Type Default Value Range Context
int 300 30 ~ 86400 segments; system; reload

checkpoint_warning


If checkpoints occur more frequently than this interval (in seconds) due to WAL segment fill, a warning message is written to the server log (indicating that max_wal_size should be increased).

  • Set to 0 to disable warnings. No warning is issued if checkpoint_timeout is less than this value.
Data Type Default Value Range Context
int 30 0 ~ INT_MAX segments; system; reload

max_wal_size


Maximum size (in MB) to which WAL can grow between automatic WAL checkpoints.

  • This is a soft limit; WAL size can exceed max_wal_size under special circumstances, such as heavy load, a failing archive_command, or a high wal_keep_segments setting.
  • Increasing this parameter may increase crash recovery time.
Data Type Default Value Range Context
int 4096 2 ~ INT_MAX/1024 segments; system; reload

min_wal_size


Minimum WAL size (in MB) between automatic WAL checkpoints.

  • As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use rather than being deleted.
  • This can be used to ensure sufficient WAL space is reserved for peak usage, such as during large batch jobs.
Data Type Default Value Range Context
int 320 2 ~ INT_MAX/1024 segments; system; reload


Archiving

archive_command


Local shell command to archive a completed WAL segment file.

  • Any occurrence of %p in the string is replaced by the path of the file to be archived (relative to the cluster's data directory), and %f is replaced by the file name only. To embed a literal % character, use %%. Example: archive_command = 'cp %p /mnt/server/archivedir/%f'
  • Note: The command must return exit status 0 on success.
  • It is ignored unless the server is started with archive_mode enabled. If archive_mode is enabled and archive_command is an empty string (default), WAL archiving is temporarily disabled, but the server continues to accumulate WAL segment files, waiting for a command to be provided.
  • Setting archive_command to a no-op command that always returns true (e.g., true on Unix or rem on Windows) effectively disables archiving and breaks the WAL chain required for archive recovery, so it should only be used in rare cases.
Data Type Default Value Context
string segments; system; reload
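
The placeholder substitution described above can be illustrated with a short sketch. It is not the server's implementation; the function name and the sample segment name are hypothetical, and only %p, %f, and %% are handled.

```python
import re


def expand_archive_command(template: str, path: str, file_name: str) -> str:
    """Expand %p, %f, and %% in an archive_command template in one pass."""
    replacements = {"%p": path, "%f": file_name, "%%": "%"}
    # A single left-to-right pass ensures that "%%p" yields a literal "%p".
    return re.sub(r"%[pf%]", lambda m: replacements[m.group(0)], template)


if __name__ == "__main__":
    segment = "000000010000000000000001"  # hypothetical segment file name
    print(expand_archive_command("cp %p /mnt/server/archivedir/%f",
                                 "pg_wal/" + segment, segment))
```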

archive_mode


When archive_mode is enabled, completed WAL segments can be sent to archive storage via the archive_command.

  • Besides the disabled state off, two modes are available: on and always. During normal operation, there is no difference between them. However, when set to always, the WAL archiver is also active during archive recovery or standby mode. In always mode, all files restored from archive or received via streaming replication are archived again.
  • archive_mode and archive_command are independent variables, allowing archive_command to be modified without affecting archiving mode.
  • When wal_level (see above) is set to minimal, archive_mode cannot be enabled.
Data Type Default Value Range Context
enum off off / on / always segments; system; restart
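
The dependency between wal_level and archive_mode can be checked with a few lines. This is a minimal sketch of the rule stated above (archiving needs replica or higher); the function name is hypothetical.

```python
# WAL levels from lowest to highest; each level includes the lower ones.
WAL_LEVELS = ("minimal", "replica", "logical")
ARCHIVE_MODES = ("off", "on", "always")


def archive_mode_allowed(wal_level: str, archive_mode: str) -> bool:
    """Return True if the archive_mode value is compatible with wal_level."""
    if wal_level not in WAL_LEVELS:
        raise ValueError("unknown wal_level: " + wal_level)
    if archive_mode not in ARCHIVE_MODES:
        raise ValueError("unknown archive_mode: " + archive_mode)
    if archive_mode == "off":
        return True
    # on and always both require at least replica-level WAL.
    return WAL_LEVELS.index(wal_level) >= WAL_LEVELS.index("replica")


if __name__ == "__main__":
    print(archive_mode_allowed("minimal", "on"))
    print(archive_mode_allowed("replica", "always"))
```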

archive_timeout


Force switch to a new WAL segment file after this time interval (in seconds).

  • archive_command is only invoked on completed WAL segments. Therefore, if the server generates little WAL traffic (or infrequent bursts), there may be a long delay between transaction completion and safe archival. To limit the lifetime of unarchived data, set archive_timeout to force periodic WAL segment switches.
  • When set to a value greater than 0, the server switches to a new segment file whenever this interval has passed since the last switch and any database activity (including a single checkpoint) has occurred (checkpoints are skipped if no activity).
  • Note: Partially filled archived files are still the same size as full ones. Thus, using a very small archive_timeout is unwise—it consumes excessive archive storage. A setting around one minute is typically reasonable.
  • If faster replication from the primary is desired, consider using streaming replication instead of archiving.
  • Default value is 0, which disables this feature.
Data Type Default Value Range Context
int 0 0 ~ INT_MAX/2 segments; system; reload
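
To see why a very small archive_timeout is costly, the following sketch estimates an upper bound on archive growth from forced segment switches. It assumes 16MB segments and some database activity in every interval; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400


def forced_archive_mb_per_day(archive_timeout_s: int, segment_size_mb: int = 16) -> int:
    """Upper bound on MB archived per day due to forced segment switches."""
    if archive_timeout_s <= 0:
        # 0 disables forced switches.
        return 0
    switches_per_day = SECONDS_PER_DAY // archive_timeout_s
    # Every archived segment has full size, even if it is mostly empty.
    return switches_per_day * segment_size_mb


if __name__ == "__main__":
    for timeout in (10, 60, 300):
        print(timeout, forced_archive_mb_per_day(timeout))
```

At one minute, the bound is 1440 segments or 23040MB per day; at ten seconds it is six times that.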


Archive Recovery

These parameters apply only in recovery mode. They must be reset if further recovery operations are intended.

"Recovery" includes running the server as a standby server or performing point-in-time recovery. Typically, standby mode provides high availability and/or read scalability, while point-in-time recovery is used to recover from data loss.

To start the server in standby mode, create a file named standby.signal in the data directory. The server enters recovery mode and does not stop when archived WAL ends, but attempts to continue by connecting to the sender server specified in primary_conninfo and/or fetching new WAL segments via restore_command.

To start the server in point-in-time recovery mode, create a file named recovery.signal in the data directory. If both standby.signal and recovery.signal are present, standby mode takes precedence. Point-in-time recovery ends when all archived WAL is replayed or when recovery_target is reached.
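
The precedence between the two signal files can be sketched as follows. This only inspects a directory for the file names mentioned above and is not how the server itself is implemented; the function name and return labels are hypothetical.

```python
from pathlib import Path


def startup_mode(data_dir: str) -> str:
    """Return the mode implied by the signal files in a data directory."""
    directory = Path(data_dir)
    # standby.signal takes precedence when both files are present.
    if (directory / "standby.signal").exists():
        return "standby"
    if (directory / "recovery.signal").exists():
        return "point-in-time recovery"
    return "normal"


if __name__ == "__main__":
    print(startup_mode("."))
```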

archive_cleanup_command


Shell command to execute at each restart point.

  • This parameter provides a mechanism to remove old archived WAL files no longer needed by the standby server.
  • Any %r is replaced with the name of the file containing the last available restart point—the earliest file that must be retained for restartable recovery. All files older than %r can be safely removed. To embed a literal %, use %%. This information can be used to truncate the archive to the minimum required for current recovery restart. In single-standby setups, the pg_archivecleanup module is often used in archive_cleanup_command, e.g., archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'.
  • However, note: If multiple standbys are recovering from the same archive directory, ensure files are only removed when no server needs them.
  • If the command returns a non-zero exit status, a warning log message is issued. Exception: if the command is terminated by a signal or shell error (e.g., command not found), a fatal error occurs.
Data Type Default Value Context
string segments; system; reload

recovery_end_command


Shell command executed once when recovery finishes.

  • Optional. Provides a mechanism for cleanup after replication or recovery.
  • Similar to archive_cleanup_command, any %r is replaced with the name of the file containing the last available restart point.
  • If the command returns non-zero, a warning is logged, but the database continues to start. Exception: if terminated by a signal or shell error (e.g., command not found), the database will not start.
Data Type Default Value Context
string segments; system; reload

restore_command


Local shell command to retrieve an archived WAL segment from the WAL file series.

  • Required for archive recovery, optional for streaming replication.
  • Any %f in the string is replaced with the name of the file to retrieve from the archive, and any %p with the destination path on the server to copy it to (relative to the current working directory, i.e., the cluster's data directory). Any %r is replaced with the name of the file containing the last available restart point. To embed a literal %, use %%.
  • The %r file is the earliest file that must be retained for restartable recovery, so this information can be used to truncate the archive to the minimum required.
  • Note: The command must return exit status 0 only on success. It will also be asked for file names that are not present in the archive; in such cases, it must return non-zero. Examples:
    restore_command = 'cp /mnt/server/archivedir/%f "%p"'
    restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  • Exception: If the command is terminated by a signal (other than SIGTERM, which is used as part of a server shutdown) or by a shell error (e.g., command not found), recovery aborts and the server will not start.
Data Type Default Value Context
string segments; system; restart
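
The exit-status contract for restore_command can be illustrated with a small sketch of what a custom restore script might do. This is a hypothetical helper, not a shipped tool: it returns 0 only after a successful copy and non-zero when the requested file is not in the archive.

```python
import shutil
from pathlib import Path


def restore_segment(archive_dir: str, file_name: str, dest_path: str) -> int:
    """Copy file_name from the archive to dest_path; return an exit status."""
    source = Path(archive_dir) / file_name
    if not source.is_file():
        # The server may ask for files that were never archived.
        return 1
    shutil.copyfile(source, dest_path)
    return 0


if __name__ == "__main__":
    # Hypothetical paths, mirroring %f and %p in the examples above.
    status = restore_segment("/mnt/server/archivedir",
                             "000000010000000000000001",
                             "/tmp/restored_segment")
    print(status)
```

A wrapper script would pass this value to sys.exit() so the server sees it as the command's exit status.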


Recovery Target

These parameters are used only during targeted recovery operations. By default, recovery proceeds to the end of the WAL log. These parameters allow specifying an earlier stopping point.

Note!
At most one of recovery_target, recovery_target_name, recovery_target_time, recovery_target_xid, and recovery_target_lsn may be used. Using multiple in the configuration file results in an error.
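
The at-most-one rule can be checked before a configuration is applied. The sketch below takes settings as a plain dictionary, which is an assumption made for illustration; the function name is hypothetical.

```python
TARGET_PARAMS = (
    "recovery_target",
    "recovery_target_name",
    "recovery_target_time",
    "recovery_target_xid",
    "recovery_target_lsn",
)


def check_recovery_targets(settings: dict):
    """Return the single configured target parameter, or None if unset.

    Raises ValueError when more than one target parameter is set.
    """
    chosen = [name for name in TARGET_PARAMS if settings.get(name)]
    if len(chosen) > 1:
        raise ValueError("multiple recovery targets set: " + ", ".join(chosen))
    return chosen[0] if chosen else None


if __name__ == "__main__":
    print(check_recovery_targets({"recovery_target_time": "2024-01-01 00:00:00"}))
```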

recovery_target


This parameter specifies that recovery should end as soon as a consistent state is reached.

  • When restoring from an online backup, this parameter means the point at which the backup ended.
  • Note: Technically, this is a string parameter, but immediate is currently the only allowed value.
Data Type Default Value Range Context
string immediate segments; system; restart

recovery_target_action


Specifies the action the server should take immediately upon reaching the recovery target.

  • The default action is pause, which means recovery will be paused. promote means recovery will end and the server will start accepting connections. shutdown means the server will stop after reaching the recovery target.
  • The pause setting is useful if the recovery target is the desired stopping point, allowing queries to be run on the database. The paused state can be resumed using pg_wal_replay_resume(), which will terminate recovery. If the recovery target is not the desired stopping point, shut down the server, change the recovery target setting to a later point, and restart to continue recovery.
  • The shutdown setting can be helpful to prepare the instance at the desired replay point. The instance will still be able to replay additional WAL records (and will in fact have to replay WAL records from the last checkpoint onward the next time it starts).
  • Note: When recovery_target_action is set to shutdown, the recovery.signal file will not be removed. Any subsequent startup will result in immediate shutdown unless the configuration is changed or the recovery.signal file is manually removed.
  • This setting has no effect if no recovery target is set.
  • If hot_standby is not enabled, a setting of pause acts the same as shutdown.
Data Type Default Value Range Context
enum pause pause / promote / shutdown segments; system; restart

recovery_target_inclusive


Specifies whether recovery stops after (on) or before (off) the specified recovery target.

  • Applies when recovery_target_lsn, recovery_target_time, or recovery_target_xid is specified.
  • This setting controls whether transactions with the exact target WAL location (LSN), commit timestamp, or transaction ID are included in the recovery.
Data Type Default Value Context
boolean on segments; system; restart

recovery_target_lsn


Specifies the WAL LSN up to which recovery will proceed.

  • The exact stopping point is affected by recovery_target_inclusive (see above).
  • This parameter is parsed using the system data type pg_lsn.
Data Type Default Value Context
pg_lsn segments; system; restart
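
How recovery_target_lsn and recovery_target_inclusive interact can be sketched by parsing LSNs in their textual form (two hexadecimal numbers separated by a slash) and comparing them. This is a simplification: it compares one LSN per record and ignores how the server locates record boundaries. The function names are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as "16/B374D848" to a 64-bit integer position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_record(record_lsn: str, target_lsn: str, inclusive: bool = True) -> bool:
    """Return True if a record at record_lsn is still replayed."""
    record, target = parse_lsn(record_lsn), parse_lsn(target_lsn)
    # inclusive (on): stop just after the target; off: stop just before it.
    return record <= target if inclusive else record < target


if __name__ == "__main__":
    print(replay_record("0/3000028", "0/3000028", inclusive=True))
    print(replay_record("0/3000028", "0/3000028", inclusive=False))
```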

recovery_target_name


Specifies a named recovery point (created by pg_create_restore_point()) to which recovery will proceed.

Data Type Default Value Context
string segments; system; restart

recovery_target_time


Specifies the timestamp up to which recovery will proceed.

  • The exact stopping point is affected by recovery_target_inclusive (see above).
Data Type Default Value Context
timestamp segments; system; restart

recovery_target_timeline


Specifies recovery into a specific timeline.

  • The value can be a numeric timeline ID or a special value. current recovers along the same timeline used during the base backup; latest recovers to the latest timeline found in the archive, which is useful for standby servers.
  • You typically only need to set this parameter in complex re-recovery scenarios, where you need to return to a state that existed after a previous point-in-time recovery.
Data Type Default Value Range Context
string latest current / latest / Timeline ID segments; system; restart

recovery_target_xid


Specifies the transaction ID at which recovery will stop.

  • Although transaction IDs are assigned sequentially at transaction start, transactions may complete in a different order.
  • Transactions that committed before (and optionally including) the specified transaction will be restored.
  • The exact stopping point is affected by recovery_target_inclusive (see above).
Data Type Default Value Context
string segments; system; restart