Monitoring Operational Status

Note!
Disaster recovery capability is available as an experimental feature in YMatrix version 6.0.0.

This section describes how to monitor the operational status of disaster recovery using either the graphical user interface (UI) or command-line tools.

Command-Line Tools

This section shows how to check the status of disaster recovery with SQL queries, run separately on the primary cluster and on the backup cluster.

Primary Cluster

  1. System Catalogs

  2. Name of the replication slot used by disaster recovery: internal_disaster_recovery_rep_slot.

  3. Query replication slot information used by disaster recovery

    Note: Must be executed by a user with appropriate privileges on the primary cluster (e.g., mxadmin).

    SELECT *
    FROM pg_catalog.gp_replication_slots
    WHERE slot_name = 'internal_disaster_recovery_rep_slot'
    ORDER BY gp_segment_id;

  4. Query replication status for disaster recovery

    Note: Must be executed by a user with appropriate privileges on the primary cluster (e.g., mxadmin).

    SELECT *
    FROM
        pg_catalog.gp_replication_slots s
    LEFT JOIN
        pg_catalog.gp_stat_replication r
    ON
        s.gp_segment_id = r.gp_segment_id AND s.active_pid = r.pid
    WHERE s.slot_name = 'internal_disaster_recovery_rep_slot'
    ORDER BY s.gp_segment_id;
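
The join above returns every column from both views. For routine checks, a narrower projection can be easier to read. The following is a minimal sketch, assuming that gp_replication_slots and gp_stat_replication expose the same columns as PostgreSQL's pg_replication_slots and pg_stat_replication (active, state, sent_lsn, replay_lsn) in addition to gp_segment_id; verify the column names against your YMatrix version before relying on it.

```sql
-- Per-segment summary of the disaster recovery slot and its replication stream.
-- Assumption: column names mirror PostgreSQL's pg_replication_slots and
-- pg_stat_replication; they are not confirmed by this guide.
SELECT
    s.gp_segment_id,
    s.active,        -- false: no WAL sender is attached to the slot
    r.state,         -- NULL when the LEFT JOIN finds no matching stream
    r.sent_lsn,
    r.replay_lsn,
    pg_wal_lsn_diff(r.sent_lsn, r.replay_lsn) AS replay_lag_bytes
FROM
    pg_catalog.gp_replication_slots s
LEFT JOIN
    pg_catalog.gp_stat_replication r
ON
    s.gp_segment_id = r.gp_segment_id AND s.active_pid = r.pid
WHERE s.slot_name = 'internal_disaster_recovery_rep_slot'
ORDER BY s.gp_segment_id;
```

A row whose state is NULL, or whose replay_lag_bytes keeps growing between runs, is a reasonable place to start investigating.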

Backup Cluster

  1. System Catalogs

  2. Name of the replication slot used by disaster recovery: internal_disaster_recovery_rep_slot.

  3. Query WAL receiver status for disaster recovery

    Note: Must be executed by a user with appropriate privileges on the backup cluster (e.g., mxadmin).

    SELECT *
    FROM pg_catalog.gp_stat_wal_receiver
    WHERE slot_name = 'internal_disaster_recovery_rep_slot'
    ORDER BY gp_segment_id;
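
To spot a stalled receiver quickly, the same view can be reduced to the progress columns plus the time elapsed since the last message. This is a sketch only: received_lsn, received_tli, and last_msg_receipt_time appear elsewhere in this guide, while the status column is assumed to match PostgreSQL's pg_stat_wal_receiver.

```sql
-- WAL receiver progress per segment on the backup cluster.
-- Assumption: the status column mirrors PostgreSQL's pg_stat_wal_receiver.
SELECT
    gp_segment_id,
    status,
    received_lsn,
    received_tli,
    last_msg_receipt_time,
    now() - last_msg_receipt_time AS time_since_last_message
FROM pg_catalog.gp_stat_wal_receiver
WHERE slot_name = 'internal_disaster_recovery_rep_slot'
ORDER BY gp_segment_id;
```

A segment that is missing from the result has no WAL receiver reporting for this slot at the moment.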

Graphical User Interface

  1. Log in to the UI and navigate to the Cluster page. If no backup cluster is configured for the current cluster, the page displays the following:

    [Screenshot: dr_monitor_1]

  2. If a backup cluster is configured, two roles exist in the disaster recovery architecture: the primary cluster and the backup cluster.

    a. Primary Cluster
    On the primary cluster page, the system displays three key pieces of information: cluster role, synchronization mode, and synchronization status.

    • Synchronization Mode

      • Three modes are supported: Synchronisation, Asynchronous, and Unidentified.
      • The mode is determined by the synchronous_standby_names parameter in the database configuration:
        • The value * (a single asterisk) indicates Synchronisation mode.
        • An empty string indicates Asynchronous mode.
        • Any other value is treated as Unidentified mode.
    • Synchronization Status

      • Three states are possible: Synchronised, Synchronising, and malfunction.
      • This status is derived from the state of synchronous replication streams on the primary cluster:
        • Synchronised: All synchronous replication streams have completed synchronization.
        • Synchronising: At least one active replication stream has not fully synchronized.
        • malfunction: At least one replication stream is inactive.
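
    The two rules above can be approximated from the command line. The sketch below is not the UI's actual implementation. It assumes that the session on the primary cluster sees the relevant synchronous_standby_names value, that a stream counts as inactive when the slot has no matching row in gp_stat_replication, and that a stream counts as fully synchronized when sent_lsn equals replay_lsn. All three are assumptions made for illustration.

    ```sql
    -- Synchronization mode, derived from synchronous_standby_names.
    SELECT
        current_setting('synchronous_standby_names') AS synchronous_standby_names,
        CASE current_setting('synchronous_standby_names')
            WHEN '*' THEN 'Synchronisation'
            WHEN ''  THEN 'Asynchronous'
            ELSE 'Unidentified'
        END AS synchronization_mode;

    -- Synchronization status, aggregated over all disaster recovery streams.
    -- Assumption: "inactive" means no matching gp_stat_replication row, and
    -- "synchronized" means sent_lsn = replay_lsn.
    SELECT
        CASE
            WHEN count(*) = 0 THEN 'no disaster recovery slot found'
            WHEN bool_or(r.pid IS NULL) THEN 'malfunction'
            WHEN bool_or(r.sent_lsn IS DISTINCT FROM r.replay_lsn) THEN 'Synchronising'
            ELSE 'Synchronised'
        END AS synchronization_status
    FROM pg_catalog.gp_replication_slots s
    LEFT JOIN pg_catalog.gp_stat_replication r
        ON s.gp_segment_id = r.gp_segment_id AND s.active_pid = r.pid
    WHERE s.slot_name = 'internal_disaster_recovery_rep_slot';
    ```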

    b. Backup Cluster

Primary Cluster

  1. Hover over the disaster recovery information panel to reveal the View details button.

  2. Click the View details button to access the synchronous replication stream list page.

    • Each row represents one synchronous replication stream.
      • Status: Current status of the stream, one of Disconnected, Synchronising, or Synchronised.
      • contentid: Shard ID that the replication stream belongs to.
      • sync_error: Displays error messages related to the replication stream.
      • Update Time: Timestamp when the stream was last updated.
      • Actions: Includes a "Details" option.
    • Use the search box to perform fuzzy searches on contentid and sync_error.
  3. Click the Details action in any row to view detailed information about that replication stream, including Basic Info, Slot Info, and Replication Info.

    • Basic Info: Basic information about the segment associated with the replication stream.
    • Slot Info: Fields from the pg_catalog.gp_replication_slots view, showing data related to physical replication slots.
    • Replication Info: Fields from the pg_catalog.gp_stat_replication view.
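
The Slot Info and Replication Info panels can be approximated from the command line by filtering the same catalogs on a single segment. This is a minimal sketch, assuming that contentid in the UI corresponds to gp_segment_id in the catalogs; the value 0 is only a placeholder segment.

```sql
-- Slot information for one segment (0 is a placeholder content ID).
SELECT *
FROM pg_catalog.gp_replication_slots
WHERE slot_name = 'internal_disaster_recovery_rep_slot'
  AND gp_segment_id = 0;

-- Replication information for the WAL sender attached to that slot.
SELECT r.*
FROM pg_catalog.gp_replication_slots s
JOIN pg_catalog.gp_stat_replication r
  ON s.gp_segment_id = r.gp_segment_id AND s.active_pid = r.pid
WHERE s.slot_name = 'internal_disaster_recovery_rep_slot'
  AND s.gp_segment_id = 0;
```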

Backup Cluster

  1. Hover over the disaster recovery information panel to display the View details button.

  2. Click View details to go to the WAL receiver stream list page on the backup cluster.

    • Each row represents one WAL receiver stream.
      • Status: Process status of the WAL receiver.
      • contentid: Identifies the corresponding shard ID for this WAL receiver.
      • received_lsn: Latest WAL position (LSN) received, indicating how far the stream has progressed.
      • received_tli: Timeline ID of the last received WAL.
      • last_msg_receipt_time: Time when the most recent message was received from the WAL sender.
      • Actions: Includes a "Details" option.
    • Use the search box to perform fuzzy searches on contentid, received_lsn, received_tli, and last_msg_receipt_time.
  3. Click the Details action in any row to view detailed information for that WAL receiver stream.

    • Basic Info: Basic information about the segment associated with the WAL receiver.
    • gp_stat_wal_receiver Info: Fields from the pg_catalog.gp_stat_wal_receiver view.