Cluster Process Architecture

Core Process Modules

The stable operation of a YMatrix cluster relies on three core process modules, each with its own responsibilities, working together to form a complete distributed data processing architecture.

Supervisor Service Process

Known as the "general manager" of cluster processes, it is a resident system service that starts with the installation of the cluster and runs with root privileges. Its core functions include launching and managing all other YMatrix service processes, maintaining persistent process states, ensuring high availability, and exposing RPC interfaces for external management operations. It serves as the foundation for starting and monitoring cluster processes.

SOA Service Processes

Acting as the "scheduling hub" of the cluster, these processes are centered on the mxbox series and integrate the service capabilities required for distributed management. They handle cluster topology management (cluster module), data shard scheduling (shard module), data replication and failover (replication module), and automated deployment (deployer module), and they rely on etcd to store cluster metadata and keep it consistent across nodes.

Database Instance Processes

These processes serve as the "processing core" of the cluster, running on Master, Standby, and Segment nodes. They receive and parse SQL requests, execute data storage and computation tasks, record logs, manage data persistence, handle distributed transactions, and support auxiliary processes like logging, storage, and monitoring.

Key Process Descriptions

supervisord

  • Core Position: The "general manager" of cluster processes.
  • Key Features:
    • Runs with root privileges.
    • Starts and monitors processes, and restarts them upon failure.
    • Maintains persistent process states for high availability.
    • Configuration path: /etc/matrixdb6/supervisor.conf.
  • Location: Deployed on all nodes (Master, Standby, Segment), with each node running its own instance.
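
The configuration file noted above can be inspected directly. Below is a minimal sketch, assuming /etc/matrixdb6/supervisor.conf follows the common INI-style layout of Supervisor-like services (one [program:...] section per managed process; the exact keys may differ in practice):

```python
# Illustrative only: assumes an INI-style supervisor.conf with one
# [program:...] section per managed process; keys may differ in practice.
import configparser

config = configparser.ConfigParser(interpolation=None)
config.read("/etc/matrixdb6/supervisor.conf")

for section in config.sections():
    if section.startswith("program:"):
        name = section.split(":", 1)[1]
        command = config[section].get("command", "<not set>")
        print(f"{name}: {command}")
```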

etcd

  • Core Position: The "distributed ledger" for cluster metadata.
  • Key Features:
    • Stores cluster topology, node status, shard allocation information, configuration parameters, etc.
    • Ensures high availability of metadata using the Raft protocol.
    • Network configuration: listens on client communication ports (default 4679) and peer-to-peer communication ports (default 4680).
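
etcd exposes a standard /health endpoint on its client port. A minimal sketch, assuming the default client port 4679 mentioned above is reachable over plain HTTP on the local node (with TLS enabled, switch to https and supply certificates):

```python
# Illustrative probe of etcd's standard /health endpoint on the client port.
# Assumes plain HTTP on localhost:4679; adjust scheme/host/port as needed.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:4679/health", timeout=3) as resp:
    status = json.loads(resp.read().decode())

# etcd reports {"health": "true"} when this member is healthy.
print("etcd healthy:", status.get("health") == "true")
```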

mxbox Series

  • Core Position: The "functional carrier" of SOA services.

  • Main Modules and Functions

    | Module | Core Function |
    |---|---|
    | mxbox cluster | Manages cluster topology, node joining/leaving, state synchronization, etc. |
    | mxbox shard | Responsible for data shard allocation, scheduling, and load balancing; maps shards to Segment nodes. |
    | mxbox replication | Controls cluster data replication policies, providing synchronization and failover between Master-Standby and Primary-Mirror. |
    | mxbox deployer | Provides automated deployment capabilities, supporting node initialization, software installation, configuration synchronization, and other O&M operations. |

  • Key Role: As the "executor" of SOA services, the mxbox series splits cluster management logic into independent modules, reducing maintenance complexity; through standardized interfaces it collaborates with supervisord and etcd to enable flexible cluster expansion and efficient O&M.

postgres Main Process

  • Core Position: The "core engine" for data processing.

  • Node-Specific Configurations and Functions

    | Node Type | Process Role (gp_role) | Core Configuration Example | Core Function |
    |---|---|---|---|
    | Master | Dispatch (Scheduling) | -p 5432 -D /mxdata/master/mxseg-1 | Receives client connections, generates distributed query plans, schedules QE execution, aggregates query results. |
    | Standby | Dispatch (Scheduling) | -p 5432 -D /mxdata/standby/mxseg-1 | Synchronizes Master data, provides failover capability, takes over Master functions when the primary node fails. |
    | Segment (Primary) | Execute (Execution) | -p 6000 -D /mxdata/primary/mxseg0 | Stores raw data, executes Slice tasks dispatched by the QD, sends WAL logs to its Mirror. |
    | Segment (Mirror) | Execute (Execution) | -p 7001 -D /mxdata/mirror/mxseg6 | Synchronizes Primary data, provides data redundancy, takes over as Primary when the Primary fails. |
  • Derived Processes: Each postgres main process spawns auxiliary processes for log handling, storage optimization, and transaction assurance; together they support the core data processing capabilities, making the postgres instances the "ultimate executors" of cluster data operations.
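
The roles and ports in the table above are recorded in the cluster catalog. A minimal sketch, assuming YMatrix retains Greenplum's gp_segment_configuration catalog and that the Master is reachable with psycopg2 (host, user, and database names are placeholders):

```python
# Illustrative only: lists node roles and ports from a Greenplum-compatible
# gp_segment_configuration catalog; connection parameters are placeholders.
import psycopg2

conn = psycopg2.connect(host="mdw", port=5432, dbname="postgres", user="mxadmin")
with conn.cursor() as cur:
    cur.execute("""
        SELECT content, role, preferred_role, port, hostname, datadir
        FROM gp_segment_configuration
        ORDER BY content, role
    """)
    for content, role, preferred, port, host, datadir in cur.fetchall():
        # content = -1 is the Master/Standby pair; role 'p' / 'm' marks the
        # currently acting primary / mirror for each content ID.
        print(f"content={content} role={role} port={port} host={host} dir={datadir}")
conn.close()
```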

QD (Query Dispatcher)

  • Core Position: The "smart scheduler" for queries.
  • Core Functions:
    • Query Parsing and Optimization: Receives client SQL requests, performs syntax parsing and semantic analysis, and generates a distributed query plan (divided into Slices).
    • Task Distribution and Coordination: Distributes Slice tasks to the relevant Segment nodes via the libpq protocol and coordinates the execution pace of QE processes.
    • Status Control and Result Aggregation: Receives QE execution status (progress, errors) in real time and handles exceptions (such as terminating execution once a LIMIT is satisfied); aggregates the results from all QEs, organizes them, and returns them to the client.

QE (Query Executor)

  • Core Position: The "frontline executor" for queries.

  • Core Functions:

    • Task Execution: Receives Slice tasks dispatched by QD, performs specific operations such as data filtering, aggregation, joins, sorting.
    • Data Interaction: Exchanges data with other QE processes via the interconnect protocol (e.g., cross-node data transmission during Join operations).
    • Status Feedback: Reports real-time execution status and error information to QD, receives control instructions from QD (such as aborting execution).
  • Key Characteristics:

    • Lifecycle: Bound to the client session; QE processes are destroyed automatically when the session ends, avoiding lingering resource usage.
    • Collaboration Model: QE processes in the same Gang execute the same Slice logic in lockstep, ensuring consistent data processing.

Global Deadlock Detector

  • Core Position: The "guardian" against deadlocks in distributed transactions.

  • Core Functions:

    • Lock Information Collection: Regularly collects lock information from all Segment nodes (table locks, row locks, etc.).
    • Deadlock Detection: Analyzes lock dependency relationships through directed graph algorithms to identify local and global cross-node deadlocks.
    • Exception Handling: Upon deadlock detection, triggers rollback mechanisms, releases lock resources, and prevents performance bottlenecks in the cluster.
  • Key Significance: As a distributed database, YMatrix frequently encounters cross-node transaction scenarios. This process effectively addresses the challenge of detecting deadlocks in distributed environments, ensuring stability during concurrent transaction executions.
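
Conceptually, the detection step amounts to finding a cycle in a directed "waits-for" graph built from the collected lock information. The sketch below illustrates that idea only; it is not the actual detector implementation, and the graph is hard-coded for demonstration:

```python
# Conceptual illustration: deadlock detection as cycle detection in a
# directed waits-for graph (transaction -> transaction it waits on).
# This is NOT the real global deadlock detector implementation.
def find_cycle(waits_for):
    visiting, visited = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for nxt in waits_for.get(node, []):
            if nxt in visiting:                     # back edge => cycle found
                return path[path.index(nxt):] + [nxt]
            if nxt not in visited:
                cycle = dfs(nxt, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return None

    for node in list(waits_for):
        if node not in visited:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None

# tx1 waits on tx2, tx2 on tx3, tx3 on tx1 (possibly on different nodes).
graph = {"tx1": ["tx2"], "tx2": ["tx3"], "tx3": ["tx1"]}
print(find_cycle(graph))  # ['tx1', 'tx2', 'tx3', 'tx1'] -> roll back a victim
```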

walwriter & walsender & walreceiver

  • Core Position: The "iron triangle" ensuring data consistency.

  • Division of Labor and Collaboration Logic:

    • walwriter: Deployed on Master/Primary nodes, part of the postgres-derived processes, responsible for writing Write-Ahead Logs (WAL) from memory to disk, ensuring local data persistence.
    • walsender: Deployed on Master/Primary nodes, part of the postgres-derived processes, responsible for sending WAL logs in real-time to Standby/Mirror nodes.
    • walreceiver: Deployed on Standby/Mirror nodes, part of the postgres-derived processes, responsible for receiving WAL logs sent by walsender and writing them to local disks.
  • Core Value: Real-time synchronization of WAL logs provides data redundancy from Master to Standby and from Primary to Mirror. If a primary node fails, the standby node can quickly recover data from the WAL logs, ensuring high availability of the cluster.
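
On a Master or Primary node, each active walsender is visible through the standard PostgreSQL pg_stat_replication view. A minimal sketch, assuming psycopg2 and placeholder connection parameters:

```python
# Illustrative only: lists active walsender processes via the standard
# pg_stat_replication view; connection parameters are placeholders.
import psycopg2

conn = psycopg2.connect(host="mdw", port=5432, dbname="postgres", user="mxadmin")
with conn.cursor() as cur:
    cur.execute("""
        SELECT pid, application_name, client_addr, state, sync_state
        FROM pg_stat_replication
    """)
    for pid, app, addr, state, sync in cur.fetchall():
        # state = 'streaming' means the walreceiver on the standby side is
        # keeping up with the WAL stream produced by this walsender.
        print(f"walsender pid={pid} peer={app}@{addr} state={state} sync={sync}")
conn.close()
```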

Process Tree Structure

YMatrix cluster processes are rooted at supervisord and form a hierarchical tree. The associations and roles of the key processes are shown below, grouped by function:

Overall Cluster Process Tree Structure

supervisord (root process)
├─ SOA Services Layer
│  ├─ mxbox cluster (Cluster Topology Management)
│  ├─ mxbox shard (Sharding Scheduling Management)
│  ├─ mxbox replication (Data Replication and Failover)
│  ├─ mxbox deployer (Automated Deployment)
│  └─ etcd (Cluster Metadata Storage)
├─ System Management and Monitoring Layer
│  ├─ mxui (Web Management Interface)
│  ├─ cylinder (Periodic Maintenance Tasks)
│  └─ telegraf (Cluster Monitoring Data Collection and Metrics Exposure)
└─ Database Instance Layer (Branching by Node Types)
   ├─ Master Nodes
   │  ├─ postgres (Primary Instance Process, gp_role=dispatch)
   │  │  ├─ Logging and Basic Processes (master logger, mxlogstat, etc.)
   │  │  ├─ Storage Optimization Processes (checkpointer, walwriter, etc.)
   │  │  ├─ Distributed Feature Processes (dtx recovery, global deadlock detector, etc.)
   │  │  └─ Query Scheduling Process (QD: Query Dispatcher)
   │  └─ matrixgate warden (Gateway Connection Management)
   ├─ Standby Nodes
   │  └─ postgres (Backup Instance Process, gp_role=dispatch)
   │     ├─ Basic Assurance Processes (logger, checkpointer, etc.)
   │     └─ Data Synchronization Processes (walreceiver, startup recovering)
   └─ Segment Nodes (Primary/Mirror)
      ├─ postgres (Instance Process, gp_role=execute)
      │  ├─ Basic Assurance Processes (logger, mxlogstat, etc.)
      │  └─ Query Execution Process (QE: Query Executor)
      ├─ Primary Unique Processes (walsender: Sends WAL logs)
      └─ Mirror Unique Processes (walreceiver: Receives WAL logs, startup recovering: Data Recovery)
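
To compare this layout with a running node, `pstree -p` under the supervisord PID is the quickest check; the sketch below rebuilds the same hierarchy from `ps` output in Python (it assumes a Linux `ps` supporting `-eo pid,ppid,comm` and that the service's process name is supervisord):

```python
# Illustrative only: rebuilds the process tree from `ps` output and prints
# the subtree rooted at supervisord. Assumes Linux ps with -eo support.
import subprocess
from collections import defaultdict

out = subprocess.run(["ps", "-eo", "pid,ppid,comm"],
                     capture_output=True, text=True, check=True).stdout

children = defaultdict(list)
names = {}
for line in out.strip().splitlines()[1:]:   # skip the header row
    pid, ppid, comm = line.split(None, 2)
    names[pid] = comm
    children[ppid].append(pid)

def print_tree(pid, depth=0):
    print("  " * depth + f"{names.get(pid, '?')} ({pid})")
    for child in children.get(pid, []):
        print_tree(child, depth + 1)

for pid, comm in names.items():
    if comm == "supervisord":
        print_tree(pid)
```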

Other Key Processes

  1. Session Processes: The Linkage Logic Between Master and Data Nodes

Clients establish connections with the Master node via the libpq protocol. The QD (Query Dispatcher) then links the QE (Query Executor) processes on all relevant Segment nodes into an end-to-end session chain, enabling collaborative execution of distributed queries.

Client Session
├─ Master Node: postmaster (listening for connections) → fork child process QD (session-specific scheduler)
│  └─ QD connects to each Segment node's postmaster through the libpq protocol
└─ Segment Nodes: postmaster → fork child process QE (session-specific executor)
   ├─ Bidirectional communication between QD and QE: libpq protocol (control commands/status feedback), interconnect (data exchange)
   └─ Session lifecycle: QD uniformly controls query parsing, plan distribution, result aggregation; QE executes specific data processing tasks
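
Each client session therefore maps to one dedicated QD backend on the Master for its entire lifetime. A minimal sketch, assuming psycopg2 and placeholder connection parameters, using the standard pg_backend_pid() function to identify that backend:

```python
# Illustrative only: every libpq connection to the Master is served by a
# dedicated QD backend; pg_backend_pid() returns that backend's PID.
import psycopg2

conn = psycopg2.connect(host="mdw", port=5432, dbname="postgres", user="mxadmin")
with conn.cursor() as cur:
    cur.execute("SELECT pg_backend_pid()")
    qd_pid = cur.fetchone()[0]
    print(f"QD backend PID for this session: {qd_pid}")

    # Re-checking within the same session returns the same PID: the QD is
    # session-scoped and goes away when the connection is closed.
    cur.execute("SELECT pg_backend_pid()")
    assert cur.fetchone()[0] == qd_pid
conn.close()
```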

  2. Slice Processes: Core Units for Query Parallelism

Core function: A complex query is divided into multiple parallel tasks, each executed independently by a QE process on a different node, significantly improving query processing efficiency.

Query Plan
└─ Divided into multiple Slices (independent processing units) based on Motion (data movement operations)
   ├─ Each Slice corresponds to an independent query processing task (e.g., data filtering, aggregation, joins)
   ├─ Physical carrier: A single QE process is responsible for executing one Slice
   └─ Hierarchical relationship: Producer Slice (data output) → Motion operation → Consumer Slice (data consumption)
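
The slice boundaries introduced by Motion operators can be seen directly in a query plan. A minimal sketch, assuming psycopg2, placeholder connection parameters, and a hypothetical distributed table t; the plan text follows Greenplum-style EXPLAIN output:

```python
# Illustrative only: EXPLAIN shows how Motion operators divide the plan into
# slices. The table name and connection parameters are placeholders.
import psycopg2

conn = psycopg2.connect(host="mdw", port=5432, dbname="postgres", user="mxadmin")
with conn.cursor() as cur:
    cur.execute("EXPLAIN SELECT region, count(*) FROM t GROUP BY region")
    for (line,) in cur.fetchall():
        # Greenplum-style plans label each Motion with its slice, e.g.
        # "Gather Motion 3:1  (slice1; segments: 3)".
        print(line)
conn.close()
```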

  3. Gang: Collaborative Process Groups for Distributed Execution

Slice Task
└─ Gang (process group)
   ├─ Composition: All QE processes across different Segment nodes executing the same Slice task
   ├─ Characteristics: QE processes within the same Gang work synchronously executing identical Slice logic
   └─ Role: Ensures that the same query unit executes consistently in parallel across the distributed cluster, enabling cross-node collaborative data processing