MPP Architecture

This document introduces the database architecture that YMatrix uses at the physical level: the MPP (Massively Parallel Processing) architecture.

1 What Is MPP Architecture?

MPP refers to a database architecture based on a shared-nothing cluster, where each node has its own independent disk storage and memory system. Business data is distributed across nodes according to the database model and application characteristics. Each data node (Segment Node) connects to others via dedicated or standard commercial networks, collaborating to perform computations and provide database services. Shared-nothing database clusters offer advantages such as scalability, high availability, high performance, and cost-effectiveness.

In simple terms, MPP architecture distributes tasks in parallel across multiple servers and nodes. After each node completes its computation, the results are aggregated to produce the final output.
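
The scatter-and-aggregate flow described above can be sketched in a few lines of Python. This is only an illustration, not YMatrix code: threads stand in for nodes, and the names partial_sum, parallel_sum, and partitions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Work done locally on one node: aggregate only the rows it stores."""
    return sum(rows)


def parallel_sum(partitions):
    """Run one task per partition in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Final aggregation step: merge the per-node results into one answer.
    return sum(partials)


# Hypothetical data that is already spread across three nodes.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
total = parallel_sum(partitions)
```

Each worker only touches its own partition, and the final step sees one small partial result per node rather than the raw rows.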

2 What Does the Shared-Nothing Characteristic of MPP Architecture Mean?

From a data architecture perspective, database architectures fall into three types: Shared Everything, Shared Disk, and Shared Nothing.

  • Shared Everything: Typically applies to single-host systems where CPU, memory, and I/O are securely and transparently shared. However, it lacks strong parallel processing capabilities.
  • Shared Disk: A distributed computing architecture where nodes share the same disk devices, but each node has its own private memory. Every active node can access every disk, so if one node fails, the remaining nodes can still reach all of the data. Because users can access the disks from any cluster node, this architecture adapts quickly to changing workloads.
  • Shared Nothing: A distributed computing architecture where each node is independent and interconnected via a network. Each node consists of a processor, main memory, and disk. The primary goal of this architecture is to eliminate resource contention between nodes. Nodes do not share memory or storage; each node has dedicated, non-shared disks, enabling efficient operation in high-volume read/write environments.
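
As a rough sketch of what "each node has dedicated, non-shared disks" implies for data placement, the following Python snippet assigns every row to exactly one node by hashing a distribution key. The CRC32 hash and the names owner_of and distribute are assumptions made for illustration; this is not a description of the distribution algorithm YMatrix actually uses.

```python
import zlib


def owner_of(key, num_nodes):
    """Map a distribution key to exactly one node using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_field, num_nodes):
    """Place every row on the single node that owns its key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[owner_of(row[key_field], num_nodes)].append(row)
    return nodes


# Hypothetical table with eight rows, spread over three nodes.
rows = [{"id": i, "amount": i * 10} for i in range(8)]
nodes = distribute(rows, "id", 3)
```

No row is stored on more than one node, so a node never has to contend with another node for the same disk or memory when it reads or writes its own rows.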

3 What Are the Characteristics of MPP Architecture?

Key characteristics of MPP architecture include:

  • Intermediate results generally do not need to be written to disk between processing steps.
  • Parallel task execution.
  • Data stored in a distributed and localized manner.
  • Distributed computing with uniform roles across all data nodes (Segments), enhancing parallel processing capability.
  • The stability and availability of the Master directly affect overall system performance.
  • Horizontal scalability, supporting cluster expansion by adding nodes.
  • "Bucket effect" (weakest-link effect): A parallel task finishes only when its slowest node finishes. If one node consistently performs slower than the others in the cluster, overall cluster performance is limited by the speed of that node.
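
The bucket effect in the last item can be expressed as a simple model: the elapsed time of a parallel query is the maximum of the per-segment times, not their average. The sketch below is a deliberate simplification (it ignores Master and network overhead), and the function name and timings are hypothetical.

```python
def query_time(segment_times):
    """A parallel query completes only when its slowest segment completes."""
    return max(segment_times)


# Hypothetical per-segment execution times in seconds.
balanced = [1.0, 1.0, 1.0, 1.0]
one_slow = [1.0, 1.0, 1.0, 4.0]

balanced_time = query_time(balanced)
one_slow_time = query_time(one_slow)
```

Under this model, a single segment that is four times slower makes the whole query four times slower, even though the other three segments finish on time.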