MPP architecture

This document introduces the database architecture that YMatrix adopts at the physical level: the MPP (Massively Parallel Processing) architecture.

1 What is the MPP architecture?

MPP refers to a shared-nothing (Shared Nothing) database cluster in which each node (Node) has its own independent disk storage and memory. Business data is partitioned across the nodes according to the database model and application characteristics. The data nodes (Segment Node) are connected through a dedicated network or a general-purpose commercial network and work together to provide database services. Shared-nothing database clusters offer scalability, high availability, high performance, and good cost-effectiveness.

Simply put, the MPP architecture distributes a task across multiple servers and nodes that work in parallel. After each node completes its calculation, the partial results are combined to obtain the final result.
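The scatter-and-gather flow described above can be sketched in a few lines of Python. This is only an illustration, not YMatrix code: threads stand in for nodes, and the names `partial_sum`, `parallel_average`, and `segments` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(segment_rows):
    """Work done locally on one node: sum and count of its own rows."""
    return sum(segment_rows), len(segment_rows)


def parallel_average(segments):
    """Send the task to every segment, then merge the partial results."""
    # Assumption: one worker thread plays the role of one data node.
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(partial_sum, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Each inner list is the data held by one (simulated) node.
segments = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
average = parallel_average(segments)
```

Each node returns only a small partial result (a sum and a count), so the final merge step handles far less data than the nodes scanned.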

2 What does the shared-nothing feature of the MPP architecture refer to?

From the perspective of database architecture, distributed databases fall into three types: Shared Everything, Shared Disk, and Shared Nothing:

  • Shared Everything generally targets a single host. CPU, memory, and I/O are all shared fully and transparently, but parallel processing capability is limited.
  • Shared Disk is a distributed computing architecture in which the nodes share the same disk devices, but each node has its own private memory. Because every active node can access the shared disks, the remaining nodes can continue to serve the data if one node fails. The disks are accessible from all cluster nodes. The architecture can adapt quickly to changing workloads.
  • Shared Nothing is a distributed computing architecture in which each node is independent and the nodes are interconnected through the network. Each node consists of a processor, main memory, and disk. The main motivation of this architecture is to eliminate contention between nodes. Nodes do not share memory or storage: each disk belongs to a single node and cannot be shared. The architecture works effectively in high-volume, read-write environments.

![](https://img.ymatrix.cn/ymatrix_home/sharednothing (screenshot)_1692009080.png)
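In a shared-nothing cluster, every row is stored on exactly one node, so the cluster needs a rule for deciding which node owns which row. A common approach is to hash a distribution key. The sketch below illustrates the idea under stated assumptions: the CRC32 hash, the `device_id` key, and the function names are chosen for illustration and do not describe the actual distribution function used by YMatrix.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment index with a stable hash."""
    # Assumption: CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Group rows by the segment that owns their distribution key."""
    segments = defaultdict(list)
    for row in rows:
        owner = segment_for(str(row[key_field]), num_segments)
        segments[owner].append(row)
    return dict(segments)


# Hypothetical sample data: twelve readings from twelve devices.
rows = [{"device_id": f"dev-{i}", "reading": i * 1.5} for i in range(12)]
placement = distribute(rows, "device_id", num_segments=3)
```

Because the hash is deterministic, rows with the same key always land on the same node, which lets that node process them locally without consulting its peers.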

3 What are the characteristics of the MPP architecture?

The characteristics of the MPP architecture are as follows:

  • No need to write intermediate data to disk.
  • Tasks are executed in parallel.
  • Data is stored in a distributed manner, and each node works on its local data (data locality).
  • Computing is distributed, and all data nodes (Segment) play the same role, which improves parallel computing power.
  • The stability and availability of the Master affect the performance of the whole cluster.
  • The cluster scales horizontally and can be expanded by adding nodes.
  • Barrel effect: if one node consistently executes more slowly than the other nodes in the cluster, the performance of the entire cluster is limited by the execution speed of that slow node.
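The barrel effect in the last item can be expressed with simple arithmetic: when all nodes must finish before the results are merged, the elapsed time of a parallel step is the maximum of the per-node times, not their average. The sketch below uses made-up timings, and `query_time` is a hypothetical helper.

```python
def query_time(node_times):
    """Elapsed time of a parallel step: the slowest node sets the pace."""
    return max(node_times)


# Hypothetical per-node execution times in seconds.
balanced = [1.0, 1.1, 0.9, 1.0]
one_slow_node = [1.0, 1.1, 0.9, 4.0]

balanced_time = query_time(balanced)        # 1.1
slow_time = query_time(one_slow_node)       # 4.0
```

Three of the four nodes are equally fast in both cases, yet the single slow node makes the whole step several times slower, which is why even data distribution and uniform hardware matter in an MPP cluster.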