YMatrix - Enterprise-level Hyper-Converged Database

What is YMatrix Database?


YMatrix is a hyper-converged database product developed by YMatrix on top of the classic open-source PostgreSQL/Greenplum database. In addition to excelling in time series scenarios, it also supports classic scenarios such as online transaction processing (OLTP) and online analytical processing (OLAP).
It addresses enterprise needs for high availability, security, high performance, automated operations and maintenance, and graphical installation and data processing, so that enterprise requirements can be carried through to production.
Its core value lies in cost-effective ease of use, high-performance reads and writes, high storage efficiency, and high availability.
YMatrix also offers a community version; we welcome your feedback and input.

What are the core features of YMatrix?


YMatrix has the following core features:

  1. Hyper-converged architecture
    YMatrix's hyper-converged architecture addresses the “information silo” problem of traditional databases, enabling “one database for multiple uses.” It rests on two elements: the microkernel and MPP (Massively Parallel Processing).
  • Microkernel. In YMatrix, a microkernel consists primarily of a storage engine and an execution engine, and different microkernels are optimized for different scenarios. For example, the OLTP microkernel (HEAP storage engine + Volcano execution engine) suits TP scenarios, while the time series microkernel (MARS2 storage engine + vectorized execution engine) suits time series scenarios. A microkernel's storage engine is usually fixed, while the executor is chosen based on the optimizer's cost evaluation. You can therefore choose the most suitable plugin combination for each business scenario and extend the database quickly and flexibly without affecting the stability of the core system.
  • Distributed MPP architecture, also known as shared-nothing architecture. This refers to a system in which two or more processors collaborate to execute an operation, and each processor has its own memory, operating system, and disk. YMatrix uses this architecture to distribute the database load and to process a single query in parallel across all system resources.
  2. High performance
    YMatrix focuses on performance in all scenarios, including write throughput, time series queries, OLAP analysis, machine learning, and OLTP. Write and query performance are introduced below:
  • Writing: The streaming write tool MatrixGate supports high-speed writing of multiple data types simultaneously. Its highly concurrent, distributed, streaming, and batch write capabilities are designed to meet real-time ingestion needs in enterprise time series scenarios while providing complete transaction guarantees.
  • Query: Supports mixed row-column storage, built on the highly compressed MARS2 storage engine, and uses a cost-based optimizer (CBO) to select the most efficient execution plan. Version 5.0 and above enable the vectorized execution engine by default; it has been rigorously tested with SSB (Star Schema Benchmark) and TSBS (Time Series Benchmark Suite), with the aim of delivering query performance well beyond that of similar products.
  3. High availability
  • Automatic failover: With the automated operations and maintenance mechanism available in YMatrix 5.0 and above, when the cluster's master node (Master) or a data node (Segment) fails, the cluster automatically switches between the primary and standby nodes to complete the failover.
  • Streaming replication: Both Master and Segment nodes keep data highly available through the streaming replication mechanism.
  4. Simple and easy to use
  • Graphical installation: Complete cluster deployment in 10 minutes; simulate time series scenario queries and writes in 3 minutes.
  • Graphical operation and maintenance monitoring: A simple interface, rich information, and one-click expansion in seconds.
  5. Enterprise-level security
    YMatrix has a 360-degree security access mechanism covering authentication, permission control, encryption, auditing, and resource control.
  • Authentication: Multiple authentication methods, such as trust authentication, password authentication, and PAM authentication.
  • Permission control: Uses role-based access control; roles simplify the association between users and permissions.
  • Encryption: Provides encryption at different levels: password storage encryption; encryption of specified fields; SSL host authentication; client-side encryption; encryption of data in transit over the network; cross-network password encryption; database partition encryption.
  • Auditing: Records user login/logout activity and post-login database operations. Audit levels can be set according to security requirements.
  • Resource control: Enforces strict address access restrictions to ensure trusted user origins; configurable maximum concurrent connection limits; default connection timeout policies.
  6. Comprehensive ecosystem
  • Fully compatible with the PostgreSQL/Greenplum ecosystem's upstream and downstream toolchain.
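
The shared-nothing MPP idea described under the hyper-converged architecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not YMatrix's implementation: the segment count, the CRC32-based distribution, and the function names (segment_for, distribute, local_aggregate, parallel_avg) are all hypothetical, and threads stand in for independent nodes that would each have their own memory and disk.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of data nodes


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Scatter (device_id, value) rows so each segment owns a disjoint slice."""
    segments = [[] for _ in range(num_segments)]
    for device_id, value in rows:
        segments[segment_for(device_id, num_segments)].append((device_id, value))
    return segments


def local_aggregate(segment_rows):
    """Runs on one segment: partial [sum, count] per device from local rows only."""
    partial = defaultdict(lambda: [0.0, 0])
    for device_id, value in segment_rows:
        partial[device_id][0] += value
        partial[device_id][1] += 1
    return partial


def parallel_avg(rows, num_segments: int = NUM_SEGMENTS):
    """Coordinator: fan the query out to all segments, then merge the partials."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_aggregate, segments))
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for device_id, (total, count) in partial.items():
            merged[device_id][0] += total
            merged[device_id][1] += count
    return {device_id: total / count for device_id, (total, count) in merged.items()}
```

Calling parallel_avg([("a", 1.0), ("b", 2.0), ("a", 3.0)]) averages each device's readings. Because rows are distributed by device_id, every row for a given device lands on the same segment, so each segment can aggregate without talking to the others and the coordinator only merges small partial results.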

What business scenarios does YMatrix support?


  • Scenarios involving complex data processing that require a unified architecture
    In traditional industrial enterprises, massive amounts of data are often scattered across departments, business systems, and applications as a result of organizational strategy, architectural choices, and digitalization initiatives. These data sets cannot communicate with one another and remain underutilized, forming one “information silo” after another (https://baike.baidu.com/item/ information silos?fromModule=lemma_search-box). Beyond the technical challenges they pose, such silos impede an enterprise's ability to gain a competitive advantage and significantly constrain its management, operations, and development, making them a critical hurdle to overcome in digital transformation.
    YMatrix's hyper-converged architecture has been applied in real-world production scenarios such as factory data foundations, data warehouses for large corporate groups, intelligent connected vehicles, and intelligent operations for IoT devices. It significantly lowers the technical barriers enterprises face during selection, procurement, use, and maintenance, and has received positive feedback. For example, in smart manufacturing scenarios, a single database can handle the collection, storage, computation, modeling, querying, and analysis of data from enterprise resource planning (ERP) systems, manufacturing execution systems (MES), and equipment.
  • Scenarios with complex time series analysis
    Time series data is the foundational data of the Internet of Things, Internet of Vehicles, Industrial Internet, and Smart Cities. Its defining characteristic is that it arrives in real time, which places high demands on database write and storage capabilities. How to control costs while ensuring performance, how to scale out more safely and quickly to avoid data backlogs, and how to lower the technical threshold so as to respond more quickly and accurately to new data demands have become issues that enterprises must solve.
    YMatrix is optimized for time series data. Thanks to the MARS2 storage engine's physical sorting and its support for data reported at different frequencies and in batches, together with MatrixGate's highly concurrent, distributed, streaming, and batch write capabilities, YMatrix is designed to meet the needs of real-time ingestion, high-speed writing, real-time querying, and transaction guarantees in enterprise time series scenarios.
    YMatrix supports graphical scaling that is simple to operate and completes in seconds. It also supports smooth scaling without interrupting business operations, which keeps the business running safely, reduces downtime losses, and lowers risk.
  • Massive IoT scenarios with a large number of devices
    Common IoT scenarios include smart campuses, smart homes, smart transportation, smart water management, smart agriculture, and smart meteorology. A large number of devices means a large amount of data to be written, stored, and queried. Storage cost (compression ratio) and access efficiency (decompression efficiency) are decisive factors in the stability of the data infrastructure in this scenario, while high-speed writing and real-time query performance are important indicators of the end-user experience.
    In addition to petabyte-scale cluster capacity, YMatrix features patented encoding chain compression technology, which lets business personnel tailor the most suitable encoding scheme to each data column's characteristics. This achieves better cost-effectiveness and can save enterprises over 50% in storage costs, so that storing massive data is no longer a burden.
    Thanks to MatrixGate's highly concurrent, distributed, streaming, and batch write capabilities, YMatrix can achieve sub-second data ingestion, depending on hardware performance.
    Thanks to full vectorization (version 5.0 and above), YMatrix's SSB performance has been measured at 1.24 times that of ClickHouse, delivering high-throughput, low-latency queries.
  • Traditional data warehouse OLAP scenarios
    YMatrix is compatible with the PostgreSQL/Greenplum ecosystem and supports classic OLAP scenarios in industries such as finance, telecommunications, government, energy, and manufacturing, as well as business intelligence (BI) and report analysis.
    Such scenarios mostly involve non-time-series data and traditionally rely on the Hadoop ecosystem for data production and consumption: the Hadoop platform stores historical data, and Spark then computes report metrics, which is a complex process.
    With YMatrix, the data consumption required for this scenario can be handled in one place through features such as unified handling of structured and unstructured data types, federated data access, graphical access to Kafka data streams, and hot/cold data separation. It also has automatic failover and automatic recovery mechanisms, making it secure, simple, and easy to use.
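
The per-column encoding idea mentioned for IoT scenarios above (choosing an encoding scheme to match each column's characteristics) can be illustrated with a generic sketch. This is not YMatrix's patented encoding chain; it is a textbook combination of delta encoding followed by run-length encoding, and all function names here are hypothetical. It shows why chaining encodings helps on regularly sampled timestamps: the deltas are constant, so the run-length stage collapses them.

```python
def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = []
    running = 0
    for d in deltas:
        running += d
        out.append(running)
    return out


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Invert rle_encode by expanding each run."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def encode_chain(values):
    """Two-stage chain: delta first, then run-length on the deltas."""
    return rle_encode(delta_encode(values))


def decode_chain(runs):
    """Undo the chain in reverse order."""
    return delta_decode(rle_decode(runs))


timestamps = [1000, 1010, 1020, 1030, 1040]  # one reading every 10 time units
encoded = encode_chain(timestamps)           # [(1000, 1), (10, 4)]
assert decode_chain(encoded) == timestamps
```

A column of noisy floating-point sensor values would gain little from this particular chain, which is the reason for picking the encoding per column rather than applying one scheme to the whole table.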