YMatrix

Over the past year, YMatrix has been involved in multiple manufacturing initiatives—from battery production lines to smartphone factories and EV manufacturing. These industries are often the first to operationalize the “smart factory” vision, and real projects tend to expose the same truth: the hard part is not installing more systems, but building a data foundation that can keep up with production reality.

This article starts from what we repeatedly observe in factory environments and works backward to the core question: what kind of data platform does a modern manufacturing plant actually need?

1. What makes smart factory data so hard?

In advanced manufacturing, most plants already operate with a high degree of digitalization—MES, SCADA, QMS, maintenance systems, BI dashboards, and more. The challenge is that “having systems” does not automatically mean “having decision-ready data.”

In projects, we often hear familiar complaints:

Operations teams: “The data isn’t real-time, so parameter adjustments are always late.”
Executives: “Dashboards don’t support cross-metric analysis—everything is fragmented.”
IT teams: “The architecture is too complex; pipelines break all the time.”

To address these, you have to look at the full lifecycle—from data production to data consumption—and the recurring bottlenecks within factories.

1.1 Data ingestion: it’s not only about write throughput

Performance is the first gate. In one plant (Factory M), equipment count exceeded 1,000, with each machine exposing thousands of collection points. Signals such as pressure, temperature, and current were reported at second-level frequency, producing tens of thousands of records per second. Daily volume can conservatively reach 1.7 billion rows. Sustaining this ingestion rate is table stakes.
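The relationship between the per-second rate and the daily volume is simple arithmetic. A minimal sketch, assuming a constant average rate over a 24-hour day (the function names are illustrative and not part of any YMatrix tooling):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rows_per_day(rows_per_second: float) -> float:
    """Daily row count for a constant average ingest rate."""
    return rows_per_second * SECONDS_PER_DAY


def required_rate(rows_per_day_target: float) -> float:
    """Average rows/second needed to land a given daily volume."""
    return rows_per_day_target / SECONDS_PER_DAY


# 1.7 billion rows/day corresponds to an average just under 20,000 rows/second.
avg_rate = required_rate(1.7e9)
print(f"{avg_rate:,.0f} rows/s")
```

Real lines are bursty, so a platform sized only for the average would likely fall behind during peaks; the average is a floor, not a target.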

But ingestion challenges are rarely “just performance.”

In another plant (Factory F), scaling capacity meant adding lines, replacing equipment generations, and mixing vendors. The result:

different sampling frequencies
different metric counts and field names
inconsistent units and semantics
multiple systems across the full production lifecycle

Without consolidating data from these sources, analysis becomes “partial truth”: each system provides a local view, and global optimization becomes impossible.
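One way to picture the consolidation step is a per-vendor mapping that renames fields and converts units into one canonical schema. The sketch below is illustrative only: the vendor names, field names, and the `Reading` type are hypothetical, and a real plant would keep such mappings in managed metadata rather than in code.

```python
from dataclasses import dataclass

# Hypothetical mappings: source field -> (canonical metric, factor to canonical unit).
VENDOR_MAPPINGS = {
    "vendor_a": {"temp_c": ("temperature_c", 1.0), "press_kpa": ("pressure_kpa", 1.0)},
    # vendor_b reports pressure in bar; 1 bar = 100 kPa.
    "vendor_b": {"Temp": ("temperature_c", 1.0), "PressureBar": ("pressure_kpa", 100.0)},
}


@dataclass(frozen=True)
class Reading:
    device_id: str
    ts: float      # epoch seconds
    metric: str    # canonical metric name
    value: float   # value in the canonical unit


def normalize(vendor: str, device_id: str, ts: float, payload: dict) -> list:
    """Translate one vendor-specific payload into canonical readings."""
    mapping = VENDOR_MAPPINGS[vendor]
    readings = []
    for field, raw in payload.items():
        if field not in mapping:
            continue  # unknown fields are skipped here; a real pipeline might quarantine them
        metric, factor = mapping[field]
        readings.append(Reading(device_id, ts, metric, raw * factor))
    return readings


print(normalize("vendor_b", "press-07", 1700000000.0, {"Temp": 41.5, "PressureBar": 2.0}))
```

Once every source lands in the same shape, cross-line comparisons no longer depend on knowing which vendor produced a given row.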

1.2 Data movement: fragmented pipelines create delay and fragility

Most production lines come with their own embedded databases, but these databases tend to be limited in functionality and scalability. If you want plant-wide analytics, you need:

data processing/transformation, and
aggregation and correlation with other data domains.

In Factory L, the typical “device → cloud → reporting” path included multiple layers of ingestion, transport, landing, and compute. With a scattered toolchain, the end-to-end latency in many scenarios drifted to minutes (or longer) before data became usable for analysis.

1.3 Analytics: smart manufacturing increases analytical complexity

As factories become more automated, analysis becomes more central—and more difficult.

Traditional time-series databases often do well for:

ingestion
monitoring
alerting

But factories quickly demand more:

BI dashboards at scale
root-cause tracing across systems
large-volume historical analysis

In our customer base:

Factory F needed a comprehensive tracing system over hundreds of TB of data.
Factory M needed hundreds of real-time dashboards.

As applications grow, the platform must keep query latency low enough to support real decisions. In production, small anomalies matter: if a minor deviation is detected early and corrected, yield improves. Multiply that across hundreds of steps and lines, and the business impact becomes substantial.
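The compounding effect can be made concrete with rolled throughput yield, the product of per-step yields. This is a sketch under stated assumptions: the step count and yield values are made up for illustration, and steps are treated as independent.

```python
import math


def rolled_yield(step_yields):
    """Overall yield when a unit must pass every step (steps assumed independent)."""
    return math.prod(step_yields)


steps = 200  # hypothetical number of process steps

# Hypothetical per-step yields before and after earlier anomaly correction.
before = rolled_yield([0.999] * steps)
after = rolled_yield([0.9995] * steps)

# With these made-up numbers, rolled yield moves from roughly 0.82 to roughly 0.90.
print(f"before={before:.3f} after={after:.3f}")
```

A change of 0.05 percentage points per step looks negligible in isolation, which is why per-step dashboards alone tend to understate the value of catching small deviations early.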

2. Why do many existing solutions feel “hard to use” in practice?

A smart factory is not a simple “technology refresh.” It’s a long-term evolution of:

production management philosophy
equipment capabilities
management software practices
infrastructure and platform architecture

Over time, the industry produced specialized components:

time-series databases for high-write ingestion and monitoring
streaming engines (Spark/Flink) for near-real-time processing
BI layers and, increasingly, AI applications (e.g., predictive maintenance)

Yet in real factories, these stacks often look better on diagrams than in operations:

A dedicated time-series store may ingest well but struggle when dashboards become complex.
Streaming tools provide low latency, but each new feature requires changes across multiple systems, each with different development stacks and operational complexity.
BI tools can visualize, but cross-process analytics collapses when data lives across many disconnected systems.

In short, smart factory modernization cannot rely on one layer improving in isolation. You need an architecture that can connect the whole.

3. Time-Series + Analytics: how YMatrix enables a single foundation

To address these issues, YMatrix focuses on an integrated capability: high-performance time-series ingestion plus high-performance analytics, delivered as one database foundation.

For time-series workloads, YMatrix applies targeted optimizations for write throughput, compression, and usability—competitive with specialized time-series products.
For analytics workloads, YMatrix strengthens query performance and functionality so that operational reporting, tracing, and multi-dimensional analysis can run on the same foundation.

Below are two architectures we see repeatedly in manufacturing—and how the “time-series + analytics” approach can be implemented differently depending on a plant’s constraints.

Architecture 1: A unified “Time-Series + Analytics” one-stack design

This option is suitable when a factory is rapidly expanding device and line scale, and needs a clean, unified ingestion and processing backbone.

What the traditional approach looks like

In many factories, each production line has its own database. Data is synchronized to a central platform, transformed via Spark/Flink, and then served to analytics and applications. This often leads to:

higher latency
difficult maintenance
redundant storage across multiple layers
higher overall cost

The one-stack design

YMatrix redesigns the “data entry + write path” as a unified backbone:

1) Unified data entry

MQTT as the preferred ingestion protocol for line devices
Other protocols converted via edge gateways
All traffic unified into RocketMQ as the plant’s message bus

This reduces fragmentation from multi-protocol, multi-vendor lines and avoids “the same data stored in multiple places” just to make systems talk.

2) High-concurrency ingestion

Downstream, MatrixGate consumes RocketMQ in parallel and performs batch, multi-channel writes into the YMatrix warehouse. As lines increase, the system scales horizontally by expanding write channels and nodes.
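The general pattern of batched, multi-channel writes can be sketched without any real client library. The class below is a hypothetical illustration, not MatrixGate's or RocketMQ's API: it routes rows to a fixed number of channels by a stable hash of the device ID and flushes each channel's buffer to a caller-supplied sink once it reaches a batch size.

```python
import zlib
from collections import defaultdict


class BatchingRouter:
    """Route rows to write channels and flush them in batches (illustrative only)."""

    def __init__(self, channels, batch_size, sink):
        self.channels = channels
        self.batch_size = batch_size
        self.sink = sink  # callable(channel_id, rows); stands in for a bulk write
        self._buffers = defaultdict(list)

    def channel_for(self, device_id):
        # A stable hash keeps each device on one channel, preserving per-device order.
        return zlib.crc32(device_id.encode("utf-8")) % self.channels

    def submit(self, device_id, row):
        channel = self.channel_for(device_id)
        self._buffers[channel].append(row)
        if len(self._buffers[channel]) >= self.batch_size:
            self.flush(channel)

    def flush(self, channel):
        rows = self._buffers.pop(channel, [])
        if rows:
            self.sink(channel, rows)

    def flush_all(self):
        for channel in list(self._buffers):
            self.flush(channel)


written = []
router = BatchingRouter(channels=4, batch_size=3,
                        sink=lambda ch, rows: written.append((ch, len(rows))))
for i in range(10):
    router.submit(f"device-{i % 2}", {"seq": i})
router.flush_all()  # flush partial batches at shutdown or on a timer
print(written)
```

In this picture, scaling out means raising `channels` and giving each channel its own writer; a production version would also flush on a time limit so that quiet channels do not hold rows indefinitely.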

3) Efficient processing and serving

For scenarios like quality analytics—multiple pass-through steps, rework loops, time-window judgments—YMatrix uses a practical split:

Most statistical metrics and quality rules are computed inside the database using a vectorized execution engine, producing result tables directly for dashboards and trace queries.
Only a smaller subset of deeply layered aggregations remains in external streaming (e.g., Flink) for hierarchical processing and pre-aggregation before landing in the ADS (application data service) layer.
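To show what a quality rule with rework loops and a time window involves, here is a small Python sketch of a per-station yield calculation. The source does not define its yield metric, so this assumes one common convention: first-pass yield counts a unit as good only if its first attempt at the station passed, while final yield uses the last attempt. The record layout is hypothetical.

```python
from collections import defaultdict


def station_yields(records, window_start, window_end):
    """Compute per-station yields from (unit_id, station, ts, passed) records.

    Only attempts with window_start <= ts < window_end are considered.
    """
    attempts = defaultdict(list)
    for unit_id, station, ts, passed in records:
        if window_start <= ts < window_end:
            attempts[(station, unit_id)].append((ts, passed))

    totals = defaultdict(lambda: {"units": 0, "first_pass": 0, "final_pass": 0})
    for (station, _unit_id), tries in attempts.items():
        tries.sort()  # order attempts by timestamp
        stats = totals[station]
        stats["units"] += 1
        stats["first_pass"] += int(tries[0][1])   # outcome of the first attempt
        stats["final_pass"] += int(tries[-1][1])  # outcome after any rework

    return {
        station: {
            "first_pass_yield": s["first_pass"] / s["units"],
            "final_yield": s["final_pass"] / s["units"],
        }
        for station, s in totals.items()
    }


records = [
    ("u1", "weld", 10, True),
    ("u2", "weld", 11, False),
    ("u2", "weld", 20, True),   # rework loop: second attempt passes
    ("u3", "weld", 12, False),
    ("u4", "weld", 500, True),  # outside the window, ignored
]
result = station_yields(records, window_start=0, window_end=100)
print(result)
```

The same grouping and ordering logic maps naturally onto SQL window functions, which is the kind of rule the article describes running inside the database rather than in an external job.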

Real-world outcome (Factory M)

In one smartphone factory implementation:

ingestion reached roughly 140,000 rows/second
common failures such as “30-minute write delay” and “>5% data loss” were brought within a controllable range
“station pass yield” calculations ran in-database at around 475 ms, supporting minute-level dashboard refresh and shop-floor decision needs
PL/Python was used to add in-database algorithms for advanced use cases such as predictive maintenance

Architecture 2: A smoother migration path using a streaming data warehouse approach

This option is suitable when the factory must retain an existing stable production database (e.g., SQL Server) and wants modernization without disrupting production systems.

Core idea

Keep the existing production-end database
Sync incremental data into YMatrix on a schedule
Use YMatrix’s built-in streaming compute engine (Domino) to complete transformations inside the database
Serve results directly to downstream applications with YMatrix’s analytics performance
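The scheduled incremental sync in the second step is often implemented with a watermark: remember the highest key already copied and fetch only rows beyond it. The sketch below uses two in-memory SQLite databases as stand-ins for the production database and the analytics target. The table and column names are hypothetical, and it assumes an append-only table with a monotonically increasing `id`; updates and deletes would need change data capture or a modified-at column instead.

```python
import sqlite3


def sync_increment(source, target, last_id):
    """Copy rows with id > last_id from source to target; return the new watermark."""
    rows = source.execute(
        "SELECT id, station, value FROM measurements WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    if rows:
        target.executemany(
            "INSERT INTO measurements (id, station, value) VALUES (?, ?, ?)", rows
        )
        target.commit()
        last_id = rows[-1][0]
    return last_id


# Stand-ins for the production database and the analytics target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(
        "CREATE TABLE measurements (id INTEGER PRIMARY KEY, station TEXT, value REAL)"
    )

source.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)", [(1, "s1", 0.5), (2, "s1", 0.7)]
)
watermark = sync_increment(source, target, 0)          # first scheduled run

source.execute("INSERT INTO measurements VALUES (3, 's2', 0.9)")
watermark = sync_increment(source, target, watermark)  # next run copies only row 3

copied = target.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(watermark, copied)
```

Because each run reads only past the watermark, the load on the production database stays proportional to new data rather than table size, which is the point of leaving that system undisturbed.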

Observed benefits (Factory L)

Based on early PoC and partial rollout, this approach delivered:

1) Simplified end-to-end flow

Instead of multiple intermediate databases and components (Kafka, Flink, Spark, etc.), the pipeline consolidates inside YMatrix. End-to-end latency dropped from 20+ minutes to single-digit seconds.

2) Real-time responsiveness

Domino enables second-level flow from ingestion → storage → DWS-layer processing. Dashboards can refresh key metrics (cycle time, yield, alarms, parameter drift) nearly in real time, shifting BI from “post-event reporting” to “process insight.”

3) Material cost reduction

Many factories run heavy cloud-side stacks with distributed components and cross-department chargeback models. Moving the full-chain processing closer to the plant floor can significantly reduce dependency on cloud compute/storage/transfer resources, improving cost predictability and lowering ongoing operational burden.
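The near-real-time refresh described under the second benefit rests on a general idea: update aggregates as each record arrives instead of recomputing them from raw data. The sketch below illustrates that idea in plain Python. It is not Domino's implementation or API, and the class, window size, and field names are assumptions made for the example.

```python
from collections import defaultdict


class IncrementalYield:
    """Maintain pass/total counts per (line, time window), updated per event."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        # (line, window_start) -> [passed_count, total_count]
        self._state = defaultdict(lambda: [0, 0])

    def apply(self, line, ts, passed):
        """Fold one event into its window and return the refreshed yield."""
        window_start = int(ts // self.window_seconds) * self.window_seconds
        counts = self._state[(line, window_start)]
        counts[0] += int(passed)
        counts[1] += 1
        return line, window_start, counts[0] / counts[1]


agg = IncrementalYield(window_seconds=60)
print(agg.apply("line-1", 61, True))    # first event in the window starting at 60
print(agg.apply("line-1", 90, False))   # same window, yield is updated in place
print(agg.apply("line-1", 125, True))   # a new window starts at 120
```

Each event costs a constant amount of work regardless of how much history exists, which is what makes second-level dashboard refresh plausible at factory data rates.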

Conclusion: smart manufacturing is a shift from experience-driven to data-driven

Smart manufacturing is not simply “replacing people with machines.” It is a transformation in decision-making—moving from experience-driven operations to data-driven operations by improving the efficiency and reliability of data flow.

Many factories have already achieved “digital factory” basics with time-series products: monitoring, alarms, and dashboards. But a “digital factory” is not yet a “smart factory.”

To become truly smart, factories must connect line data with other domains, analyze it deeply, and surface business truths fast enough to influence production.

Historically, we stitched together multiple components to cover ingestion, processing, and serving. That can work—but to build a system that scales and stays operable, factories increasingly need an infrastructure upgrade: a foundation that unifies time-series ingestion and analytics performance.

Which link is slowing you down?

In your factory, what is the most constrained part today?

data can’t be written fast enough
pipelines are unstable and hard to operate
dashboards and trace queries are too slow

If you share your device scale, sampling frequency, dashboard count, and current architecture, it becomes much easier to judge whether a one-stack “time-series + analytics” architecture or a smoother “streaming data warehouse” migration path fits better, and which step would deliver the fastest measurable improvement.
