SANY Heavy Industry is a manufacturing leader focused on construction machinery. Within SANY, the Intelligentization Division has built an industrial big data platform for concrete machinery (泵送云平台, the “pumping cloud platform”; hereafter “the platform”) that integrates distributed storage, data modeling and deployment, and visual analytics.
Today, the platform covers more than 20,000 concrete machinery units. By quantifying about 268 metrics across five key performance dimensions (vibration, blockage, height, rotation, and efficiency), it produces comprehensive health check reports and fault predictions for each machine and offers data-driven insights for intelligent operations and maintenance.
At the same time, historical multi-equipment, multi-dimensional statistics and comparative analysis support R&D in product definition, optimization, and tracking. The platform fully underpins four core scenarios: data-driven decision-making, new product tracking, technical retrofit validation, and precise fault diagnosis.
Data management quality and efficiency are strategically important to SANY. By mining industrial big data, the company aims to:
The platform ingests nearly 2,000 operating parameters from more than 20,000 machines. Telemetry is reported at up to 2 Hz, with over 500 million records uploaded every day, consuming more than 1 TB of disk space per day. On top of that, the system performs per-vehicle daily metric calculations, intelligent inspections, model training, and time-series visualization.
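The ingestion figures above can be cross-checked with a little arithmetic. The sketch below is a back-of-envelope calculation, not a description of how the platform works. It assumes every machine could report around the clock, treats 1 TB as 10^12 bytes, and uses the quoted lower bounds (“more than”, “over”) as if they were exact, so the results are approximate.

```python
# Back-of-envelope check of the ingestion figures quoted in the text.
# Assumptions (not from the source): machines could report 24 hours a day,
# 1 TB = 10**12 bytes, and the quoted lower bounds are treated as exact.

MACHINES = 20_000               # "more than 20,000 machines"
MAX_RATE_HZ = 2                 # "up to 2 Hz"
SECONDS_PER_DAY = 24 * 60 * 60
RECORDS_PER_DAY = 500_000_000   # "over 500 million records" per day
BYTES_PER_DAY = 10**12          # "more than 1 TB" per day


def ceiling_records_per_day(machines: int, rate_hz: float) -> float:
    """Upper bound if every machine sent one record per tick all day long."""
    return machines * rate_hz * SECONDS_PER_DAY


def average_rate_hz(records: int, machines: int) -> float:
    """Average per-machine reporting rate implied by the daily record count."""
    return records / (machines * SECONDS_PER_DAY)


def bytes_per_record(total_bytes: int, records: int) -> float:
    """Average disk footprint per record implied by the daily totals."""
    return total_bytes / records


if __name__ == "__main__":
    print(f"2 Hz ceiling: {ceiling_records_per_day(MACHINES, MAX_RATE_HZ):,.0f} records/day")
    print(f"implied average rate: {average_rate_hz(RECORDS_PER_DAY, MACHINES):.3f} Hz per machine")
    print(f"implied size: {bytes_per_record(BYTES_PER_DAY, RECORDS_PER_DAY):,.0f} bytes/record")
```

Under these assumptions the 2 Hz ceiling is about 3.46 billion records per day, so roughly 500 million actual records corresponds to an average of about 0.29 Hz per machine (around 14% of the peak rate) and about 2,000 bytes of disk per record. One plausible reading is that machines spend much of the day idle or reporting below the peak rate; the source does not break this down.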
In its early phase, the platform was built on a Hadoop + Spark architecture. Over time, four major pain points emerged:
The traditional Hadoop + Spark architecture required a full “big data stack” such as CDH, including:
This resulted in a complex technology stack. To satisfy different service requirements, the same data had to exist in multiple forms across different components, wasting storage and making operations and maintenance increasingly difficult. The diversity of technologies also made it harder to staff and train the team.
Operating data is inherently time-series. Under real-world working conditions, unpredictable factors mean that data is often uploaded irregularly, so rows are not aligned “horizontally” across time: the parameters for a single timestamp may arrive split across several partial rows. The platform acts as a passive receiver of telemetry and can only cleanse data after it lands on disk.
As a result, the cluster accumulated large amounts of “empty” or sparse data every day. Data cleansing was complex and error-prone, and high accuracy was hard to maintain.
For data analysts, iterative experimentation is essential. They need tools that can return results quickly so they can refine their analysis and algorithms while maintaining “thought continuity”.
In the traditional architecture, Spark, the computation engine, had to read data from HDFS and shuffle or regroup it before computing. Moving and aggregating the data this way consumed substantial resources and time, which slowed computation and lengthened the analysis cycle.
In industrial scenarios, general-purpose languages like Python are a must-have for data analysts. Under the traditional architecture, the team used Spark’s pandas UDF feature to batch-run Python code. This required a lot of “glue code” in the pipelines, which slowed algorithm development and made models harder to iterate on.
To address these challenges, SANY rebuilt the underlying architecture of the platform with YMatrix at the core. Compared with the original Spark-based setup, YMatrix brought four major advantages:
YMatrix converges multiple workload types into a single database, covering:
This eliminates the need for a sprawling Hadoop “all-in-one” stack.
YMatrix also offers GUI-based installation and integrates with Grafana for monitoring, which dramatically lowers the operational burden. For the team, this is the “One for ALL” architecture they had been looking for.
For real-time ingestion, the platform uses MatrixGate (mxgate), YMatrix’s streaming data ingestion tool, which supports upsert semantics. It can merge and update multiple rows from the same timestamp into a single consolidated record, a good fit for the irregular time-series telemetry uploaded by heavy machinery.
At the same time, part of the data cleansing logic can be moved “upstream” into the ingestion process, simplifying the overall data cleaning workflow.
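The merge behavior described above can be modeled in a few lines. This is a minimal sketch of upsert-style merging, not mxgate’s implementation or configuration. It assumes rows are keyed by a hypothetical (machine_id, ts) pair and applied in arrival order, that a later non-null value overwrites an earlier one, and that a null never erases a stored value. The machine ID and the column names pressure and boom_angle are hypothetical.

```python
# Minimal model of upsert-style merging of partial telemetry rows.
# Assumptions (not taken from mxgate's documentation): rows are keyed by a
# hypothetical (machine_id, ts) pair and applied in arrival order; a later
# non-null value overwrites an earlier one; a None never erases a stored value.
from typing import Any, Dict, Iterable, Tuple

Key = Tuple[str, int]
Row = Dict[str, Any]
KEY_COLUMNS = ("machine_id", "ts")


def upsert_merge(rows: Iterable[Row]) -> Dict[Key, Row]:
    """Collapse rows that share a key into one consolidated record per key."""
    merged: Dict[Key, Row] = {}
    for row in rows:
        key = (row["machine_id"], row["ts"])
        record = merged.setdefault(key, {"machine_id": key[0], "ts": key[1]})
        for column, value in row.items():
            if column in KEY_COLUMNS or value is None:
                continue  # key columns are already set; nulls do not overwrite
            record[column] = value
    return merged


if __name__ == "__main__":
    arrivals = [
        # Two partial uploads for the same second, each missing one column.
        {"machine_id": "pump-001", "ts": 1700000000, "pressure": 12.5, "boom_angle": None},
        {"machine_id": "pump-001", "ts": 1700000000, "pressure": None, "boom_angle": 37.0},
        {"machine_id": "pump-001", "ts": 1700000001, "pressure": 12.7},
    ]
    for record in upsert_merge(arrivals).values():
        print(record)
    # {'machine_id': 'pump-001', 'ts': 1700000000, 'pressure': 12.5, 'boom_angle': 37.0}
    # {'machine_id': 'pump-001', 'ts': 1700000001, 'pressure': 12.7}
```

The rule that nulls never overwrite is what lets partial rows combine cleanly. If an ingestion path treated a null as an explicit update instead, the merged record would lose values rather than gain them.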
YMatrix is a hybrid OLAP + OLTP database. Since the data is stored directly inside YMatrix, there is no need to move it out for computation. Leveraging YMatrix’s unique MARS table feature, SANY improved end-to-end query and computation performance by around 5x, making it much easier for algorithm engineers to inspect raw data and refine models.
In benchmark comparisons between the two generations of clusters (with the same per-node hardware configuration), the YMatrix hyper-converged database:
Used only half the number of physical machines (about 50% resource savings)
Reduced algorithm runtime from 2.5 hours to 1 hour
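The two benchmark results compound. A minimal worked calculation measures cost in node-hours (machines multiplied by wall-clock hours). It normalizes the old cluster’s node count to 1.0 because the source gives only the ratio (“half the number of physical machines”); this is arithmetic on the quoted figures, not an additional measurement.

```python
# Combine the two benchmark figures into one cost measure: node-hours.
# Assumption: the old cluster's node count is normalized to 1.0, since the
# source states only that the new cluster used half as many machines.


def node_hours(nodes: float, hours: float) -> float:
    """Hardware-time consumed by one run: machines multiplied by wall-clock hours."""
    return nodes * hours


OLD_NODES, OLD_HOURS = 1.0, 2.5   # Hadoop + Spark cluster
NEW_NODES, NEW_HOURS = 0.5, 1.0   # YMatrix cluster: half the machines, 1 hour


def summary() -> dict:
    old = node_hours(OLD_NODES, OLD_HOURS)
    new = node_hours(NEW_NODES, NEW_HOURS)
    return {
        "wall_clock_speedup": OLD_HOURS / NEW_HOURS,   # 2.5 h -> 1 h
        "node_hour_reduction": old / new,             # hardware-time saved
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.1f}x")
    # wall_clock_speedup: 2.5x
    # node_hour_reduction: 5.0x
```

A 2.5x wall-clock speedup on half the hardware is a 5x reduction in node-hours for this workload. That happens to match the roughly 5x performance figure quoted earlier, though the source does not say that figure was derived this way.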
YMatrix provides built-in support for writing user-defined functions (UDFs) in Python 3. All interface definitions and invocation metadata are stored in structured form in the cluster, which makes Python code easier to migrate, call, and manage.
This is a developer-centric advantage that significantly benefits the team’s subsequent data analysis and structured algorithm iteration.
After the migration to YMatrix, SANY re-examined core workflows across marketing, R&D, and service, and further expanded the way data supports business users and decisions.
Pumping index analytics
The platform analyzes metrics such as utilization rates and pumped concrete volume across regions to assess overall market conditions and customer profitability. It helps evaluate profitability and demand patterns (e.g., metro projects, elevated roads, high-rise buildings) across different levels of the national market, and supports the marketing team in identifying and digging into high-potential focus markets.
Marketing decision support
By analyzing user behavior and equipment performance across regions, combined with boom length, chassis type, model, delivery date, and other attributes, the platform helps pinpoint regional equipment needs and enables the marketing team to design more targeted go-to-market strategies.
Retrofit comparison
As technical retrofits progress, the platform continuously compares equipment performance before and after the changes, quantifies key indicators, and visualizes the effect of each modification. This replaces manual follow-up by phone with online statistics and analysis, greatly improving both efficiency and reliability.
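At its simplest, a before/after retrofit comparison contrasts an indicator on either side of the retrofit date. The sketch below uses a plain difference of means. The indicator (daily blockage events for one machine), the values, and the dates are hypothetical, and the source does not describe how the platform actually quantifies retrofit effects.

```python
# Before/after comparison of one indicator around a retrofit date.
# Hypothetical example: the indicator (daily blockage events for one machine),
# the values, and the dates are illustrative only.
from datetime import date
from statistics import mean
from typing import Dict


def retrofit_effect(daily: Dict[date, float], retrofit_day: date) -> Dict[str, float]:
    """Compare the indicator's mean before vs. on/after the retrofit day."""
    before = [v for d, v in daily.items() if d < retrofit_day]
    after = [v for d, v in daily.items() if d >= retrofit_day]
    if not before or not after:
        raise ValueError("need observations on both sides of the retrofit date")
    mean_before, mean_after = mean(before), mean(after)
    if mean_before == 0:
        raise ValueError("relative change is undefined when the baseline mean is 0")
    return {
        "before": mean_before,
        "after": mean_after,
        "relative_change": (mean_after - mean_before) / mean_before,
    }


if __name__ == "__main__":
    blockages = {
        date(2023, 5, 1): 4.0,
        date(2023, 5, 2): 6.0,
        date(2023, 5, 3): 5.0,
        date(2023, 5, 4): 3.0,   # hypothetical retrofit applied on this day
        date(2023, 5, 5): 2.0,
    }
    print(retrofit_effect(blockages, retrofit_day=date(2023, 5, 4)))
    # {'before': 5.0, 'after': 2.5, 'relative_change': -0.5}
```

Comparing means is the simplest choice; a fuller analysis would likely also account for differences in workload between the two periods.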
Fault localization
R&D and service engineers can remotely view (or replay) operating data at the moment a fault occurs, quickly pinpoint root causes, and speed up fault resolution. This has reduced travel for on-site troubleshooting by about 60%.
Product innovation support
By analyzing boom length, chassis type, model, delivery date, and other dimensions, engineers gain a finer-grained understanding of product behavior in the field and can better capture real market needs to guide product innovation.
Multi-level health analysis: national → region → key city → key equipment
The platform scans equipment health status across all regions, clarifies overall performance, and identifies:
It also supports closed-loop tracking of 26 predictive fault patterns and 297 self-diagnostic fault types.
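The national → region → key city → key equipment drill-down amounts to a hierarchical roll-up. This sketch assumes each machine carries a single health score from 0 to 100 and that each level’s score is the unweighted mean of the machines beneath it. The scoring scale, the aggregation rule, and the sample machines, regions, and cities are illustrative assumptions, not the platform’s method.

```python
# Hierarchical roll-up of machine health scores: national -> region -> city -> machine.
# Assumptions (illustrative, not the platform's method): one score in [0, 100]
# per machine, and every level uses the unweighted mean of the machines below it.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple

Path = Tuple[str, ...]


class Machine(NamedTuple):
    machine_id: str
    region: str
    city: str
    health: float


def roll_up(machines: List[Machine]) -> Dict[Path, float]:
    """Return the mean health score for every node in the hierarchy."""
    buckets: Dict[Path, List[float]] = defaultdict(list)
    for m in machines:
        # Each machine contributes to every node on its path from the root.
        paths = [
            ("national",),
            ("national", m.region),
            ("national", m.region, m.city),
            ("national", m.region, m.city, m.machine_id),
        ]
        for path in paths:
            buckets[path].append(m.health)
    return {path: mean(scores) for path, scores in buckets.items()}


def below(scores: Dict[Path, float], depth: int, threshold: float) -> List[Path]:
    """Nodes at a given depth (1 = national, 4 = machine) scoring under threshold."""
    return sorted(p for p, s in scores.items() if len(p) == depth and s < threshold)


if __name__ == "__main__":
    fleet = [
        Machine("pump-001", "East", "Shanghai", 90.0),
        Machine("pump-002", "East", "Shanghai", 70.0),
        Machine("pump-003", "East", "Hangzhou", 60.0),
        Machine("pump-004", "South", "Guangzhou", 80.0),
    ]
    scores = roll_up(fleet)
    print(scores[("national",)])                   # 75.0
    print(below(scores, depth=3, threshold=75.0))  # [('national', 'East', 'Hangzhou')]
```

An unweighted mean lets a few machines in a small city move that city’s score sharply; weighting by utilization or fleet size would be an equally reasonable design choice.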
CRM-based service closed loop
By integrating the platform with SANY’s service assistant in a microservices architecture, the team has built a closed loop from monitoring to service execution. This helps improve service efficiency and ultimately reduce customer downtime caused by issues such as pipeline blockage, engine failures, and hydraulic system faults.
Across industries, demand for processing massive volumes of data in real time continues to grow. Real-time recommendations, precision marketing, and instant decision-making are becoming core capabilities in digital transformation. The ability to sense and guide user needs more quickly, and to improve product experience in real time, creates lasting competitive advantage.
YMatrix’s hyper-converged database is a natural fit for this trend. By providing a unified, one-stop data platform that handles both massive data volumes and real-time analytics, YMatrix makes it easier and more efficient to unlock the value of data and turn it into concrete business outcomes.