In many enterprises, critical data lives in isolated systems.
When business needs arise—such as analyzing how regional customer orders correlate with manufacturing activity—the traditional response is costly and slow: extract data from each source, run ETL pipelines to load it into a central warehouse, clean and integrate it, then finally generate reports.
But this approach doesn’t scale. Larger datasets mean longer ETL windows. The faster source data changes, the sooner the warehouse copy goes stale. And maintaining pipelines across heterogeneous formats becomes a growing operational burden.
This is the classic data silo challenge—and YMatrix’s PXF (Platform Extension Framework) offers a better way.
PXF is YMatrix’s federated query engine. It enables direct, secure, and high-performance SQL access to dozens of external data sources, including systems such as HDFS and Oracle, without copying or moving data.
With PXF, you can create foreign tables in YMatrix that point to remote data. Once defined, these tables behave like local ones. For example, joining an Oracle customer table with an HDFS log file requires only standard SQL—no custom scripts, no batch windows, no data duplication.
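A minimal sketch of what such definitions and the join might look like. The DDL follows the external-table form documented for Greenplum PXF (`CREATE EXTERNAL TABLE ... LOCATION ('pxf://...')`); that YMatrix accepts exactly this form is an assumption, and the table names, columns, HDFS path, and `oracle` server name are hypothetical. The SQL is held in Python strings so it can be handed to any PostgreSQL-compatible driver.

```python
# Hypothetical names throughout: tables, columns, HDFS path, server name.
# The DDL shape follows Greenplum PXF's documented external-table syntax;
# compatibility of YMatrix with this exact form is an assumption.

def pxf_external_table(name, columns, location, profile, fmt, server=None):
    """Render a CREATE EXTERNAL TABLE statement that points at a PXF source."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    options = f"PROFILE={profile}"
    if server:
        options += f"&SERVER={server}"
    return (
        f"CREATE EXTERNAL TABLE {name} ({cols})\n"
        f"LOCATION ('pxf://{location}?{options}')\n"
        f"FORMAT {fmt};"
    )

# Customer table that lives in Oracle, reached through a JDBC profile.
customers_ddl = pxf_external_table(
    "ext_customers",
    [("customer_id", "int"), ("region", "text")],
    location="SALES.CUSTOMERS",
    profile="jdbc",
    fmt="'CUSTOM' (FORMATTER='pxfwritable_import')",
    server="oracle",
)

# Comma-delimited click logs that live in HDFS.
clicks_ddl = pxf_external_table(
    "ext_click_logs",
    [("customer_id", "int"), ("url", "text"), ("ts", "timestamp")],
    location="data/logs/clicks",
    profile="hdfs:text",
    fmt="'TEXT' (delimiter=E',')",
)

# Once both tables exist, the join is ordinary SQL.
join_sql = """
SELECT c.region, count(*) AS clicks
FROM ext_customers c
JOIN ext_click_logs l ON l.customer_id = c.customer_id
GROUP BY c.region;
"""

if __name__ == "__main__":
    print(customers_ddl)
    print(clicks_ddl)
    print(join_sql)
```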
The result? Reports shift from T+1 to real time, storage costs drop, and ETL complexity fades.
An e-commerce company needed to track user behavior from click to purchase. Instead of waiting for nightly ETL jobs to import HDFS logs, they used PXF to query raw logs directly alongside warehouse tables. Analysts now explore conversion paths using live data, without waiting for a nightly load or a staging step.
An energy firm stores billions of production records in Oracle. Running regional sales aggregations used to require full-table exports, a process that consumed hours and terabytes of network bandwidth. With PXF, YMatrix pushes aggregation logic down to Oracle. Only the summarized results are transferred, so network traffic shrinks to a result set measured in kilobytes, and the raw records never leave Oracle.
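The difference in transferred volume can be illustrated with a small sketch that uses an in-memory SQLite database as a stand-in for the remote Oracle system. The table, columns, and row counts are invented for illustration, and the sketch makes no claim about how PXF generates the remote query. It only compares how many rows leave the source when the aggregation runs at the consumer versus at the source.

```python
import sqlite3

# SQLite stands in for the remote source; schema and data are hypothetical.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE production (region TEXT, units INTEGER)")
rows = [(region, units)
        for region in ("north", "south", "east")
        for units in range(1, 1001)]
source.executemany("INSERT INTO production VALUES (?, ?)", rows)

def aggregate_locally(conn):
    """Pull every row, then aggregate on the consumer side."""
    transferred = conn.execute("SELECT region, units FROM production").fetchall()
    totals = {}
    for region, units in transferred:
        totals[region] = totals.get(region, 0) + units
    return totals, len(transferred)

def aggregate_at_source(conn):
    """Send the GROUP BY to the source; only summary rows come back."""
    transferred = conn.execute(
        "SELECT region, SUM(units) FROM production GROUP BY region"
    ).fetchall()
    return dict(transferred), len(transferred)

local_totals, local_rows = aggregate_locally(source)
pushed_totals, pushed_rows = aggregate_at_source(source)

if __name__ == "__main__":
    print("rows transferred without pushdown:", local_rows)
    print("rows transferred with pushdown:", pushed_rows)
```

Both paths produce the same totals; the only thing that changes is how many rows cross the boundary between the source and the consumer.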
PXF’s performance and flexibility stem from three core design principles:
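The first is parallel execution. Each YMatrix segment node runs its own PXF instance, reading external data shards (like HDFS blocks or Oracle partitions) in parallel. As a result, read throughput can grow with cluster size, as long as the external source can serve the additional concurrent readers.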
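The idea of segments reading disjoint shards in parallel can be sketched as follows. This is a toy model, not PXF's implementation: the article does not say how shards are assigned, so the round-robin policy here is an assumption, and shards are plain in-memory lists rather than HDFS blocks or Oracle partitions.

```python
from concurrent.futures import ThreadPoolExecutor

def assign_shards(shards, num_segments):
    """Round-robin assignment of shards to segments (an assumed policy)."""
    plan = {seg: [] for seg in range(num_segments)}
    for i, shard in enumerate(shards):
        plan[i % num_segments].append(shard)
    return plan

def read_shard(shard):
    """Stand-in for a remote read; here a shard is just a list of rows."""
    return list(shard)

def segment_scan(assigned):
    """One segment reads only the shards assigned to it."""
    rows = []
    for shard in assigned:
        rows.extend(read_shard(shard))
    return rows

def parallel_scan(shards, num_segments):
    """Run one scan per segment concurrently and merge the results."""
    plan = assign_shards(shards, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        per_segment = list(pool.map(segment_scan, plan.values()))
    return plan, [row for rows in per_segment for row in rows]

# Eight hypothetical shards of three rows each, spread over four segments.
shards = [[f"block{b}-row{r}" for r in range(3)] for b in range(8)]
plan, rows = parallel_scan(shards, num_segments=4)

if __name__ == "__main__":
    for seg, assigned in plan.items():
        print(f"segment {seg} reads {len(assigned)} shards")
```

Because every shard is assigned to exactly one segment, no row is read twice, and adding segments reduces the number of shards each one has to read.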
The second is pushdown. When the source supports it, PXF pushes WHERE-clause filters and column projections directly to the source system. This minimizes data movement and leverages native source optimizations.
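A minimal sketch of the "when compatible" decision, again with SQLite as a stand-in source. The set of pushable operators, the `build_remote_query` helper, and the `orders` table are all hypothetical. They illustrate the general technique of turning a predicate and a column list into a remote query, not PXF's actual rules.

```python
import sqlite3

# Hypothetical set of operators the source connector can evaluate remotely.
PUSHABLE_OPS = {"=", "<", ">", "<=", ">="}

def build_remote_query(table, columns, predicate=None):
    """Return (sql, params, pushed).

    predicate is a (column, op, value) tuple. If the operator is pushable,
    it becomes a WHERE clause at the source. Otherwise the query carries no
    filter and the caller has to filter the returned rows itself.
    """
    col_list = ", ".join(columns)  # projection: request only these columns
    if predicate and predicate[1] in PUSHABLE_OPS:
        col, op, value = predicate
        return f"SELECT {col_list} FROM {table} WHERE {col} {op} ?", (value,), True
    return f"SELECT {col_list} FROM {table}", (), False

source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, region TEXT, amount INTEGER, note TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, "north" if i % 2 else "south", i * 10, "x" * 50) for i in range(1, 101)],
)

# Filter and projection pushed down: few rows, two columns.
sql, params, pushed = build_remote_query("orders", ["id", "amount"], ("amount", ">", 900))
filtered = source.execute(sql, params).fetchall()

# No pushdown: every row and every column leaves the source.
full_sql, full_params, _ = build_remote_query(
    "orders", ["id", "region", "amount", "note"]
)
everything = source.execute(full_sql, full_params).fetchall()

# An operator outside the pushable set falls back to an unfiltered query.
_, _, like_pushed = build_remote_query("orders", ["id", "note"], ("note", "LIKE", "x%"))
```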
The third is extensibility. Built on a modular framework, PXF supports pluggable connectors. New data sources can be added without modifying the core engine, which lets it keep pace with evolving data landscapes.
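The pluggable-connector idea can be sketched with a simple registry. The names below (`connector`, `scan`, the `memory:*` profiles) are illustrative and do not mirror PXF's actual plugin interfaces. The point is only that the core lookup stays unchanged when a new source is registered.

```python
from typing import Callable, Dict, Iterable, Iterator

# Registry mapping a profile name to a reader function (hypothetical design).
_CONNECTORS: Dict[str, Callable[[str], Iterable[dict]]] = {}

def connector(profile: str):
    """Decorator that registers a reader under a profile name."""
    def register(fn):
        _CONNECTORS[profile] = fn
        return fn
    return register

def scan(profile: str, location: str) -> Iterator[dict]:
    """Core entry point: look up the connector and stream its rows."""
    try:
        reader = _CONNECTORS[profile]
    except KeyError:
        raise ValueError(f"no connector registered for profile {profile!r}") from None
    yield from reader(location)

@connector("memory:csv")
def read_csv_text(location: str):
    # Toy source: 'location' is the CSV payload itself.
    header, *lines = location.strip().splitlines()
    keys = header.split(",")
    for line in lines:
        yield dict(zip(keys, line.split(",")))

# Adding a new source later touches only the registry, not scan().
@connector("memory:kv")
def read_kv_text(location: str):
    for pair in location.split(";"):
        key, value = pair.split("=")
        yield {"key": key, "value": value}

csv_rows = list(scan("memory:csv", "id,region\n1,north\n2,south"))
kv_rows = list(scan("memory:kv", "a=1;b=2"))
```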
As data volumes grow, the “move everything” warehouse model is becoming obsolete. PXF enables a smarter paradigm: leave data where it lives, and bring computation to it.
Whether you’re unifying lake and warehouse workloads, connecting to legacy OLTP systems, or querying cloud storage, PXF lets YMatrix deliver real-time insights with minimal infrastructure overhead.
Stop building pipelines just to move data. Start analyzing it—wherever it is.
Learn more about PXF in YMatrix: https://ymatrix.cn/zh/doc/6.6/dataquery/pxf_hdfs