A database is where data is stored. Once you have finished modeling your tables and choosing a storage engine, the next step is to write data into those tables. Data writing faces the following challenges:
The defining characteristic of time series data is its sheer volume. In real-world scenarios this volume comes from three factors: the number of devices, the number of metrics collected per device, and the collection frequency.
Combined, a large device fleet, many collected metrics, and a high collection frequency generate an enormous amount of data, which puts heavy pressure on database throughput. To address this, MatrixDB provides MatrixGate, a high-speed writing tool. Because the segment nodes ingest data in parallel, it can reach a write speed of 50 million data points per second.
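To make the relationship between these three factors concrete, the following minimal sketch estimates the ingest rate a fleet generates. The function name and the fleet figures are hypothetical and chosen only for illustration; they are not taken from a MatrixDB benchmark.

```python
def points_per_second(devices: int, metrics_per_device: int, interval_seconds: float) -> float:
    """Estimate data points generated per second across all devices.

    Every device reports `metrics_per_device` values once per `interval_seconds`.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return devices * metrics_per_device / interval_seconds


# Hypothetical fleet: 100,000 devices, 500 metrics each, sampled once per second.
rate = points_per_second(100_000, 500, 1.0)
print(f"{rate:,.0f} data points/second")
```

Halving the collection interval or doubling the number of metrics doubles the required write throughput, which is why all three factors have to be considered together.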
For implementation details, please refer to [MatrixDB - How to implement a stand-alone 50 million data points/second write speed](https://ymatrix.cn/blog/20210525-MatrixDB-MatrixGate).
For the evaluation report, please refer to [MatrixDB - Time Series Database Insertion Performance Evaluation: MatrixDB is 78 times that of InfluxDB](https://ymatrix.cn/blog/20210524-MatrixDB-insertperformance).
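In real-world scenarios, data writing has to cope not only with large data volumes and diverse sources, but also with more complex exceptional cases, such as metrics that are reported in batches, out-of-order or delayed reports, and metrics that are reported at different frequencies.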
In some scenarios, the metrics a device collects at a given moment are not sent back all at once but arrive in several batches. The data from these batches must be merged into a single record rather than stored as multiple records.
For this scenario, MatrixDB provides UPSERT semantics, which merge data based on a unique constraint. For usage details, please refer to: MatrixDB - UPSERT for the interpretation of new features of MatrixDB 4.2.
Out-of-order and delayed reports are also handled through UPSERT.
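The merge behavior described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the semantics using SQLite's `ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or later); it is not MatrixDB's own UPSERT syntax, and the table name, column names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        ts          TEXT NOT NULL,
        device_id   TEXT NOT NULL,
        temperature REAL,
        pressure    REAL,
        UNIQUE (ts, device_id)   -- the unique constraint that drives the merge
    )
""")

# On a key conflict, keep the stored value wherever the new batch carries NULL.
# Unqualified column names refer to the row that is already stored.
UPSERT = """
    INSERT INTO metrics (ts, device_id, temperature, pressure)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (ts, device_id) DO UPDATE SET
        temperature = COALESCE(excluded.temperature, temperature),
        pressure    = COALESCE(excluded.pressure, pressure)
"""


def write(ts, device_id, temperature=None, pressure=None):
    """Write one (possibly partial) report for a device at a timestamp."""
    conn.execute(UPSERT, (ts, device_id, temperature, pressure))


# The first batch for 10:00:01 carries only the temperature.
write("2021-05-25 10:00:01", "dev-1", temperature=21.5)
# A complete report for the next second arrives first ...
write("2021-05-25 10:00:02", "dev-1", temperature=21.7, pressure=1.02)
# ... then the delayed second batch for 10:00:01 arrives out of order.
write("2021-05-25 10:00:01", "dev-1", pressure=1.01)

rows = conn.execute(
    "SELECT ts, device_id, temperature, pressure FROM metrics ORDER BY ts"
).fetchall()
for row in rows:
    print(row)
```

The table ends up with one row per timestamp and device, and the late batch fills in the missing pressure value instead of creating a second record.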
Mixed-frequency reporting means that different metrics of a device are collected at different frequencies. For example, some metrics are collected once every second and others once every 2 seconds, as shown in the figure below:
Mixed-frequency reporting leaves a large number of NULLs in the columns of the low-frequency metrics when the data is stored. NULL columns still occupy storage space in MatrixDB. For Heap tables, the storage overhead is [Number of Columns/8] bytes; for Mars tables, the storage overhead is [Number of RowGroup/8] bytes. The right solution therefore depends on how many NULLs there are and how they are distributed.
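As a rough illustration of how NULLs arise, the sketch below builds one row per second for two hypothetical metrics collected every 1 s and every 2 s, and then evaluates the Heap formula quoted above. It assumes that the brackets mean rounding up to a whole byte and that the overhead applies to each row containing a NULL; both are assumptions of this sketch, not statements from the MatrixDB documentation. The Mars formula is not modeled.

```python
import math


def build_rows(seconds, periods):
    """Return one row per second; a metric has a value only when the second
    is a multiple of its collection period, otherwise it is None (NULL)."""
    return [
        {name: (float(t) if t % period == 0 else None) for name, period in periods.items()}
        for t in range(seconds)
    ]


def null_fraction(rows, metric):
    """Fraction of rows in which the given metric is NULL."""
    return sum(row[metric] is None for row in rows) / len(rows)


def heap_null_overhead_bytes(column_count):
    """Heap formula from the text, [Number of Columns/8] bytes.
    Assumption: rounded up, counted once per row that contains a NULL."""
    return math.ceil(column_count / 8)


# Hypothetical metrics: "fast" every 1 s, "slow" every 2 s.
periods = {"fast": 1, "slow": 2}
rows = build_rows(10, periods)

print(null_fraction(rows, "fast"))    # share of NULLs in the 1 s metric
print(null_fraction(rows, "slow"))    # share of NULLs in the 2 s metric
print(heap_null_overhead_bytes(100))  # bytes for a 100-column table
```

Under these assumptions, half of the values of the 2 s metric are NULL when rows are aligned to the 1 s metric, so the proportion of NULLs grows as the gap between collection frequencies widens.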