Overview of writing data in time-series scenarios

A database is where data is stored. Once the tables have been modeled and a storage engine has been selected, the next step is to write data into them. Writing time-series data poses the following challenges:

  1. Large data volumes and high throughput requirements
  2. Complex write scenarios, such as out-of-order data and indicators reported at different frequencies

1. Large data volume

The defining characteristic of time-series data is its sheer volume, which in practice comes from three factors:

  1. Large number of devices: the total number of devices ranges from hundreds of thousands to millions, and keeps growing.
  2. High collection frequency: indicators are typically collected every second, and some must be collected as often as every 10 ms.
  3. Many indicators per device: in the Internet of Vehicles, for example, each vehicle may report several thousand indicators.
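A back-of-the-envelope estimate makes these three factors concrete: the sustained write rate is roughly devices × indicators per device × samples per second. The sketch below assumes every device reports every indicator at the same rate; the fleet size, indicator count, and helper names are hypothetical and chosen only for illustration.

```python
SECONDS_PER_DAY = 86_400


def points_per_second(devices: int, indicators_per_device: int, samples_per_second: int) -> int:
    """Data points generated per second when every device reports every
    indicator `samples_per_second` times per second."""
    return devices * indicators_per_device * samples_per_second


def points_per_day(devices: int, indicators_per_device: int, samples_per_second: int) -> int:
    """Daily volume at the same sustained rate."""
    return points_per_second(devices, indicators_per_device, samples_per_second) * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical fleet: 10,000 vehicles, 1,000 indicators each, sampled once per second.
    print(f"{points_per_second(10_000, 1_000, 1):,} points/s")    # 10,000,000 points/s
    print(f"{points_per_day(10_000, 1_000, 1):,} points/day")     # 864,000,000,000 points/day
    # A 10 ms collection interval means 100 samples per second: 100x the load.
    print(f"{points_per_second(10_000, 1_000, 100):,} points/s")  # 1,000,000,000 points/s
```

Because the three factors multiply, growth in any one of them scales the required write throughput proportionally.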

Taken together, the large number of devices, the many indicators per device, and the high collection frequency generate an enormous volume of data, which places a heavy demand on database write throughput. To meet it, MatrixDB provides MatrixGate, a high-speed ingestion tool. Because the segment nodes ingest data in parallel, MatrixGate can reach a write speed of 50 million data points per second.


For implementation details, see [MatrixDB - How to achieve a write speed of 50 million data points/second on a single machine](https://ymatrix.cn/blog/20210525-MatrixDB-MatrixGate)

For the evaluation report, see [MatrixDB - Time-Series Database Insert Performance Evaluation: MatrixDB achieves 78 times the insert performance of InfluxDB](https://ymatrix.cn/blog/20210524-MatrixDB-insertperformance)
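The general idea behind parallel ingestion by segment can be illustrated with a small sketch. This is not how MatrixGate works internally; it only shows the pattern of routing each row to a segment by hashing a key (here the device ID, an assumed distribution key) and then writing each segment's batch concurrently. The helpers `segment_for`, `partition`, `write_batch`, and `parallel_load`, and the use of CRC32 as the hash, are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def segment_for(device_id: str, n_segments: int) -> int:
    # Deterministic hash (CRC32) so the same device always maps to the same segment.
    return zlib.crc32(device_id.encode("utf-8")) % n_segments


def partition(rows, n_segments: int) -> dict:
    """Group rows of the form (device_id, ts, value) into one batch per segment."""
    batches = defaultdict(list)
    for row in rows:
        batches[segment_for(row[0], n_segments)].append(row)
    return dict(batches)


def write_batch(segment: int, batch: list) -> int:
    # Placeholder for the real per-segment write; here it only returns the row count.
    return len(batch)


def parallel_load(rows, n_segments: int) -> int:
    """Partition the rows, write every segment's batch concurrently, return rows written."""
    batches = partition(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        futures = [pool.submit(write_batch, seg, batch) for seg, batch in batches.items()]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    # 100 hypothetical devices, 10 readings each.
    rows = [(f"dev{d}", t, float(t)) for d in range(100) for t in range(10)]
    print(parallel_load(rows, n_segments=4))  # 1000
```

Routing by a stable hash keeps all rows for one device on the same segment, so segments never have to coordinate with each other during the write.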

2. Complex write scenarios

In practice, data ingestion must cope not only with large volumes and diverse sources but also with several complex cases, such as:

  1. Automatic merging of batch reports
  2. Out-of-order and delayed reporting
  3. Reporting at different frequencies

2.1 Automatic merging of batch reports

In some scenarios, a device does not send all of the indicators collected at a given moment in one go; instead, it sends them in several batches. The data from these batches must be merged into a single record rather than stored as multiple records.

For this scenario, MatrixDB provides UPSERT semantics, which merge incoming data into an existing row based on a unique constraint. For usage details, see the article "MatrixDB 4.2 New Features Explained: UPSERT".
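As a minimal illustration of merging batches on a unique constraint, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MatrixDB. SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` is PostgreSQL-style upsert syntax, but the exact MatrixDB syntax and options are covered in the article referenced above. The `metrics` table, its columns, and the `COALESCE` merge rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (device, timestamp); the composite
# primary key plays the role of the unique constraint.
conn.execute(
    """CREATE TABLE metrics (
           device_id TEXT NOT NULL,
           ts        TEXT NOT NULL,
           speed     REAL,
           rpm       REAL,
           temp      REAL,
           PRIMARY KEY (device_id, ts)
       )"""
)

UPSERT = """
INSERT INTO metrics (device_id, ts, speed, rpm, temp)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (device_id, ts) DO UPDATE SET
    -- keep the stored value unless the new batch actually carries one
    speed = COALESCE(excluded.speed, speed),
    rpm   = COALESCE(excluded.rpm, rpm),
    temp  = COALESCE(excluded.temp, temp)
"""

# Two batches for the same device and moment, each carrying only some indicators.
conn.execute(UPSERT, ("v1", "2021-06-01 10:00:00", 62.5, None, None))
conn.execute(UPSERT, ("v1", "2021-06-01 10:00:00", None, 2100.0, 88.0))

print(conn.execute("SELECT * FROM metrics").fetchall())
# [('v1', '2021-06-01 10:00:00', 62.5, 2100.0, 88.0)]
```

The `COALESCE` rule means a later batch can fill in missing indicators but never overwrites an existing value with NULL; whether that is the right policy depends on the data source.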

2.2 Out-of-order and delayed reporting

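Out-of-order and delayed reporting are also handled through UPSERT: a batch that arrives late or out of order is merged into the existing record with the same unique key instead of being stored as a separate record.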
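What makes merging on a unique key suitable for late data is that the final result does not depend on arrival order. The pure-Python sketch below models this with a dictionary keyed by `(device_id, ts)`, a hypothetical stand-in for the table and its unique constraint, and checks that every arrival order of three partial reports (one of them delayed) yields the same rows. The `merge` and `load` helpers are illustrative only.

```python
from itertools import permutations


def merge(store: dict, report: dict) -> None:
    """Merge one partial report into the row keyed by (device_id, ts),
    keeping existing values for indicators the report does not carry."""
    key = (report["device_id"], report["ts"])
    row = store.setdefault(key, {})
    for name, value in report["values"].items():
        if value is not None:
            row[name] = value


def load(order) -> dict:
    """Apply reports in the given arrival order to an empty store."""
    store: dict = {}
    for report in order:
        merge(store, report)
    return store


reports = [
    {"device_id": "v1", "ts": 1, "values": {"speed": 60.0}},
    {"device_id": "v1", "ts": 2, "values": {"speed": 61.0, "rpm": 2000.0}},
    # Delayed batch for ts=1 that may arrive after the ts=2 data.
    {"device_id": "v1", "ts": 1, "values": {"rpm": 1980.0}},
]

results = [load(order) for order in permutations(reports)]
# Every arrival order yields the same final rows.
assert all(r == results[0] for r in results)
print(sorted(results[0].items()))
```

Order independence holds here because no two reports carry different values for the same indicator at the same timestamp; if they did, the last arrival would win.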

2.3 Reporting at different frequencies

Different-frequency reporting means that a device collects different indicators at different frequencies: for example, some indicators are collected every 1 s and others every 2 s, as shown in the figure below:

[Figure: different-frequency reporting]
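The 1 s / 2 s example can be sketched as follows, assuming each device writes one row per second and each indicator is stored in its own column (a wide-table layout). The indicator names `a` and `b` and the helpers `align` and `null_ratio` are hypothetical. An indicator collected every k seconds has a value in only one row out of k, so the other cells in its column are NULL (`None` here).

```python
def align(duration_s: int, intervals: dict) -> list:
    """Build one row per second; an indicator only has a value at multiples
    of its collection interval, otherwise the cell is NULL (None)."""
    rows = []
    for t in range(duration_s):
        row = {"ts": t}
        for name, interval in intervals.items():
            # float(t) stands in for the actual reading.
            row[name] = float(t) if t % interval == 0 else None
        rows.append(row)
    return rows


def null_ratio(rows: list, name: str) -> float:
    """Fraction of rows in which the given indicator is NULL."""
    return sum(r[name] is None for r in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical indicators: 'a' collected every 1 s, 'b' every 2 s.
    rows = align(4, {"a": 1, "b": 2})
    for r in rows:
        print(r)
    print(null_ratio(rows, "b"))  # 0.5
```

Over a whole number of cycles, an indicator sampled every k seconds is NULL in a fraction 1 - 1/k of the rows.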

When the data is stored, different-frequency reporting leaves a large number of NULLs in the columns of the lower-frequency indicators. In MatrixDB, NULL values still take up storage space: for Heap tables the overhead is [Number of Columns/8] bytes, and for Mars tables it is [Number of RowGroup/8] bytes. The table design should therefore take into account how many NULLs the data is expected to contain.
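The Heap formula can be checked with a little arithmetic. This sketch assumes, as in PostgreSQL-style heap storage, that the bracket means rounding up to whole bytes (one bit per column) and that the bitmap is paid once per row containing at least one NULL; alignment padding is ignored and the Mars formula is not modeled. The row and column counts and the helper names are hypothetical.

```python
def bitmap_bytes(n_bits: int) -> int:
    """Size of a bitmap with one bit per item, rounded up to whole bytes."""
    return (n_bits + 7) // 8


def heap_null_overhead(rows_with_nulls: int, n_columns: int) -> int:
    """Total NULL-bitmap bytes for Heap rows that contain at least one NULL,
    assuming [columns/8] means one bit per column, rounded up, per row."""
    return rows_with_nulls * bitmap_bytes(n_columns)


if __name__ == "__main__":
    print(bitmap_bytes(1_000))  # 125
    print(bitmap_bytes(10))     # 2
    # Hypothetical: 1,000-column Heap table, 10 million rows that each contain a NULL.
    print(f"{heap_null_overhead(10_000_000, 1_000):,} bytes")  # 1,250,000,000 bytes
```

Under these assumptions, a wide table with many sparsely populated low-frequency columns pays the full bitmap on almost every row, which is why the proportion of NULLs matters when choosing the table design.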