Data Type

The data model has determined that before creating a table, you also need to know what data types MatrixDB contains. Because a table is composed of columns, each column needs to have a certain type.

The data types supported by MatrixDB include the following categories:

Basic data types
1. Value
2. String
3. Date and time
Extended data types
1. json/jsonb
2. mxkv
Spatial data type

1. Basic data types

1.1 Value

Type	Alias	Size*	Scope	Description
smallint	int2	2 bytes	-32768 ~ 32767	Small range integer
int	int4	4 bytes	-2147483648 ~ 2147483647	Integer Type
bigint	int8	8 bytes	-9223372036854775808 ~ 9223372036854775807	Large range integers
serial	serial4	4 bytes	1 ~ 2147483647	Auto-incremental integer
bigserial	serial8	8 bytes	1 ~ 9223372036854775807	Large range of self-increasing integers
real	float4	4 bytes	6-bit precision	variable precision
double precision	float8	8 bytes	15-bit precision	variable precision
decimal	numeric	variable	unlimited	fixed precision

As can be seen from the table, the numerical types of MatrixDB are divided into 4 categories:

Integer: Different sizes represent different ranges, and they are all signed numbers
1. smallint
2. int
3. bigint
Fixed point number: can represent a fixed number of decimal points
1. decimal
Floating point number: can represent a number with unfixed decimal points
1. real
2. double precision
Self-increase type: used for self-increase ID column
1. serial
2. bigserial

1.2 String

Type	Alias	Size*	Scope	Description
character [ (n) ]	char [ (n) ]	1 byte + n	Longest n characters string	Fixed length string, unspecified characters are filled with spaces
character varying [ (n) ]	varchar [ (n) ]	1 byte + string size	string with a maximum of n characters	variable length string with a length limit, upper limit of 10485760 bytes
text		1 byte + string length	any length string	variable length string without length limit

As can be seen from the table, MatrixDB string types are divided into 2 categories:

Fixed length: char
Get longer
1. varchar: There is a length limit
2. text: No length limit

1.3 Date and time

Type	Alias	Size*	Scope	Description
date		4 bytes	4713 BC ~ 294,277 AD	Calendar date (year, month, day)
time [ (p) ] [ without time zone ]		8 bytes	00:00:00[.0000000] ~ 24:00:00[.0000000]	Time in one day
time [ (p) ] with time zone	timetz	12 bytes	00:00:00+1359 ~ 24:00:00-1359	Time within one day, with time zone
timestamp [ (p) ] [ without time zone ]		8 bytes	4713 BC ~ 294,277 AD	Date and time
timestamp [ (p) ] with time zone	timestamptz	8 bytes	4713 BC ~ 294,277 AD	Date and time, with time zone

As can be seen from the table, MatrixDB has three types of date and time:

Date: can only represent year-month-day, and cannot represent time
1. date
Time: can only represent time, cannot represent date
1. time [without time zone] (without time zone)
2. time with time zone (with time zone)
Date and time: can represent both date and time
1. timestamp [without time zone] (without time zone)
2. timestamp with time zone (with time zone)

Effect of time zone on time type

You can see a special information from the time type time zone real-time zone. If not specified, the default is without time zone, that is, there is no time zone. So what impact does the time zone have on the time type? Let’s create a test table to demonstrate the description.

CREATE TABLE test (
    c1 int,
    c2 timestamp,
    c3 timestamp with time zone
)DISTRIBUTED BY (c1);

In addition to the distribution key, the test table contains two timestamp columns, one without time zone by default and the other with time zone. Then insert the test data and query:

INSERT INTO test VALUES(1, now(), now());

SELECT * FROM test;
 c1 |            c2            |             c3
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | 2021-12-01 14:54:09.4783 | 2021-12-01 14:54:09.4783+08
(1 row)

SHOW timezone;
   TimeZone
------------------------
 Asia/Shanghai
(1 row)

From the query results, we can see that the time displayed by the two timestamp columns is the same. There is a +08 behind the column with time zone, indicating that the time zone to which the timestamp belongs is East Eighth District. Then, through the SHOW timezone command, I saw that the current time zone is 'Asia/Shanghai', that is, Beijing time.

Let's modify the time zone and view the results:

SET timezone TO 'Japan';

SELECT * FROM test;
 c1 |            c2            |             c3
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | 2021-12-01 14:54:09.4783 | 2021-12-01 15:54:09.4783+09
(1 row)

It can be seen that after changing the database time zone to Japan, the time displayed in the c3 column with time zone is different from that in c2, and the subsequent time zone information becomes +09, that is, Dongjiu District.

Therefore, the time stored in the timestamp column with time zone has time zone information and will display different times in different time zones; while the timetamp column without time zones will display the same time in any time zone. When the device is deployed in a region of different time zones, the timestamp must have time zone information. For timestamp type, there is no time zone occupies storage space, so it is recommended to add it.

Suggestions for using basic data types in timing scenarios

Timing scene data can be divided into the following categories:

Timestamp: The time of the indicator collection must include both date and time, so it is recommended to use the timestamp type and with a time zone
Equipment identification
1. If you are using string identification devices, it is recommended to use varchar and text. Because the length is immutable, char is suitable for storing device number information of fixed lengths, such as serial number.
2. If you create a device table separately for the device, you can use serial or bigserial as the self-increment ID to uniquely identify the device.
Acquisition indicator: Usually a numeric value can be determined based on the value range and whether there are decimals.
1. If the acquisition result is an integer, select the integer data type and determine whether to use smallint, int or bigint based on the result range.
2. If the acquisition result has a decimal number and the exact number of decimal points is fixed, the value range is also relatively fixed, such as temperature, it is suitable to use decimal.
3. If the collection result takes a decimal and the exact number of decimal points is not fixed and the value range is large, you need to select float or double precision according to the accuracy.

2. Extend data types

In the relational model, the basic data type, each column represents the value of an attribute, such as: name, height, weight. But face the following challenges:

The indicator set cannot be predicted, that is, the table schema may change frequently
The metric set is too large, exceeding the column limit of up to 1600 columns in the MatrixDB table

At this time, you need to use the extended data type. The so-called extended data type means that multiple attributes can be stored in a single column.

There are many ways to implement extended data types, such as customizing the serialized storage format and encoding method, and then storing it into string types. MatrixDB provides two extended data types:

json/jsonb
mxkv

2.1 json/jsonb

The json/jsonb type inherits from the PostgreSQL database. Unlike directly saving json strings with columns of string type, the json/jsonb type provides commonly used json operation functions for reading key values, as well as merging, deleting and other operations.

The usage methods of json and jsonb are basically the same, and they can also be directly converted to each other. The difference is:

Save json as text, save jsonb as binary
There is no need to do binary conversion when writing json, the writing speed is faster
jsonb has faster parsing speed because it has done binary transfer

The following are the commonly used json/jsonb type operation methods:

Use -> Extract key value

SELECT '{"a":1, "b":2, "c":3}'::json->'a';
 ?column?
----------
 1
(1 row)

If the json/jsonb level is nested, use -> multiple times:

SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
 ?column?
----------
 2
(1 row)

Delete key

Use the - number to delete the key. When writing to a json object, it must be converted to a jsonb type:

SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
     ?column?
------------------
 {"b": 2, "c": 3}
(1 row)

Merge

The merge operation can merge two jsonb-type keys together, using the || operator:

SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
         ?column?
------------------------------
 {"a": 1, "b": 2, "c": 3}
(1 row)

renew

|| Operator will update the value when the key value is the same:

SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
     ?column?
------------------
 {"a": 1, "b": 3}
(1 row)

For more json operation methods, please refer to PostgreSQL [document] (https://www.postgresql.org/docs/12/functions-json.html).

2.2 mxkv

mxkv is a highly efficient and scalable storage type developed by MatrixDB. The usage method is similar to json. The advantage compared to json is: mxkv is binary storage and is compressed, with smaller storage space and higher query efficiency.

For detailed usage methods, please refer to [Document] (/doc/latest/datamodel/mxkv) and [blog] (/blog/20210721-MatrixDB-mxkv).

3. Spatial data types

MatrixDB 4.3 has developed an enhanced version of PostGIS components that support storage and computing of spatial data types. For specific usage methods, please refer to [Document] (/doc/latest/datamodel/spatial_type).

简体中文