This document gives examples of typical query statements in OLAP and OLTP scenarios. Since most scenarios are fusion in actual application, this article is not strictly classified.
Notes!
Understanding the query requirements of a scenario is one of the keys to building an accurate [data model] (/doc/latest/datamodel/guidebook).
OLAP is Online Analysis Processing, which is online analysis and processing scenarios. It is also often called a data warehouse scenario and is mostly used for decision support. It can perform complex data analytics for multi-angle observation by data scientists, business analysis and knowledge workers, supporting business intelligence (BI), data mining and other decision-making support applications.
Example:
=# SELECT *,
row_number() OVER (ORDER BY c1 DESC),
rank() OVER (ORDER BY c1 DESC),
dense_rank() OVER (ORDER BY c1 DESC)
FROM t1;
Key Points:
row_number()
, rank()
, dense_rank()
to calculate different types of rankings:row_number()
function assigns a unique successful ranking value to each row in the result set, which is arranged in descending order of the c1
column.rank()
function assigns a ranking value to each different c1
value in the result set, sorting it in descending order of the c1
column. If there is the same c1
value, the corresponding ranking will be skipped.dense_rank()
function assigns a ranking value to each different c1
value in the result set, sorting it in descending order of the c1
column. If there is the same c1
value, the corresponding ranking will be retained and will not be skipped.Example:
=# SELECT *,
first_value(c1) OVER (ORDER BY c1 DESC),
last_value(c1) OVER (ORDER BY c1 DESC)
FROM t1;
Example:
=# SELECT *,
SUM(c1) OVER (ORDER BY c1 DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS sum_c1 -- 添加框架声明,使得窗口涵盖从第一行到最后一行的所有行
FROM t1;
Use window frames to control the range of aggregate calculations, such as calculating the average of each month's income and the previous two months:
=# SELECT month,
revenue,
avg(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
FROM t;
Key Points:
Inquiry of population records in each city where income is between 40% and 70% of the city’s (inclusive).
=# SELECT id,name,salary,city
FROM (
SELECT *,
ntile(10) OVER (
PARTITION BY city
ORDER BY salary DESC
) AS bucket
FROM city_salary
) AS t
WHERE t.bucket >= 4 AND t.bucket <= 7;
Key Points:
ntile()
to bucket the records in the city_salary
table and filters out the records from the 4th to the 7th bucket from 10 buckets.Calculate the proportion of changes in monthly income relative to the previous month:
=# WITH detail(m, r, l) AS (
SELECT month,
revenue,
lag(revenue, 1) OVER (ORDER BY month) AS last_month_revenue
FROM sales
)
SELECT m, ((r-l)::float8/l * 100)::decimal(18,2) || '%'
FROM detail;
Key Points:
(r-l)::float8/l * 100)::decimal(18,2) || '%'
: This expression is used for calculation of the growth percentage. It subtracts sales revenue from the current month from the previous month and divides the result from the previous month's sales revenue. Finally, use type conversion to convert the result to percentage format and retain the accuracy of the two decimal places.Check the proportion of monthly revenue to the current quarterly revenue:
=# SELECT *,
revenue::real / quarter_revenue * 100 AS percent_of_quarter
FROM (
SELECT quarter,
month,
revenue,
sum(revenue) OVER (PARTITION BY quarter) AS quarter_revenue
FROM sales2
) AS t;
Change points list records that have changed compared to the previous one. This type of query is suitable for finding changes in relatively stable data sets:
=# SELECT time, read FROM (
SELECT time,
read,
read - lag(read) OVER (ORDER BY TIME) AS diff
FROM disk
WHERE time BETWEEN '2021-04-10 21:00:00'::timestamp AND '2021-04-10 21:01:00'::timestamp
AND tag_id = 1 ) ht
WHERE diff IS NULL OR diff != 0
ORDER BY time;
Key Points:
lag()
to calculate the difference between the read value of each time point and the read value of the previous time point.WHERE diff IS NULL OR diff != 0
means that only time-point records with a difference value of NULL
or non-zero are filtered out.Calculate the value of the disk array sorted by tag_id
, where the read speed is at 20%
position and the write speed is at 30%
position:
=# SELECT tag_id,
percentile_cont(0.2) WITHIN GROUP (ORDER BY read) AS read,
percentile_cont(0.3) WITHIN GROUP (ORDER BY write) AS write
FROM disk
GROUP BY tag_id
ORDER BY tag_id;
If the function parameter value is 0.5
, it is equivalent to calculating the median.
For example, using multi-table association aggregation to compare the indicators of a retail store during holidays:
=# WITH holiday_rst AS (
SELECT year,holiday,date,row_number() OVER(PARTITION BY year,holiday ORDER BY date)
AS the_day_of_the_holiday
FROM dim_holiday_info
)
SELECT t.year,t.holiday,t.date,t.the_day_of_the_holiday,b.date AS corresponding_date_of_previous_year
FROM holiday_rst t
LEFT JOIN holiday rst b
ON t.year = b.year + 1
AND t.holiday = b.holiday
AND t.the_day_of_the_holiday = b.the_day_of_the_holiday;
Key Points:
In the field of business intelligence (BI), it is necessary to see through a data set from different dimensions, and through data visualization, the knowledge contained in the data is presented to decision makers to realize their business value.
Data visualization is an important part of multi-dimensional data analysis. Through specific BI analysis software, the array of data cubes output by the database is depicted into graphs, images and metering, so that the inherent statistical features of big data can be identified and perceived by decision makers, and can also be interacted and predicted.
Aggregation calculations are performed from multiple dimensions and the results are displayed together:
=# SELECT city,
season,
category,
sum(value)
FROM bi_sales
GROUP BY GROUPING SETS (city, season, category)
ORDER by (city);
city | season | category | sum
----------+-----------------------------------------------------------------------------------------------------------------------
Beijing | | | 1098
Shanghai | | | 1089
Guangzhou| | | 1080
| Q3 | | 1989
| | Groceries | 1089
| Q1 | | 189
| | Electrics | 999
| | Books | 1179
| Q2 | | 1089
(9 rows)
It can be seen that the sum-aggregation results are shown separately:
OLTP is Online Transaction Processing, which is an online transaction processing scenario and is mostly used for daily transaction processing. It can quickly and accurately perform data processing for ATMs, online banking, cash registers, e-commerce and brick-and-mortar store purchases, as well as hotel and air ticket booking, and perform many database transactions (change, insert, delete or query operations) in real time.
Ad Hoc is the user who flexible selects query conditions according to their own needs. The system can quickly generate corresponding statistical reports based on user's choices. The biggest difference between ad-hoc query and ordinary application query is that ordinary application query is customized, while ad-hoc query is customized by users.
=# SELECT * FROM sales WHERE customer_id=100;