Typical Queries in OLAP and OLTP Scenarios

This document provides examples of typical queries in OLAP and OLTP scenarios. Since most real-world query patterns are hybrid in nature, the classification here is not strictly exclusive.

Note!
Understanding the query requirements of a scenario is one of the key steps in building an accurate data model.


1 OLAP

OLAP stands for Online Analytical Processing. It is commonly referred to as a data warehousing scenario and is primarily used for decision support. OLAP enables complex data analysis, allowing data scientists, business analysts, and knowledge workers to examine data from multiple perspectives. It supports business intelligence (BI), data mining, and other decision-support applications.


1.1 Rank Calculation

Example:

=# SELECT *, 
          row_number() OVER (ORDER BY c1 DESC),
          rank() OVER (ORDER BY c1 DESC),
          dense_rank() OVER (ORDER BY c1 DESC)
   FROM t1;

Key Points:

  • Uses window functions row_number(), rank(), and dense_rank() to compute different types of rankings:
    • The row_number() function assigns a unique sequential rank to each row based on the descending order of column c1.
    • The rank() function assigns a rank to each distinct value in c1, ordered by c1 in descending order. Ties receive the same rank, and subsequent ranks are skipped accordingly.
    • The dense_rank() function assigns a rank to each distinct value in c1, ordered by c1 in descending order. Ties receive the same rank, but no ranks are skipped afterward.


1.2 First and Last Values

Example:

=# SELECT *,
          first_value(c1) OVER (ORDER BY c1 DESC),
          last_value(c1) OVER (ORDER BY c1 DESC)
   FROM t1;


1.3 Window Framing / Sliding Windows

Example:

=# SELECT *,
          SUM(c1) OVER (ORDER BY c1 DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS sum_c1 -- Frame declaration includes all rows from first to last
   FROM t1;

Window framing can also control the scope of aggregate calculations. For example, computing monthly revenue and its average over the current month and two months before and after:

=# SELECT month,
          revenue, 
          avg(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
   FROM t;

Key Points:

  • Use window frame clauses to define the boundaries of the window.


1.4 Distribution Statistics

Retrieve records of individuals whose income falls within the 40%–70% range (inclusive) in each city:

=# SELECT id,name,salary,city
   FROM (
    SELECT *,
           ntile(10) OVER (
            PARTITION BY city
            ORDER BY salary DESC
           ) AS bucket
    FROM city_salary
   ) AS t
   WHERE t.bucket >= 4 AND t.bucket <= 7;

Key Points:

  • This query uses the window function ntile() to divide records in the city_salary table into 10 equal-sized buckets. It then filters to return only those in buckets 4 through 7.


1.5 Offset Calculations

Calculate the percentage change in monthly revenue compared to the previous month:

=# WITH detail(m, r, l) AS (
    SELECT month, 
           revenue,
           lag(revenue, 1)  OVER (ORDER BY month) AS last_month_revenue
    FROM sales
   )
   SELECT m, ((r-l)::float8/l * 100)::decimal(18,2) || '%'
   FROM detail;

Key Points:

  • Uses a Common Table Expression (CTE) to compute monthly sales revenue and the percentage change from the prior month.
  • (r-l)::float8/l * 100)::decimal(18,2) || '%': Computes growth rate by subtracting last month's revenue from the current month's, dividing by last month's revenue. Type casting converts the result to a percentage with two decimal places.


1.6 Aggregation Over Windows

Calculate each month’s revenue as a percentage of its quarterly total:

=# SELECT *,
          revenue::real / quarter_revenue * 100 AS percent_of_quarter
   FROM (
   SELECT quarter,
          month,
          revenue,
          sum(revenue) OVER (PARTITION BY quarter) AS quarter_revenue
   FROM sales2
   ) AS t;


1.7 Change Detection

Identify records where values differ from the previous record. This type of query is useful for detecting changes in otherwise stable datasets:

=# SELECT time, read FROM (
        SELECT time,
            read,
            read - lag(read) OVER (ORDER BY TIME) AS diff
        FROM disk
        WHERE time BETWEEN '2021-04-10 21:00:00'::timestamp AND '2021-04-10 21:01:00'::timestamp
        AND tag_id = 1 ) ht
    WHERE diff IS NULL OR diff != 0
    ORDER BY time;

Key Points:

  • Uses the window function lag() to compute the difference between the current and previous read values.
  • The filter condition WHERE diff IS NULL OR diff != 0 retains only records where the difference is NULL or non-zero.


1.8 Continuous Percentiles

Compute the 20th percentile of read speed and 30th percentile of write speed, ordered by tag_id:

=# SELECT tag_id,
    percentile_cont(0.2) WITHIN GROUP (ORDER BY read) AS read,
    percentile_cont(0.3) WITHIN GROUP (ORDER BY write) AS write
    FROM disk
    GROUP BY tag_id
    ORDER BY tag_id;

If the function parameter is 0.5, it computes the median.


1.9 Metric Comparison Analysis

For example, use multi-table joins and aggregation to compare retail store metrics across holidays:

=# WITH holiday_rst AS (
    SELECT year,holiday,date,row_number() OVER(PARTITION BY year,holiday ORDER BY date)
    AS the_day_of_the_holiday
    FROM dim_holiday_info
    )
   SELECT t.year,t.holiday,t.date,t.the_day_of_the_holiday,b.date AS corresponding_date_of_previous_year
   FROM holiday_rst t
   LEFT JOIN holiday rst b
   ON t.year = b.year + 1
   AND t.holiday = b.holiday
   AND t.the_day_of_the_holiday = b.the_day_of_the_holiday;

Key Points:

  • Uses a CTE to assign a sequence number (the_day_of_the_holiday) to each day within a holiday period per year.
  • The result includes current-year holiday dates and their corresponding dates from the prior year.


1.10 Multidimensional Data Analysis

In the field of Business Intelligence (BI), multidimensional analysis allows users to examine datasets across various dimensions. Through data visualization tools, insights hidden in large datasets are presented to decision-makers in the form of charts, graphs, and dashboards, enabling interactive exploration and forecasting.

1.10.1 Multi-Dimensional Aggregation

Perform aggregations across multiple dimensions and present results together:

=# SELECT city,
          season, 
          category, 
          sum(value) 
   FROM bi_sales
   GROUP BY GROUPING SETS (city, season, category)
   ORDER by (city);
   city   | season | category  | sum
----------+--------+-----------+------
 Beijing  |        |           | 1098
 Shanghai |        |           | 1089
 Guangzhou|        |           | 1080
          | Q3     |           | 1989
          |        | Groceries | 1089
          | Q1     |           |  189
          |        | Electrics |  999
          |        | Books     | 1179
          | Q2     |           | 1089
(9 rows)

The aggregated results show:

  • City-level totals: sum of value for each city.
  • Season-level totals: sum of value for each season.
  • Category-level totals: sum of value for each product category.

2 OLTP

OLTP stands for Online Transaction Processing. It refers to transactional processing scenarios typically used for daily operations. OLTP systems handle fast, accurate processing of transactions such as ATM withdrawals, online banking, point-of-sale purchases, e-commerce orders, and hotel or flight reservations. They support real-time execution of numerous database operations including inserts, updates, deletes, and queries.


2.1 Ad Hoc Queries

Ad hoc queries allow users to dynamically specify filtering conditions based on their immediate needs. The system generates reports or results on demand. Unlike predefined application queries, ad hoc queries are user-defined and flexible.

=# SELECT * FROM sales WHERE customer_id=100;