A typical GroupBy
SQL query aggregates on just one set of columns.
For instance:
Grouping data on column “A” will require one query.
Grouping data on columns “A” and “B” will require a separate query.
Next, if the two outputs must be gathered in a single table, we use UNION
or UNION ALL
(as needed).
But this is not efficient as it scans the same table twice.
Instead, there are three ways to run multiple aggregations on the same table by scanning the table just once.
This makes our query much more efficient.
These are Grouping Sets, Rollup, and Cube, and their usage has been depicted in the image below:
#1) Grouping sets
The GROUPING SETS
clause allows us to define multiple groupings in a single query.
Each grouping set defines a combination of columns by which the data is grouped.
Given the above query, here are the group aggregations that will be created:
(A): Aggregated by
A
, counting all rows across all other columns.(A, B): Aggregated by
(A, B)
, counting all rows across all other columns.(C): Aggregated by
C
, counting all rows across all other columns.
A demonstration is shown below:
The above query:
Generates an aggregation on
City
column.Generates another aggregation on
Fruit
column.
#2) ROLLUP
ROLLUP
creates a result set that includes subtotals and a grand total in addition to the regular grouped results.
It does this by grouping the data at multiple levels of aggregation.
Given the above query, here are the group aggregations that will be created:
(A): Subtotal for each
A
, aggregated across allB
andC
.(A, B): Subtotal for each
A
andB
combination, aggregated across allC
.(A, B, C): Regular group by all three columns.
(): Grand total, aggregated across all
A
,B
, andC
.
A demonstration is shown below:
The above query:
Generates an aggregation on
Fruit
column.Generates another aggregation on (
Fruit, City
) column.Generates a grand total.
Unlike Grouping Sets, the order is important in ROLLUP.
More specifically, ROLLUP (A, B)
will not be the same as ROLLUP (B, A)
.
#3) CUBE
Finally, CUBE
creates a result set that includes all possible combinations of aggregations for the specified columns.
Given the above query, here are the group aggregations that will be created:
(A, B, C): Regular group by all three columns.
(A, B): Subtotal for each
A
andB
combination, aggregated across allC
.(A, C): Subtotal for each
A
andC
combination, aggregated across allB
.(B, C): Subtotal for each
B
andC
combination, aggregated across allA
.(A): Subtotal for each
A
, aggregated across allB
andC
.(B): Subtotal for each
B
, aggregated across allA
andC
.(C): Subtotal for each
C
, aggregated across allA
andB
.(): Grand total, aggregated across all
A
,B
, andC
.
A demonstration is shown below:
The above query:
Generates an aggregation on
Fruit
column.Generates an aggregation on
City
column.Generates another aggregation on (
Fruit, City
) column.Generates a grand total.
Done!
Try it out by downloading this Jupyter Notebook: Grouping Sets, ROLLUP, CUBE Notebook.
👉 Over to you: What are some other lesser-known SQL keywords?
Let me help you more…
Every week, I publish in-depth ML deep dives. The topics align with the practical skills that typical ML/DS roles demand.
Join below to unlock all full articles:
Here are some of the top articles:
[FREE] A Beginner-friendly Introduction to Kolmogorov Arnold Networks (KANs).
Implementing Parallelized CUDA Programs From Scratch Using CUDA Programming
Understanding LoRA-derived Techniques for Optimal LLM Fine-tuning
8 Fatal (Yet Non-obvious) Pitfalls and Cautionary Measures in Data Science.
5 Must-Know Ways to Test ML Models in Production (Implementation Included).
Don’t Stop at Pandas and Sklearn! Get Started with Spark DataFrames and Big Data ML using PySpark.
Join below to unlock all full articles:
SPONSOR US
Get your product in front of more than 76,000 data scientists and other tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., who have influence over significant tech decisions and big purchases.
To ensure your product reaches this influential audience, reserve your space here or reply to this email.
Thank you for sharing, Avi!
I'm curious, how can we determine which version of SQL supports these GROUP BY extensions?