GROUPING SETS: A Highly Underrated Technique to Run Multiple Aggregations While Scanning the Table Only Once

A lesser-known and much more efficient way to run multiple aggregations.

Dec 09, 2023

When running multiple aggregations on a table, most people write multiple SQL queries, one per aggregation.

Finally, if the results are to be gathered in a single table, they combine the individual result sets using UNION or UNION ALL (as needed).

But this is among the least efficient ways to approach the problem, because every query reads the table again.

Let me explain with an example.

Consider an organization whose database has an Employees table with, among other things, a City column and a Status column (full-time or intern).
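A minimal sketch of such a table, assuming only the two columns the task relies on plus a hypothetical `EmployeeID` key. The column types, city names, and rows are made-up sample data for illustration, not the data from the original post:

```sql
-- Hypothetical schema: only City and Status are named in the post;
-- EmployeeID and the column types are assumptions for illustration.
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    City       VARCHAR(50) NOT NULL,
    Status     VARCHAR(20) NOT NULL  -- 'Full-time' or 'Intern'
);

-- Made-up sample rows.
INSERT INTO Employees (EmployeeID, City, Status) VALUES
    (1, 'London', 'Full-time'),
    (2, 'London', 'Full-time'),
    (3, 'London', 'Intern'),
    (4, 'Paris',  'Full-time'),
    (5, 'Paris',  'Intern'),
    (6, 'Paris',  'Intern');
```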

The task is to get the following information in the same table:

The total number of employees in each City → This will involve an aggregation on the City column.
The total number of full-time employees and interns in each City → This will involve an aggregation on the City and Status columns.

So the final output must contain three records for every city:

One for total employees.
One for total full-time employees.
One for total interns.

The standard approach

The most common approach is to write two GROUP BY queries and combine them with UNION.
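A minimal sketch of this approach, assuming the table is named `Employees` and has `City` and `Status` columns. Emitting `NULL` as the `Status` of the per-city totals is a choice made here so that both branches return the same columns, and `UNION ALL` is used because the two branches do not overlap as long as `Status` is never `NULL` in the table:

```sql
-- Branch 1: total employees in each city (Status is left NULL for these rows).
SELECT City, NULL AS Status, COUNT(*)
FROM Employees
GROUP BY City

UNION ALL

-- Branch 2: full-time employees and interns in each city.
SELECT City, Status, COUNT(*)
FROM Employees
GROUP BY City, Status;
```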

The additional sorting step using `ORDER BY` and the alias for the `Count` column are intentionally omitted here.

We have one query for finding the total employees in each City.
We have another query for finding the total full-time employees and interns in each city by aggregating on both City and Status columns.

Finally, we take a UNION of both results to get the desired output.

Cool! This works as expected.

But the biggest drawback of this approach is that it scans the same table twice, once for each aggregation.

Can we do better?

Of course we can!

The smart approach — Grouping Sets

Grouping sets is a great way to run multiple aggregations on the same table by scanning the table just once instead of multiple times.

Because the table is read only once, the query does less work, which typically makes it noticeably faster on large tables.

For our task, the two GROUP BY queries collapse into a single query with one GROUP BY GROUPING SETS clause.
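A minimal sketch of such a query, assuming an `Employees` table with `City` and `Status` columns. `GROUPING SETS` is available in engines such as PostgreSQL, SQL Server, and Oracle, but not in MySQL or SQLite, so it is worth checking your engine first:

```sql
SELECT City, Status, COUNT(*)
FROM Employees
GROUP BY GROUPING SETS (
    (City),          -- total employees in each city; Status is NULL in these rows
    (City, Status)   -- full-time employees and interns in each city
);
```

Each parenthesized entry behaves like its own `GROUP BY` list, and any column that is not part of a given grouping set comes back as `NULL` in that set's rows.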

In this query, we list every combination of columns we want to group on, here (City) and (City, Status), inside the GROUPING SETS clause.

Since the table appears only once in the query, the database can compute both groupings in a single scan.
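One way to check this rather than take it on faith, assuming PostgreSQL (other engines expose query plans through different commands), is to prefix each query with `EXPLAIN` and count how many scan nodes on `Employees` appear in the plan. Plans depend on the engine, its version, and the table statistics, so treat this as a sketch of the check rather than a guaranteed outcome:

```sql
-- UNION ALL version: each branch has its own FROM Employees.
EXPLAIN
SELECT City, NULL AS Status, COUNT(*) FROM Employees GROUP BY City
UNION ALL
SELECT City, Status, COUNT(*) FROM Employees GROUP BY City, Status;

-- GROUPING SETS version: Employees appears once in the FROM clause.
EXPLAIN
SELECT City, Status, COUNT(*)
FROM Employees
GROUP BY GROUPING SETS ((City), (City, Status));
```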

Also, just like the UNION approach, it produces the desired results.

The SQL query with GROUPING SETS is more efficient, more elegant, and shorter.

Isn’t that cool?

Try it out by downloading this Jupyter Notebook: Grouping Sets Notebook.

👉 Over to you: What are some other cool ways of using GROUPING SETS?

Thanks for reading!

Omar AlSuwaidi

Dec 10, 2023

Wow... I never knew about this and didn't expect to get this valuable information from here! Always surprising us with great content Avi! I for sure will use this sometime in the near future in my SQL queries; bookmarked!

Srinivas

Nice trick
