GROUPING SETS in SQL

A lesser-known way to run multiple aggregations.

Avi Chawla

Jul 28, 2024

Daily Dose of Data Science Free Book | Deep Dives

A few weeks back, we discussed GROUPING SETS, ROLLUP, and CUBE in SQL. Here’s the visual for a quick recap:

Among these three, the one I have seen people using most often is GROUPING SETS.

So today, let’s do a more comprehensive demo of this and understand how it works.

Background

Whenever running multiple aggregations on some table, most people write multiple SQL queries — one query per aggregation.

Finally, if the results are to be gathered in a single table, they aggregate all the above tables using UNION or UNION ALL (as needed).

But this is among the most inefficient ways to approach this.

Let me explain with an example.

Consider an organization that has the following Employees table in their database:

The task is to get the following information in the same table:

The total employees in each City → This will involve an aggregation on the City column.
The total full-time employees and interns in each City → This will involve an aggregation on the City and Status column.

So the final output must look like this:

Every city has three records:

One for total employees.
One for total full-time employees.
One for total interns.

The standard approach

The most common approach of doing this involves the following SQL query:

An additional sorting step using `ORDER BY` and alias for the `Count` column is not shown here intentionally.

We have one query for finding the total employees in each City.
We have another query for finding the total full-time employees and interns in each city by aggregating on both City and Status columns.

Finally, we take a UNION of both results to get the desired results:

Cool! This works as expected.

But the biggest bottleneck in this approach is that it involves scanning the same table twice.

Can we do better?

Of course we can!

The smart approach — Grouping Sets

Grouping sets is a great way to run multiple aggregations on the same table by scanning the table just once instead of multiple times.

Quite clearly, this makes our query much more efficient.

This is demonstrated below:

In this query:

We specify all aggregation columns we want to group on using the GROUPING SETS keyword.

From the above query, it is pretty clear that we are only scanning the whole table once.

Also, just like the UNION approach, it produces the desired results:

The SQL query with GROUPING SETS is much more efficient, elegant, and shorter.

Isn’t that cool?

Try it out by downloading this Jupyter Notebook: Grouping Sets Notebook.

👉 Over to you: What are some other cool ways of using GROUPING SETS?

Are you overwhelmed with the amount of information in ML/DS?

Every week, I publish no-fluff deep dives on topics that truly matter to your skills for ML/DS roles.

I want to read super-detailed articles

For instance:

Join below to unlock all full articles:

I want to read super-detailed articles

SPONSOR US

Get your product in front of 84,000 data scientists and other tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., who have influence over significant tech decisions and big purchases.

To ensure your product reaches this influential audience, reserve your space here or reply to this email to ensure your product reaches this influential audience.

Daily Dose of Data Science