What Feature Scaling and Standardization is NOT Used For?

Understanding scaling and standardization from the perspective of skewness.

Avi Chawla

Sep 21, 2024

Daily Dose of Data Science Free Book | Deep Dives

Feature scaling and standardization are two common ways to alter a feature’s range. For instance:

MinMaxScaler changes the range of a feature to [0,1]:

Standardization makes a feature’s mean zero and standard deviation one:

As you may already know, these operations are necessary because:

They prevent a specific feature from strongly influencing the model’s output.
They ensure that the model is more robust to wide variations in the data.

For instance, in the image below, the scale of “income” could massively impact the overall prediction.

Scaling (or standardizing) the data to a similar range can mitigate this and improve the model’s performance.

In fact, the following image precisely verifies this:

As depicted above, feature scaling is necessary for the better performance of many ML models.

So while the importance of feature scaling and standardization is pretty clear and well-known, I have seen many people misinterpreting them as techniques to eliminate skewness.

But contrary to this common belief, feature scaling and standardization NEVER change the underlying distribution.

Instead, they just alter the range of values.

Thus, after scaling (or standardization):

Normal distribution → stays Normal
Uniform distribution → stays Uniform
Skewed distribution → stays Skewed
and so on…

We can also verify this from the two illustrations below:

Here’s another image that verifies this:

It is clear that scaling and standardization have no effect on the underlying distribution.

Thus, always remember that if you intend to eliminate skewness, scaling/standardization will never help.

Try feature transformations instead.

There are many of them, but the most commonly used transformations are:

Log transform
Sqrt transform
Box-cox transform

Their effectiveness is evident from the image below:

As depicted above, applying these operations transforms the skewed data into a (somewhat) normally distributed variable.

Before I conclude, please note that while log transform is commonly used to eliminate data skewness, it is not always the ideal solution.

I recently wrote about it on X (read here or click the post below):

And if you are wondering why did we convert the above-skewed data to a normal distribution, and what its purpose was, then check out this issue:

11 Essential Ways to Determine Normality of Data Distributions

Avi Chawla

October 28, 2023

Read full story

👉 Over to you: What are some other ways to eliminate skewness?

For those who want to build a career in DS/ML on core expertise, not trends:

Every week, I publish no-fluff deep dives on topics that truly matter to your skills for ML/DS roles.

I want to read super-detailed articles

For instance:

Join below to unlock all full articles:

I want to read super-detailed articles

SPONSOR US

Get your product in front of ~90,000 data scientists and other tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., who have influence over significant tech decisions and big purchases.

To ensure your product reaches this influential audience, reserve your space here or reply to this email to ensure your product reaches this influential audience.

Daily Dose of Data Science

11 Essential Ways to Determine Normality of Data Distributions

Discussion about this post

Ready for more?