Daily Dose of Data Science

Daily Dose of Data Science

Share this post

Daily Dose of Data Science
Daily Dose of Data Science
A Misconception About Log Transform
Copy link
Facebook
Email
Notes
More
User's avatar
Discover more from Daily Dose of Data Science
A free newsletter for continuous learning about data science and ML, lesser-known techniques, and how to apply them in 2 minutes. We keep things no-fluff. Join 100,000+ data scientists from top companies like Google, NVIDIA, Microsoft, Uber, etc.
Already have an account? Sign in

A Misconception About Log Transform

...and here's what it does.

Avi Chawla's avatar
Avi Chawla
May 28, 2024
38

Share this post

Daily Dose of Data Science
Daily Dose of Data Science
A Misconception About Log Transform
Copy link
Facebook
Email
Notes
More
3
Share

Launching big data analytics projects can take days. DoubleCloud simplifies this and reduces the project launch time to just a few minutes.

Setting up a data analytics project is as simple as selecting technologies from a dropdown, and DoubleCloud takes care of the integration procedures.

I read on their website that DoubleCloud’s customers like:

  • LSports reduced query speed by 180x compared to MySQL.

  • Honeybadger experienced a 30x boost compared to Elasticsearch.

  • Spectrio cut costs by 30-40% compared to Snowflake.

Join below:

DoubleCloud

I recently wrote a newsletter issue about DoubleCloud in case you missed it: How To Simplify ANY Data Analytics Project with DoubleCloud?

Thanks to DoubleCloud for partnering today!


Misconception About Log Transform

Log transform is commonly used to eliminate skewness in data.

Yet, it is not always the ideal solution for eliminating skewness.

It is important to note that log transform:

  • DOES NOT eliminate left-skewness.

  • Only works for right-skewness, that too when the values are small and positive.

To understand better, consider the left-skewed distribution and its log transform below:

It is clear that log transform did not eliminate skewness.

However, now consider a right-skewed distribution:

This time, it eliminates the skewness.

This happens because the log function grows faster for lower values. Thus, it stretches out the lower values more than the higher values.

More specifically, in the case of left-skewness, the tail exists to the left, which gets stretched more than the majority of the probability mass that exists to the right. Thus, skewness isn't affected much.

Left-skewness with a log transform

However, in the case of right-skewness, the majority of probability mass exists to the left, which gets stretched out more.

Right-skewness with a log transform

That said, it is important to note that even if we have a right-skewed distribution, log transform will not be effective if the values are large.

This happens because the stretch effect of the log function diminishes at large values:

I have often found the box-cox transform to be quite effective at eliminating both right-skewness and left-skewness, as depicted below:

To read next, here are 8 Fatal (Yet Non-obvious) Pitfalls and Cautionary Measures in Data Science.

👉 Over to you: What are some other ways to eliminate skewness?

Thanks for reading Daily Dose of Data Science! Subscribe for free to learn something new and insightful about Python and Data Science every day. Also, get a Free Data Science PDF (550+ pages) with 320+ tips.


Are you overwhelmed with the amount of information in ML/DS?

Every week, I publish no-fluff deep dives on topics that truly matter to your skills for ML/DS roles.

I want to read super-detailed articles

For instance:

  • A Beginner-friendly Introduction to Kolmogorov Arnold Networks (KANs).

  • 5 Must-Know Ways to Test ML Models in Production (Implementation Included).

  • A Beginner-Friendly Guide to Multi-GPU Model Training.

  • Understanding LoRA-derived Techniques for Optimal LLM Fine-tuning

  • 8 Fatal (Yet Non-obvious) Pitfalls and Cautionary Measures in Data Science

  • Implementing Parallelized CUDA Programs From Scratch Using CUDA Programming

  • You Are Probably Building Inconsistent Classification Models Without Even Realizing.

  • How To (Immensely) Optimize Your Machine Learning Development and Operations with MLflow.

  • And many many more.

Join below to unlock all full articles:

I want to read super-detailed articles


SPONSOR US

Get your product in front of more than 76,000 data scientists and other tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., who have influence over significant tech decisions and big purchases.

To ensure your product reaches this influential audience, reserve your space here or reply to this email to ensure your product reaches this influential audience.


Subscribe to Daily Dose of Data Science

A free newsletter for continuous learning about data science and ML, lesser-known techniques, and how to apply them in 2 minutes. We keep things no-fluff. Join 100,000+ data scientists from top companies like Google, NVIDIA, Microsoft, Uber, etc.
Nika's avatar
AJ's avatar
Nils W's avatar
Toxicle's avatar
Ousmaal's avatar
38 Likes∙
3 Restacks
38

Share this post

Daily Dose of Data Science
Daily Dose of Data Science
A Misconception About Log Transform
Copy link
Facebook
Email
Notes
More
3
Share

Discussion about this post

User's avatar
FREE Daily Dose of Data Science PDF
Collection of posts on core DS/ML topics.
Apr 20, 2023 â€¢ 
Avi Chawla
566

Share this post

Daily Dose of Data Science
Daily Dose of Data Science
FREE Daily Dose of Data Science PDF
Copy link
Facebook
Email
Notes
More
22
15 DS/ML Cheat Sheets
Single frame summaries of must-know DS/ML concepts and techniques.
Sep 22, 2024 â€¢ 
Avi Chawla
121

Share this post

Daily Dose of Data Science
Daily Dose of Data Science
15 DS/ML Cheat Sheets
Copy link
Facebook
Email
Notes
More
You Will NEVER Use Pandas’ Describe Method After Using These Two Libraries
Generate a comprehensive data summary in seconds.
Feb 6, 2024 â€¢ 
Avi Chawla
228

Share this post

Daily Dose of Data Science
Daily Dose of Data Science
You Will NEVER Use Pandas’ Describe Method After Using These Two Libraries
Copy link
Facebook
Email
Notes
More
14

Ready for more?

© 2025 Avi Chawla
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More

Create your profile

User's avatar

Only paid subscribers can comment on this post

Already a paid subscriber? Sign in

Check your email

For your security, we need to re-authenticate you.

Click the link we sent to , or click here to sign in.