Why Is join() Faster Than Iteration?
A reminder to always prefer specific methods over a generalized approach.
There are two popular ways to concatenate multiple strings:
Iterating and appending them to a single string.
Using Python’s built-in join() method.
But the second approach is significantly faster than the first.
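The comparison can be reproduced with a minimal timing sketch like the one below. The function names, the input list, and the repeat count are illustrative assumptions rather than the original benchmark, and the absolute timings will vary with the machine and the Python version.

```python
import timeit


def concat_by_iteration(words):
    """Approach 1: append each string plus a space separator in a loop."""
    result = ""
    for word in words:
        result += word + " "
    return result[:-1]  # drop the trailing separator


def concat_by_join(words):
    """Approach 2: let join() build the whole string."""
    return " ".join(words)


def compare(words, repeats=20):
    """Return the total time (in seconds) each approach takes over `repeats` runs."""
    loop_time = timeit.timeit(lambda: concat_by_iteration(words), number=repeats)
    join_time = timeit.timeit(lambda: concat_by_join(words), number=repeats)
    return loop_time, join_time


if __name__ == "__main__":
    words = ["word"] * 100_000  # hypothetical input, chosen only for illustration
    loop_time, join_time = compare(words)
    print(f"iteration: {loop_time:.4f} s")
    print(f"join():    {join_time:.4f} s")
```

Both functions return the same string, so any gap in the two timings comes from how the result is built and not from what is built.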
Can you answer why?
The answer is not vectorization!
Continue reading to learn more.
When concatenating using iteration, Python naively executes the instructions it comes across.
Thus, it does not know (beforehand):
number of strings it will concatenate
number of white spaces it will need
Simply put, iteration leaves no scope for optimization.
As a result, during every iteration, Python asks for a memory allocation of:
the string at the current iteration
the white space added as a separator
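This leads to repeated memory allocation calls.
Because Python strings are immutable, an append may also have to copy the string built so far into the newly allocated memory.
To be precise, the number of allocation calls in this case is two times the size of the list.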
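To make the 2x figure concrete, the loop below tallies every append it performs. It is a simplified model that assumes one append for the string and one for the separator. The function name and the `appends` counter are hypothetical, and the tally counts append operations, not actual calls into the interpreter's allocator.

```python
def concat_and_count(words, sep=" "):
    """Concatenate by iteration and count the append operations performed."""
    result = ""
    appends = 0
    for word in words:
        result += word  # append the string at the current iteration
        appends += 1
        result += sep   # append the separator
        appends += 1
    return result, appends


text, appends = concat_and_count(["a", "b", "c"])
print(appends)  # 6: two appends for each of the three list elements
```

The returned string keeps a trailing separator, which the sketch leaves in place so that the count stays at exactly two appends per element.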
But this is not the case when we use join().
Because in that case, Python precisely knows (beforehand):
number of strings it will be concatenating
number of white spaces it will need
The memory for all of them is requested in a single allocation call, so it is available upfront before concatenation begins.
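The idea of sizing first and allocating once can be sketched in pure Python as below. This is only an illustration of the principle, not the interpreter's actual implementation of join(). The function name `join_presized` is hypothetical, and the UTF-8 encoding step is there only so the result can be written into a `bytearray`.

```python
def join_presized(words, sep=" "):
    """Mimic the idea behind join(): compute the size, allocate once, then copy."""
    parts = [w.encode("utf-8") for w in words]
    sep_bytes = sep.encode("utf-8")

    # Pass 1: the total size is known before any copying starts.
    n_seps = max(len(parts) - 1, 0)
    total = sum(len(p) for p in parts) + len(sep_bytes) * n_seps

    # One allocation of exactly the final size.
    buffer = bytearray(total)

    # Pass 2: copy every piece into its slot; the buffer never grows.
    pos = 0
    for i, part in enumerate(parts):
        if i:  # a separator precedes every item except the first
            buffer[pos:pos + len(sep_bytes)] = sep_bytes
            pos += len(sep_bytes)
        buffer[pos:pos + len(part)] = part
        pos += len(part)
    return buffer.decode("utf-8")


print(join_presized(["a", "b", "c"]))  # a b c
```

Note that n strings need only n - 1 separators, which is why the size calculation can be completed before a single character is copied.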
To summarize:
with iteration, the number of memory allocation calls is 2x the list's size.
with join(), the number of memory allocation calls is just one.
This explains the significant difference in their run-time we noticed earlier.
This post is also a reminder to ALWAYS prefer specific methods over a generalized approach.
These subtle sources of optimization can lead to profound improvements in run-time and memory utilization of your code.
👉 Over to you: What other ways do you commonly use to optimize native Python code?