2 Comments
Jun 19 · Liked by Avi Chawla

And tracemalloc is an excellent tool to prove that inplace=True does not even save memory, although saving memory is supposedly the very reason it exists. tracemalloc reports both the increase in traced memory and the peak (maximum transient) usage. I created a test Series:

import pandas as pd

import tracemalloc as tm

data = [ 10, 8, 10, 20, 10, 8, 9, 11, 8, 6, 11, 6]

idx = ['b', 'a', 'b', 'c', 'b', 'a', 'd', 'e', 'a', 'f', 'e', 'f']

sr = pd.Series(data, index=idx)

Then I sorted it with inplace set to False and True. First measurement:

tm.start()

sr_1 = sr.sort_values(inplace=False)

del sr # just to be fair, though it played no role

print(tm.get_traced_memory()) # (6352, 10093)

tm.stop()

Second measurement:

sr = pd.Series(data, index=idx) # re-create sr, which was deleted in the first measurement

tm.start()

sr.sort_values(inplace=True)

print(tm.get_traced_memory()) # (5648, 10093)

tm.stop()

In both measurements the peak is identical (10093 bytes), so inplace=True did not reduce the transient memory needed for sorting. I find tracemalloc a very good instrument because it shows the real increase in Python's memory consumption. When we run memory_usage(deep=True) on the above Series, it reports only 792 bytes. That is just the tip of the iceberg.
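For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a wrapper around tracemalloc. The helper name traced and the toy function transient_work are my own illustrative assumptions, not part of pandas or tracemalloc. The toy function only shows how the two numbers differ when a large temporary object is created and then released.

```python
import tracemalloc


def traced(fn):
    """Run fn() under tracemalloc; return (result, current_bytes, peak_bytes).

    'traced' is a hypothetical helper name used only for this sketch.
    """
    tracemalloc.start()
    try:
        result = fn()
        # get_traced_memory() returns (current, peak) in bytes since start()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def transient_work():
    # Build a large temporary list, then return only a small summary,
    # so the peak should be far above what remains allocated afterwards.
    tmp = list(range(100_000))
    return sum(tmp)


result, current, peak = traced(transient_work)
print(current, peak)
```

With the Series from above, the same helper could presumably be called as traced(lambda: sr.sort_values(inplace=False)) and traced(lambda: sr.sort_values(inplace=True)) to get the two tuples for comparison. The exact byte counts will vary with the pandas and Python versions.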


Thank you for the writeup, Avi Chawla. I didn't know inplace operations were this bad in terms of performance and usability. I guess this feature should never have been introduced to pandas: there is no actual functional benefit to inplace operations, and developers might use them without knowing their performance implications.
