Pandas vs Polars — Run-time and Memory…

Avi Chawla

Jun 12, 2023

A comprehensive benchmark.


6 Comments

Omar AlSuwaidi

Jun 12, 2023

Given these gains in speed and memory, why would anyone choose pandas over polars?

Avi Chawla

Jun 23, 2023 (edited)

Hey Omar,

I just saw your comment. I didn't receive an email notification from Substack, so I missed it.

While it's true that the speedups are huge, many people still prefer Pandas because they are familiar with it. At least, that's what I have gathered from reading tons of comments on my social media posts. Beyond that, many say they don't work with large amounts of data, so Polars is of little use to them.

I think it's important to be aware of the limitations of Pandas. If you are okay with them, continue with Pandas; otherwise, expand your familiarity to other frameworks like Dask, Polars, etc.

Omar AlSuwaidi

Jun 24, 2023

Awesome, thanks a lot for your reply, Avi. One last question, if you don't mind: did you use `pd.options.mode.dtype_back = 'pyarrow'` before running these benchmarks?

Avi Chawla

Jun 26, 2023

These benchmarks use Pandas 1.5.3, but I did try Pandas 2.0 (with both the NumPy and PyArrow backends) and didn't notice any differences. PyArrow isn't working as I expected it to. I checked with another source, and they reported the same. So I am waiting for more info from Pandas as they update it before I write about it :)

Neiv Nova

Jun 12, 2023

Thanks. Were these benchmarks run against pandas 2.0, though?

Avi Chawla

Jun 12, 2023

This one is against 1.5.3, but I did try 2.0, and there were no major differences between Pandas 2.0 and 1.5.3. Both ran in almost the same time.

Daily Dose of Data Science