6 Comments

Given these gains in speed and memory, why would anyone opt to use pandas over polars?

Hey Omar

Just saw your comment. I didn't receive an email from Substack, so I missed it.

While it's true that the speedups are huge, many still prefer Pandas because of familiarity. At least that's what I have gathered from reading tons of comments on my social media posts. Beyond that, many say they don't work with much data, so Polars is of little use to them.

I think it's important to be aware of the limitations of Pandas. If you are okay with them, continue with Pandas. Otherwise, expand your familiarity to other frameworks like Dask, Polars, etc.

Awesome, thanks a lot for your reply, Avi. One last question, if you don't mind: did you use `pd.options.mode.dtype_backend = 'pyarrow'` before running these benchmarks?

While these benchmarks are with Pandas 1.5.3, I did try Pandas 2.0 (with both the NumPy and PyArrow backends) and didn't notice any differences. PyArrow isn't working as I had expected. I checked with another source and they reported the same. So I am waiting for more info from Pandas as they update it before I write about it :)

Thanks. Though, is the performance benchmarked against Pandas 2.0?

This one is against 1.5.3, but I did try 2.0 and saw no major differences between Pandas 2.0 and 1.5.3. Both ran with almost the same run-time.
