Given these gains in speed and memory, why would anyone choose pandas over Polars?
Hey Omar,
I just saw your comment. I didn't receive an email from Substack, so I missed it.
While it's true that the speedups are huge, many still prefer Pandas because of familiarity. At least that's what I have gathered from reading tons of comments on my social media posts. Beyond that, many say they don't work with much data, so Polars is of little use to them.
I think it's important to be aware of the limitations of Pandas. If you are okay with them, continue with Pandas. Otherwise, expand your familiarity to other frameworks like Dask, Polars, etc.
Awesome, thanks a lot for your reply, Avi. One last question, if you don't mind: did you use `pd.options.mode.dtype_backend = 'pyarrow'` before running these benchmarks?
These benchmarks are with Pandas 1.5.3, but I did try Pandas 2.0 (with both the NumPy and PyArrow backends) and didn't notice any differences. PyArrow isn't performing as I had expected. I checked with another source and they reported the same. So I am waiting for more info from Pandas as they update it before I write about it :)
Thanks. Is the performance benchmarked against pandas 2.0, though?
This one is against 1.5.3, but I did try 2.0 and there were no major differences between Pandas 2.0 and 1.5.3. Both ran in almost the same time.
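For anyone who wants to check this on their own data, here is a rough sketch of a timing helper for comparing run-times. It is not the setup used for the benchmarks in the post. The names `best_runtime` and `compare` are made up for illustration, and the demo workloads are plain-Python stand-ins rather than real Pandas or Polars operations.

```python
import timeit


def best_runtime(fn, repeat=5):
    """Return the fastest of `repeat` single runs of fn, in seconds."""
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timeit.repeat(fn, repeat=repeat, number=1))


def compare(candidates, repeat=5):
    """Time each named callable; return {name: (seconds, ratio_to_fastest)}."""
    timings = {name: best_runtime(fn, repeat) for name, fn in candidates.items()}
    # Guard against a zero reading on very coarse timers.
    fastest = max(min(timings.values()), 1e-12)
    return {name: (t, t / fastest) for name, t in timings.items()}


if __name__ == "__main__":
    # Stand-in workloads (assumptions); swap in your own operations.
    data = list(range(200_000))
    results = compare({
        "list comprehension": lambda: [x * 2 for x in data],
        "built-in sum": lambda: sum(data),
    })
    for name, (seconds, ratio) in results.items():
        print(f"{name}: {seconds:.4f}s ({ratio:.2f}x the fastest)")
```

To compare backends, you could pass something like `lambda: pd.read_csv(path)` and `lambda: pd.read_csv(path, dtype_backend="pyarrow")` as the candidates, assuming Pandas 2.0 or later with PyArrow installed and `path` pointing at your own file. Run it once per installed version to compare 1.5.3 against 2.0.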