Two of the biggest problems with Pandas are that:
It restricts computation to a single CPU core.
It creates bulky DataFrames.
While many libraries (Polars, for instance) do address these limitations, they are still limited to CPU-driven computations.
NVIDIA’s RAPIDS cuDF library allows Pandas users to supercharge their Pandas workflow with GPUs.
How to use it?
Within a GPU runtime, do the following:
Load the extension:
%load_ext cudf.pandas
Import Pandas:
import pandas as pd
Done! Use Pandas’ methods as you usually would.
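The %load_ext magic only works inside a notebook. For a plain Python script, the same effect can be sketched with cudf.pandas.install(), called before Pandas is imported. The helper names enable_gpu_pandas and demo below are hypothetical and only for illustration; the sketch assumes cuDF may or may not be installed and simply reports which case it hit.

```python
def enable_gpu_pandas() -> bool:
    """Try to switch on cudf.pandas; return True only if it was enabled."""
    try:
        import cudf.pandas  # raises ImportError when cuDF is not installed

        cudf.pandas.install()  # must run before `import pandas`
        return True
    except Exception:
        # No cuDF, or no usable GPU/driver: stay on regular CPU Pandas.
        return False


def demo() -> None:
    """Hypothetical usage: enable acceleration first, then use Pandas as usual."""
    gpu_enabled = enable_gpu_pandas()

    import pandas as pd  # imported only after the install attempt

    df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
    print(df.groupby("key")["value"].sum())
    print("GPU acceleration enabled:", gpu_enabled)
```

Calling demo() runs the same Pandas code either way. An unmodified script can also be launched with python -m cudf.pandas script.py, which avoids touching the code at all.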
Just loading the extension provides immense speedups. This is evident from the gif below.
As per NVIDIA’s official release, this can be as fast as 150x.
In my personal experimentation, however, I mostly observed it to range between 50-70x, which is still pretty good.
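If you want to check the speedup on your own workload, a small timing helper is enough. This is a minimal sketch using only the standard library; best_time and speedup are hypothetical names, and the workload you pass in is up to you.

```python
import time


def best_time(fn, repeats: int = 5) -> float:
    """Return the fastest wall-clock time (in seconds) over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run was than the baseline."""
    return baseline_seconds / accelerated_seconds


# Stand-in workload; replace it with your own Pandas operation.
elapsed = best_time(lambda: sum(range(100_000)), repeats=3)
print(f"best of 3: {elapsed:.6f} s")
```

Since the extension has to be loaded before Pandas is imported, the CPU and GPU timings would likely come from two separate sessions, with speedup() applied to the two numbers afterwards.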
The good thing is that the extension accelerates most Pandas methods.
Yet, if needed, it can automatically fall back to the CPU.
How does it work?
Whenever cudf.pandas is enabled, the import pandas as pd statement does not import the original Pandas library which we use all the time.
Instead, it imports a proxy module that routes each Pandas operation to a GPU-accelerated cuDF implementation where one exists and falls back to regular Pandas on the CPU otherwise.
This is evident from the image below:
This alternative implementation preserves the entire syntax of Pandas. So if you know Pandas, you already know how to use cuDF’s Pandas.
Isn’t that cool?
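To make the "try the GPU, fall back to the CPU" idea concrete, here is a toy sketch in pure Python. It is not cuDF's actual implementation: FallbackProxy, FastBackend, and SlowBackend are made-up names, where the fast backend stands in for the GPU library (partial API) and the slow backend stands in for regular Pandas (full API).

```python
class SlowBackend:
    """Stands in for CPU Pandas: supports every operation."""

    def total(self, values):
        return sum(values)

    def median(self, values):
        ordered = sorted(values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


class FastBackend:
    """Stands in for the GPU library: supports only part of the API."""

    def total(self, values):
        return sum(values)

    def median(self, values):
        raise NotImplementedError("not supported on the fast path")


class FallbackProxy:
    """Try the fast backend first; fall back to the slow one on failure."""

    def __init__(self, fast, slow):
        self._fast = fast
        self._slow = slow
        self.calls = []  # records which backend served each call

    def __getattr__(self, name):
        slow_fn = getattr(self._slow, name)  # slow backend defines the full API
        fast_fn = getattr(self._fast, name, None)

        def dispatch(*args, **kwargs):
            if fast_fn is not None:
                try:
                    result = fast_fn(*args, **kwargs)
                    self.calls.append((name, "fast"))
                    return result
                except NotImplementedError:
                    pass  # unsupported on the fast path, so use the slow one
            self.calls.append((name, "slow"))
            return slow_fn(*args, **kwargs)

        return dispatch


proxy = FallbackProxy(FastBackend(), SlowBackend())
print(proxy.total([1, 2, 3]))   # served by the fast backend
print(proxy.median([3, 1, 2]))  # falls back to the slow backend
print(proxy.calls)
```

The caller only ever talks to the proxy, which is why the calling code does not need to change when the backend does.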
You can find the code here: Google Colab.
To learn CUDA programming from scratch, check this: Implementing (Massively) Parallelized CUDA Programs From Scratch Using CUDA Programming.
👉 Over to you: What are some other ways to accelerate Pandas operations in general?
Hi! Thank you for the detailed demonstration. I'm running the command in a Jupyter notebook and ran into some issues. Could you help me address the problem?
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting cudf-cu11
Using cached cudf_cu11-24.4.1.tar.gz (2.7 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [55 lines of output]
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/wheel.py", line 147, in download_wheel
return download_manual(wheel_directory, distribution, version)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/wheel.py", line 114, in download_manual
raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
Traceback (most recent call last):
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/wheel.py", line 147, in download_wheel
return download_manual(wheel_directory, distribution, version)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/wheel.py", line 114, in download_manual
raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
RuntimeError: Didn't find wheel for cudf-cu11 24.4.1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/opt/anaconda3/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
whl_basename = backend.build_wheel(metadata_directory, config_settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/buildapi.py", line 29, in build_wheel
return download_wheel(pathlib.Path(wheel_directory), config_settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/wheel.py", line 149, in download_wheel
report_install_failure(distribution, version, exception_context)
File "/private/var/folders/3_/pnpcw_ds4jx9dstrjy443gbr0000gn/T/pip-build-env-i8kdudpm/overlay/lib/python3.11/site-packages/nvidia_stub/error.py", line 63, in report_install_failure
raise InstallFailedError(
nvidia_stub.error.InstallFailedError:
*******************************************************************************
The installation of cudf-cu11 for version 24.4.1 failed.
This is a special placeholder package which downloads a real wheel package
from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
cannot download the real wheel file to install.
You might try installing this package via
```
$ pip install --extra-index-url https://pypi.nvidia.com cudf-cu11
```
Here is some debug information about your platform to include in any bug
report:
Python Version: CPython 3.11.7
Operating System: Darwin 23.0.0
CPU Architecture: arm64
nvidia-smi command not found. Ensure NVIDIA drivers are installed.
*******************************************************************************
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
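The debug section of this log (Operating System: Darwin, CPU Architecture: arm64, nvidia-smi not found) suggests the install was attempted on a machine without an NVIDIA GPU, which would explain why no wheel was found. A rough pre-flight check before installing could look like the sketch below. The function names are hypothetical, and the readiness rule encodes an assumption: that cuDF wheels target Linux hosts with an NVIDIA driver installed.

```python
import platform
import shutil


def gpu_preflight() -> dict:
    """Collect the same basics the installer prints in its debug section."""
    return {
        "os": platform.system(),      # e.g. "Linux" or "Darwin"
        "arch": platform.machine(),   # e.g. "x86_64" or "arm64"
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


def looks_gpu_ready(report: dict) -> bool:
    # Assumption: cuDF needs a Linux host with an NVIDIA driver available.
    return report["os"] == "Linux" and report["nvidia_smi_found"]


report = gpu_preflight()
print(report)
print("Looks ready for cuDF:", looks_gpu_ready(report))
```

If the check fails on a local machine, a hosted GPU runtime such as the Colab notebook linked in the post is probably the easier route.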