PyTorch Models Are Not Entirely Deployment-Friendly
Eliminating the dependence of PyTorch models on Python.
PyTorch has always been my go-to library for building any deep learning model due to its flexibility, intuitive Pythonic API design, and ease of use.
However, deploying PyTorch-backed models in production runs into real limitations around scale and performance, and these limitations are often overlooked.
To begin, one significant constraint of PyTorch is its predominant reliance on Python.
While Python offers simplicity, versatility, and readability, it is well known to be slower than languages like C++ or Java.
This poses challenges in scenarios where low-latency and high-throughput requirements are crucial, such as real-time applications and services.
In fact, the server we intend to deploy our model on may well be written in a language other than Python, such as C++ or Java.
Thus, the models we build MUST BE portable to various environments that are designed to handle concurrent requests at scale.
But, as discussed above, the Python-centric nature of PyTorch models can limit their integration with such systems and platforms.
A neat and simple way to address this is to develop PyTorch models in script mode instead, which is specifically designed for production use cases.
In the context of PyTorch, script mode refers to a set of tools and functionalities that allow developers to run their models in a more production-friendly manner.
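To give a rough sense of what this looks like in practice, here is a minimal sketch (the TinyClassifier model below is made up purely for illustration) that compiles a model with torch.jit.script and serializes it to a self-contained archive:

```python
import torch
import torch.nn as nn

# A tiny model used purely for illustration.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Data-dependent control flow like this is preserved by torch.jit.script,
        # whereas tracing would only record a single execution path.
        if x.sum() > 0:
            return self.fc(x)
        return self.fc(-x)

model = TinyClassifier().eval()

# Compile the model to TorchScript and serialize it to disk.
scripted = torch.jit.script(model)
scripted.save("tiny_classifier.pt")

# The archive can be loaded later without the original Python class definition.
loaded = torch.jit.load("tiny_classifier.pt")
print(loaded(torch.randn(1, 4)))
```

The saved archive can also be loaded from C++ via LibTorch's torch::jit::load, which is what makes the model usable outside a Python runtime.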
If you are curious to learn more, I recently published a full deep dive on this topic.
It will teach you how you can make your PyTorch models more production-friendly and language-agnostic.
👉 Interested folks can read it here: PyTorch Models Are Not Deployment-Friendly! Supercharge Them With TorchScript.
Deployment skills are immensely critical in data science and machine learning careers.
PyTorch’s biggest USP is its simplicity and Pythonic design.
However, this same USP introduces a major caveat when we deploy PyTorch-driven models.
Learning about building PyTorch models in script mode will go a long way in your career.
👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.
The button is located towards the bottom of this email.
Thanks for reading!
You can deploy PyTorch models in production, but the way to do that efficiently is to wrap them in an inference framework like BentoML.
This allows you to overcome the issues you’ve mentioned and doesn’t force you to export your model to TorchScript, which, by the way, is a format that doesn’t support some layers or sophisticated inference logic.
I really encourage you to look into BentoML; it provides features like the ones below (a minimal service sketch follows the list):
- decoupling the web server from the ML inference pipeline
- horizontal and vertical scaling: each model in your inference graph is backed by a runner, that’s independently deployed on a pod on K8S
- async calls
- micro batching - for the record, this feature doesn’t exist in FastAPI
- GPU support
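To make this concrete, here is a minimal sketch of what such a service could look like, assuming BentoML's 1.x runner API; the model tag demo_model, the service name, and the endpoint are hypothetical and only meant to illustrate the runner/service split:

```python
import bentoml
import numpy as np
import torch
from bentoml.io import NumpyNdarray

# Assumes a PyTorch model was previously stored with:
#   bentoml.pytorch.save_model("demo_model", model)
# The runner wraps that model and is scaled independently of the web server.
runner = bentoml.pytorch.get("demo_model:latest").to_runner()

# The Service object is the web layer, decoupled from the inference runner.
svc = bentoml.Service("demo_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(arr: np.ndarray) -> np.ndarray:
    # Convert the request payload to a tensor, run inference on the runner,
    # and return a NumPy array for serialization.
    tensor = torch.from_numpy(arr).float()
    result = runner.run(tensor)
    return result.detach().numpy()
```

Because each runner can be deployed as its own process (or pod on K8s), the web server and the model scale independently, which is the decoupling described in the first bullet above.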