We have discussed before in this newsletter that…
…even after rigorously testing an ML model locally (on validation and test sets), it can be a terrible idea to instantly replace the existing model with the new one.
A more reliable strategy is to test the model in production (yes, on real-world incoming data).
While this might sound risky, ML teams do it all the time, and it isn’t that complicated.
The following visual depicts 4 common strategies to do so:
We covered their implementation here: 5 Must-Know Ways to Test ML Models in Production (Implementation Included).
The current model is called the legacy model.
The new model is called the candidate model.
#1) A/B testing
Distribute the incoming requests non-uniformly between the legacy model and the candidate model.
To limit risk, intentionally cap the candidate model's exposure by routing only a small fraction of requests to it.
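As a rough illustration, here is a minimal sketch of such weighted routing in Python. The `legacy_model` and `candidate_model` objects (with a `predict()` method) and the 10% traffic share are assumptions for the example, not part of the article:

```python
import random

CANDIDATE_TRAFFIC_SHARE = 0.10  # keep the candidate's exposure low

def route_request(features, legacy_model, candidate_model):
    """Randomly send a small fraction of traffic to the candidate model."""
    if random.random() < CANDIDATE_TRAFFIC_SHARE:
        return candidate_model.predict(features), "candidate"
    return legacy_model.predict(features), "legacy"
```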
#2) Canary testing
In A/B testing, since traffic is randomly redirected to either model irrespective of the user, it can potentially affect all users.
In canary testing, the candidate model is released to a small subset of users in production and gradually rolled out to more users.
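One simple way to implement this (shown as a sketch below, with illustrative names and thresholds) is to hash each user ID into a stable bucket, so the same user consistently sees the same model, and then gradually raise the rollout fraction:

```python
import hashlib

def user_bucket(user_id: str) -> float:
    """Map a user ID to a stable value in [0, 1) so the same user
    always lands in the same bucket."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest[:8], 16) / 16**8

def route_request(user_id, features, legacy_model, candidate_model,
                  rollout_fraction=0.05):
    """Serve the candidate only to users below the rollout threshold;
    increase rollout_fraction gradually as confidence grows."""
    if user_bucket(user_id) < rollout_fraction:
        return candidate_model.predict(features)
    return legacy_model.predict(features)
```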
#3) Interleaved testing
This involves mixing the predictions of multiple models in the response.
Consider Amazon’s recommendation engine. In an interleaved deployment, some of the product recommendations displayed on the homepage come from the legacy model, while others come from the candidate model.
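Here is a simplified sketch of the idea: alternate items from the two models' ranked lists while skipping duplicates (real interleaving schemes, such as team-draft interleaving, are more careful about attribution; the names and example data below are purely illustrative):

```python
def interleave(legacy_ranking, candidate_ranking, k=10):
    """Alternate items from the two ranked lists, skipping duplicates,
    until k items have been collected."""
    mixed, seen = [], set()
    for legacy_item, candidate_item in zip(legacy_ranking, candidate_ranking):
        for item in (legacy_item, candidate_item):
            if item not in seen and len(mixed) < k:
                mixed.append(item)
                seen.add(item)
    return mixed

# Example: the homepage shows a mix of both models' recommendations.
legacy = ["shoes", "socks", "hat", "belt"]
candidate = ["shoes", "jacket", "scarf", "gloves"]
print(interleave(legacy, candidate, k=5))
# -> ['shoes', 'socks', 'jacket', 'hat', 'scarf']
```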
#4) Shadow testing
All of the above techniques affect some (or all) users.
Shadow testing (or dark launches) lets us test a new model in a production environment without affecting the user experience.
The candidate model is deployed alongside the legacy model and receives the same requests. However, its output is never sent back to the user; instead, it is logged and later used to benchmark the candidate's performance against the legacy model.
We explicitly deploy the candidate model instead of testing offline because the production environment is difficult to replicate offline.
Shadow testing offers risk-free testing of the candidate model in a production environment.
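A minimal sketch of a shadow-mode request handler might look like the following. The plain file logger and JSON-serializable predictions are assumptions standing in for whatever logging infrastructure is actually in place:

```python
import json
import logging

# Log candidate predictions to a file for later offline comparison.
shadow_logger = logging.getLogger("shadow")
shadow_logger.addHandler(logging.FileHandler("shadow_predictions.jsonl"))
shadow_logger.setLevel(logging.INFO)

def handle_request(request_id, features, legacy_model, candidate_model):
    legacy_pred = legacy_model.predict(features)        # served to the user
    candidate_pred = candidate_model.predict(features)  # logged, never served
    shadow_logger.info(json.dumps({
        "request_id": request_id,
        "legacy": legacy_pred,
        "candidate": candidate_pred,
    }))
    return legacy_pred  # the user experience is unaffected
```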
That’s it!
In the full article, we covered one more technique (Multi-armed bandits deployments) and the implementation of all five techniques: 5 Must-Know Ways to Test ML Models in Production (Implementation Included).
👉 Over to you: What are some ways to test models in production?
Are you overwhelmed with the amount of information in ML/DS?
Every week, I publish no-fluff deep dives on topics that truly matter to your skills for ML/DS roles.
For instance:
A Beginner-friendly Introduction to Kolmogorov Arnold Networks (KANs).
5 Must-Know Ways to Test ML Models in Production (Implementation Included).
Understanding LoRA-derived Techniques for Optimal LLM Fine-tuning
8 Fatal (Yet Non-obvious) Pitfalls and Cautionary Measures in Data Science
Implementing Parallelized CUDA Programs From Scratch Using CUDA Programming
You Are Probably Building Inconsistent Classification Models Without Even Realizing.
And many, many more.
Join below to unlock all full articles:
SPONSOR US
Get your product in front of 80,000 data scientists and other tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., who have influence over significant tech decisions and big purchases.
To ensure your product reaches this influential audience, reserve your space here or reply to this email.