The fastest inference for the DeepSeek-R1 671B model
Optimized inference engines are just as important as good LLMs.
But GPUs weren’t built for AI.
SambaNova Systems built the world’s fastest AI inference using its specialized hardware stack (RDUs), a 10x faster alternative to GPUs.
In fact, their specialized SN40L chip can load models with trillions of parameters.
Below, DeepSeek-R1 671B generates 150 tokens per second, most likely the fastest you will find anywhere.
Access it here: Fastest inference on DeepSeek-R1 671B.
SambaNova Cloud delivers:
10x faster inference than GPUs
Support for trillion-parameter models
Optimized performance for most open-source models
Leverage the fastest LLM inference here →
Thanks to SambaNova for partnering today!
Shuffle Feature Importance
“Shuffle Feature Importance” (also known as permutation feature importance) is a great technique to measure feature importance.
In a nutshell, it observes how shuffling a feature influences the model’s performance. The visual below illustrates this technique in four simple steps:
1. Train the model and measure its performance → P1.
2. Shuffle one feature randomly and measure the performance again → P2 (the model is NOT retrained).
3. Measure the feature importance as the performance drop = (P1 - P2).
4. Repeat for all features.
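To make the steps concrete, here is a minimal sketch in Python. The dataset, model, and metric (accuracy) are illustrative choices, not prescribed ones:

```python
# A minimal sketch of the four steps above.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Step 1: train the model once and measure its performance (P1).
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
p1 = accuracy_score(y_val, model.predict(X_val))

rng = np.random.default_rng(42)
importances = []
for j in range(X_val.shape[1]):  # Step 4: repeat for all features
    # Step 2: shuffle one feature and measure performance again (P2).
    # Note: the model is NOT retrained.
    X_shuffled = X_val.copy()
    X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    p2 = accuracy_score(y_val, model.predict(X_shuffled))

    # Step 3: feature importance = performance drop (P1 - P2).
    importances.append(p1 - p2)

# Features with the largest drops are the most important.
for j in np.argsort(importances)[::-1][:5]:
    print(f"feature {j}: drop = {importances[j]:.4f}")
```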
This also makes intuitive sense.
Simply put, if we randomly shuffle just one feature, then the performance drop will indicate how important that feature is.
If the performance drop is low → feature has a low influence on the model.
If the performance drop is high → feature has a high influence on the model.
That said, to reduce the effects of randomness, it is recommended to shuffle the same feature multiple times and average the performance drops.
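Scikit-learn implements this averaging directly in `sklearn.inspection.permutation_importance`. Continuing from the sketch above (reusing `model`, `X_val`, and `y_val`), `n_repeats` controls how many times each feature is shuffled:

```python
from sklearn.inspection import permutation_importance

result = permutation_importance(
    model, X_val, y_val,
    scoring="accuracy",
    n_repeats=10,       # shuffle each feature 10 times
    random_state=42,
)
print(result.importances_mean)  # average performance drop per feature
print(result.importances_std)   # variability across the 10 shuffles
```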
There is one caveat though.
If two features are highly correlated, and one of them is permuted/shuffled, the model will still have access to the original feature through its correlated counterpart.
One way to handle this is to cluster highly correlated features and only pick one feature from each cluster.
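Here is a sketch of that clustering step, reusing `X_train` from the earlier example. It clusters features via hierarchical clustering on the Spearman correlation matrix; the `average` linkage and the 0.2 distance threshold are illustrative choices, not prescribed values:

```python
# Cluster correlated features and keep one representative per cluster.
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

corr, _ = spearmanr(X_train)       # feature-feature rank correlations
distance = 1 - np.abs(corr)        # highly correlated -> distance near 0
np.fill_diagonal(distance, 0)      # guard against floating-point noise

Z = hierarchy.linkage(squareform(distance, checks=False), method="average")
cluster_ids = hierarchy.fcluster(Z, t=0.2, criterion="distance")

# Keep the first feature of each cluster; compute importance only for these.
selected = [np.flatnonzero(cluster_ids == c)[0] for c in np.unique(cluster_ids)]
print("Representative features:", selected)
```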
All that said, here are a few things that stand out about this technique:
It requires no repetitive model training. Just train the model once and measure the feature importance.
It is pretty simple to use and quite intuitive to interpret.
This technique can be used for all ML models that can be evaluated.
👉 Over to you: What other reliable feature importance techniques do you use frequently?
Thanks for reading and we’ll see you next week!
P.S. For those wanting to develop “Industry ML” expertise:
At the end of the day, all businesses care about impact. That’s it!
Can you reduce costs?
Drive revenue?
Can you scale ML models?
Predict trends before they happen?
We have discussed several other topics (with implementations) in the past that align with these goals.
Here are some of them:
Learn sophisticated graph architectures and how to train them on graph data: A Crash Course on Graph Neural Networks – Part 1.
So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here: Bi-encoders and Cross-encoders for Sentence Pair Similarity Scoring – Part 1.
Learn techniques to run large models on small devices: Quantization: Optimize ML Models to Run Them on Tiny Hardware.
Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust: Conformal Predictions: Build Confidence in Your ML Model’s Predictions.
Learn how to identify causal relationships and answer business questions: A Crash Course on Causality – Part 1.
Learn how to scale ML model training: A Practical Guide to Scaling ML Model Training.
Learn techniques to reliably roll out new models in production: 5 Must-Know Ways to Test ML Models in Production (Implementation Included).
Learn how to build privacy-first ML systems: Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.
Learn how to compress ML models and reduce costs: Model Compression: A Critical Step Towards Efficient Machine Learning.
All these resources will help you cultivate key skills that businesses and companies care about the most.