The first end-to-end platform for Reinforcement Fine-tuning!
Predibase has introduced an innovative fine-tuning method that uses reinforcement learning to achieve strong model performance with significantly less labeled data.
To put that in perspective: typical fine-tuning may need thousands of labeled rows, while RFT can work with just tens.
You can learn RFT (with implementation and best practices) in their webinar on 27th March 2025.
It’s a free event.
Register for the hands-on webinar to upskill in LLM fine-tuning here →
Thanks to Predibase for partnering today!
Fine-tune DeepMind's Gemma 3 (100% locally)
Speaking of LLM fine-tuning, let us give you some background on how LLMs are fine-tuned in practice.
Here’s what we'll be doing today: we'll fine-tune a private, locally running Gemma 3.
To do this, we'll use Unsloth for efficient fine-tuning. The code is linked later in the issue.
Let’s begin!
1) Load the model
We start by loading the Gemma 3 model and its tokenizer using Unsloth:
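Here is a minimal sketch of that step, assuming Unsloth's FastModel loader and a 4-bit Gemma 3 instruct checkpoint; the exact model name, sequence length, and quantization settings below are illustrative choices, not necessarily the ones used in the linked notebook.

```python
from unsloth import FastModel

# Load a 4-bit quantized Gemma 3 checkpoint and its tokenizer.
# Model name and max_seq_length are illustrative choices.
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=2048,   # context window used during fine-tuning
    load_in_4bit=True,     # 4-bit quantization to fit on a single consumer GPU
)
```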
2) Define LoRA config
To avoid updating all of the model's weights, we use a parameter-efficient technique like LoRA.
In this code, we configure LoRA through Unsloth's PEFT support by specifying:
The model
The LoRA rank (r)
The layers to fine-tune
and a few more parameters (a sketch of the full call follows this list).
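Here is a hedged sketch of that call, following the pattern Unsloth documents for Gemma 3 (LoRA applied to the language layers only); the specific rank, alpha, and other values are illustrative.

```python
# Attach LoRA adapters so only a small fraction of the weights is trained.
# Hyperparameter values here are illustrative, not prescriptive.
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers=False,    # Gemma 3 is multimodal; skip the vision layers
    finetune_language_layers=True,   # fine-tune the language stack
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=8,                             # LoRA rank: size of the low-rank update matrices
    lora_alpha=8,                    # scaling factor applied to the LoRA updates
    lora_dropout=0,
    bias="none",
    random_state=3407,
)
```

A lower rank trains fewer parameters (cheaper but less expressive); a higher rank does the opposite.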
3) Prepare dataset
Next, we use a conversation-style dataset to fine-tune Gemma 3.
The standardize_data_formats utility converts the dataset into the format expected for fine-tuning.
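A sketch of the data-prep step is below. The chat-template name and the FineTome-100k dataset are assumptions for illustration; any conversation-style dataset with a "conversations" column would work the same way.

```python
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template, standardize_data_formats

# Attach Gemma 3's chat template to the tokenizer (template name is an assumption)
tokenizer = get_chat_template(tokenizer, chat_template="gemma-3")

# FineTome-100k is a placeholder; any conversation-style dataset works
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_data_formats(dataset)   # normalize to a common schema

# Render each conversation into a single "text" field for the trainer
def to_text(examples):
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}

dataset = dataset.map(to_text, batched=True)
```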
4) Define Trainer
Here, we create a Trainer object by specifying the training configuration: the model, the tokenizer, the dataset, the learning rate, and a few other settings.
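Below is a sketch using TRL's SFTTrainer, which is what Unsloth's notebooks typically wrap; every hyperparameter shown (batch size, steps, learning rate, and so on) is an illustrative choice.

```python
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",        # column produced in the previous step
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,    # effective batch size of 8
        warmup_steps=5,
        max_steps=60,                     # short run just for demonstration
        learning_rate=2e-4,
        logging_steps=1,                  # log the loss at every step
        optim="adamw_8bit",               # memory-efficient optimizer
        output_dir="outputs",
    ),
)
```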
5) Train
With that done, we initiate training. The loss fluctuates in the early steps, which is expected; it should trend downward as the model sees more of the training data.
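Training itself is a single call; with logging_steps=1 in the config above, the per-step loss is printed so you can watch it settle down.

```python
trainer_stats = trainer.train()   # prints the loss at every step
print(trainer_stats.metrics)      # e.g. train_runtime, train_loss
```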
6) Run it locally
Below, we run the model via Unsloth's native inference! We can also save this model locally.
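Here is a hedged sketch of inference and saving; the prompt, output directory, and generation settings are illustrative, and a CUDA GPU is assumed.

```python
from transformers import TextStreamer

# Build a chat-formatted prompt (the question itself is just an example)
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")                              # assumes a CUDA GPU is available

# Stream the generated tokens to the console
_ = model.generate(
    **inputs,
    max_new_tokens=128,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)

# Save the fine-tuned LoRA adapters and tokenizer locally for later reuse
model.save_pretrained("gemma-3-finetuned")
tokenizer.save_pretrained("gemma-3-finetuned")
```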
And with that, we have fine-tuned Gemma 3 completely locally.
Of course, what we have discussed isn’t simple!
Gathering data is hard.
Fine-tuning needs compute power (although, yes, Unsloth massively reduces that through kernel-level optimizations).
And more.
If you want to upskill in LLM fine-tuning and learn how to do it well, don’t miss Predibase’s upcoming event.
You can learn Reinforcement Fine-tuning (with implementation and a demo) in their webinar on 27th March 2025.
And it’s a free event.
Register for the hands-on webinar to upskill in LLM fine-tuning here →
You can find the code for today’s issue in this Colab Notebook →
Thanks for reading!
P.S. For those wanting to develop “Industry ML” expertise:
At the end of the day, all businesses care about impact. That’s it!
Can you reduce costs?
Drive revenue?
Can you scale ML models?
Predict trends before they happen?
We have discussed several other topics (with implementations) that align with these goals.
Here are some of them:
Learn sophisticated graph architectures and how to train them on graph data: A Crash Course on Graph Neural Networks – Part 1.
So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here: Bi-encoders and Cross-encoders for Sentence Pair Similarity Scoring – Part 1.
Learn techniques to run large models on small devices: Quantization: Optimize ML Models to Run Them on Tiny Hardware.
Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust: Conformal Predictions: Build Confidence in Your ML Model’s Predictions.
Learn how to identify causal relationships and answer business questions: A Crash Course on Causality – Part 1.
Learn how to scale ML model training: A Practical Guide to Scaling ML Model Training.
Learn techniques to reliably roll out new models in production: 5 Must-Know Ways to Test ML Models in Production (Implementation Included).
Learn how to build privacy-first ML systems: Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.
Learn how to compress ML models and reduce costs: Model Compression: A Critical Step Towards Efficient Machine Learning.
All these resources will help you cultivate the key skills that businesses care about the most.