Transfer Learning, Fine-tuning, Multitask Learning and Federated Learning
Four must-know model training paradigms.
Reserve A100s, H100s, and H200s at 50-80% off!
Lightning Studios now offers reserve-ahead GPU reservations: reserve by the day, up to a month in advance, self-serve with pay-up-front billing.
Thanks to LightningAI for partnering today!
Here are four well-adopted, must-know training methodologies in ML:

- Transfer learning
- Fine-tuning
- Multi-task learning
- Federated learning

Let's discuss them today.
#1) Transfer learning
This is useful when the task of interest has little data but a related task has abundant data.
This is how it works:
1. Train a neural network (the base model) on the related task.
2. Replace the last few layers of the base model with new layers.
3. Train the network on the task of interest, keeping the weights of the retained layers frozen during backpropagation.
By training on the related task first, the base model captures core patterns that carry over to the task of interest.
The new final layers are then trained to capture task-specific behavior.
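To make this concrete, here is a minimal PyTorch-style sketch of the three steps above. It assumes a torchvision ResNet-18 as the base model and a hypothetical 10-class target task:

```python
import torch.nn as nn
import torchvision.models as models

# Step 1: base model trained on the related task (here, ImageNet).
base_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Step 3 (setup): freeze every pre-trained layer so backpropagation
# leaves their weights untouched.
for param in base_model.parameters():
    param.requires_grad = False

# Step 2: replace the final layer with a new head for the task of
# interest. num_classes = 10 is a placeholder for illustration.
num_classes = 10
base_model.fc = nn.Linear(base_model.fc.in_features, num_classes)

# Only the new head has requires_grad=True, so training updates it alone.
```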
Another idea along these lines is knowledge distillation, which involves the “transfer” of knowledge. We discussed it here if you are interested in learning about it.
Transfer learning is commonly used in many computer vision tasks.
#2) Fine-tuning
Fine-tuning involves updating the weights of some or all layers of the pre-trained model to adapt it to the new task.
The idea may appear similar to transfer learning, but in fine-tuning, we typically do not replace the last few layers of the pre-trained network.
Instead, the pre-trained model itself is adjusted to the new data.
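As a rough sketch of the contrast (again using a torchvision ResNet-18; the learning rate is an illustrative choice), fine-tuning keeps the architecture intact and trains all weights, usually at a small learning rate:

```python
import torch
import torchvision.models as models

# Start from the pre-trained model; no layers are replaced or frozen.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# A small learning rate nudges the pre-trained weights toward the new
# data instead of overwriting what the model already knows.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# ...then run a standard training loop on the new dataset.
```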
#3) Multi-task learning
A model is trained to perform multiple tasks simultaneously.
The model shares knowledge across tasks, aiming to improve generalization and performance on each task.
It helps in scenarios where tasks are related or can benefit from shared representations.
The motivation for multi-task learning is not just better generalization: thanks to the shared layers, it also saves compute and memory.
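A minimal sketch of this idea (the layer sizes and task heads are assumed placeholders): one shared trunk feeds two task-specific heads, so the shared computation happens once per input:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared layers: one forward pass serves every task.
        self.shared = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
        # Task-specific heads (sizes are illustrative).
        self.classify = nn.Linear(32, 5)  # e.g., a 5-class task
        self.regress = nn.Linear(32, 1)   # e.g., a regression task

    def forward(self, x):
        h = self.shared(x)
        return self.classify(h), self.regress(h)

# During training, the total loss is a (possibly weighted) sum of
# the per-task losses, e.g., loss = loss_cls + loss_reg.
```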
#4) Federated learning
Federated learning is a decentralized approach to training models.
Instead of sending data to a central server, models are sent to devices, trained locally, and only model updates are gathered and sent back to the server.
It is useful when privacy is important. It also reduces the need for centralized data collection.
A smartphone keyboard is a great example: federated learning lets it learn and adapt to our typing habits without sending sensitive keystrokes or other personal data to a central server.
That said, since these models are trained on small devices, they must be lightweight yet powerful to be useful.
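Here is a stripped-down sketch of the aggregation step in a federated-averaging-style setup. This is not a production implementation: it assumes all model parameters are floating point and ignores client sampling, weighting by dataset size, and communication details:

```python
import copy
import torch

def federated_average(global_model, client_models):
    """Average locally trained client weights into the global model."""
    avg_state = copy.deepcopy(global_model.state_dict())
    for key in avg_state:
        # Stack each client's tensor for this parameter and average them.
        avg_state[key] = torch.stack(
            [cm.state_dict()[key] for cm in client_models]
        ).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model
```

Each round, clients train copies of the global model on their local data, send back only the updated weights, and the server folds them together as above.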
We covered Federated learning in detail here: Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.
👉 Over to you: What are some other ML training methodologies that I have missed here?
Thanks for reading!
P.S. For those wanting to develop “Industry ML” expertise:
At the end of the day, all businesses care about impact. That’s it!
Can you reduce costs?
Drive revenue?
Can you scale ML models?
Predict trends before they happen?
We have discussed several other topics (with implementations) that align with these goals.
Here are some of them:
Learn how to build Agentic systems in an ongoing crash course with 6 parts.
Learn how to build real-world RAG apps and evaluate and scale them in this crash course.
Learn sophisticated graph architectures and how to train them on graph data in this crash course.
So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
Learn how to run large models on small devices using Quantization techniques.
Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
Learn how to scale and implement ML model training in this practical guide.
Learn techniques to reliably test new models in production.
Learn how to build privacy-first ML systems using Federated Learning.
Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.