Over the last couple of weeks, we covered vector databases, LLMs, and fine-tuning LLMs with LoRA. We also implemented LoRA from scratch, learned about RAG and its design considerations, and much more.
If you are new here (or want a refresher), the earlier articles in this series cover those topics.
There’s one thing that’s yet to be addressed in this series of articles.
To recap, there are broadly two popular ways to augment LLMs with additional data:
RAG
Fine-tuning using LoRA/QLoRA
Each has its own pros and cons, and each suits different applications.
The question is:
Under what conditions does it make sense to proceed with RAG and when should one prefer fine-tuning?
To continue this LLM series, I’m excited to bring you a special guest post by Damien Benveniste. He is the author of The AiEdge newsletter and was a Machine Learning Tech Lead at Meta.
Subscribe to Damien's The AiEdge newsletter for more. You can also follow him on LinkedIn and Twitter.
In today’s machine learning deep dive, he provides a detailed discussion of RAG vs. fine-tuning, titled “Augmenting LLMs: Fine-Tuning or RAG?”
More specifically, he covers:
The tradeoffs between RAG and fine-tuning
System design for RAG and fine-tuning pipelines
The cost of owning the model vs. using a third-party host
Issues with RAG and fine-tuning
I personally learned a lot from this one, and I am sure you will learn a lot too.
Please read it here: Augmenting LLMs: Fine-Tuning or RAG?
Have a good day!
Avi