[Hands-on] RAG over Excel Sheets

...using Docling and Llama-3.2 (100% local).

Since many of you like demos, here's how we built a RAG app over Excel sheets using Docling and Llama-3.2.

  • Docling is an open-source library for handling complex docs.

  • Llama-3.2 is a powerful open-weight LLM.

The video above depicts the final outcome (the code is linked later).

Let's build it now.


Step 1) Parse file using Docling:

Docling uses two models:

  1. Layout analysis model to identify page elements,

  2. A TableFormer model for table structure recognition.

It also nicely integrates with LlamaIndex and exports data to the desired format with ease and speed.
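To make the "exports data to the desired format" point concrete, here is a minimal sketch that uses Docling on its own, without LlamaIndex. The helper name convert_and_export and the two supported format labels are our own illustrative choices, and we assume your installed Docling version accepts .xlsx input.

```python
def convert_and_export(source: str, fmt: str = "markdown"):
    """Convert a file with Docling and export it as Markdown or a dict.

    `convert_and_export` and the `fmt` labels are hypothetical helpers for
    this sketch; only the Docling calls below come from the library.
    """
    if fmt not in ("markdown", "dict"):
        raise ValueError(f"unsupported format: {fmt!r}")

    # Imported lazily so the check above works without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    doc = result.document  # the parsed DoclingDocument

    if fmt == "markdown":
        return doc.export_to_markdown()
    return doc.export_to_dict()
```

For example, convert_and_export("sales.xlsx") should return the sheet contents as Markdown text, where "sales.xlsx" stands in for your own file.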

We load the Excel using Docling as follows:
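A minimal sketch of this step, assuming the llama-index-readers-docling integration is installed and the Excel files sit in a local folder. The function names and the folder path are our own placeholders, not necessarily what the repo uses.

```python
from pathlib import Path


def find_excel_files(input_dir: str) -> list[Path]:
    """Return the .xlsx files in input_dir, sorted by path."""
    return sorted(Path(input_dir).glob("*.xlsx"))


def load_excel_docs(input_dir: str):
    """Parse every .xlsx file in input_dir into LlamaIndex documents."""
    # Fail early with a clear message instead of indexing nothing.
    if not find_excel_files(input_dir):
        raise FileNotFoundError(f"no .xlsx files found in {input_dir!r}")

    # Imported lazily so the check above works without the packages installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.docling import DoclingReader

    reader = DoclingReader()
    loader = SimpleDirectoryReader(
        input_dir=input_dir,
        file_extractor={".xlsx": reader},  # route Excel files through Docling
    )
    return loader.load_data()
```

Calling docs = load_excel_docs("./docs") then gives the list of documents used in the later steps ("./docs" is an assumed path).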

Step 2) Set up LLM and embedding model

Since our knowledge base is ready, we set up the LLM and embedding model.

  • We use Ollama to run the LLM locally.

  • And an embedding model from HuggingFace.
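A rough sketch of this setup with LlamaIndex's Ollama and HuggingFace integrations. The model tag "llama3.2", the embedding model "BAAI/bge-large-en-v1.5", and the timeout are assumptions for illustration; swap in whatever you have pulled locally.

```python
# Assumed defaults for illustration; adjust to the models you actually use.
LLM_MODEL = "llama3.2"                  # Ollama model tag (assumption)
EMBED_MODEL = "BAAI/bge-large-en-v1.5"  # HuggingFace embedding model (assumption)


def configure_models(
    llm_model: str = LLM_MODEL,
    embed_model: str = EMBED_MODEL,
    request_timeout: float = 120.0,
):
    """Register a local LLM and an embedding model as LlamaIndex defaults."""
    # Imported lazily so this file can be imported without the packages installed.
    from llama_index.core import Settings
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama

    # Ollama must be running locally with the model already pulled.
    Settings.llm = Ollama(model=llm_model, request_timeout=request_timeout)
    Settings.embed_model = HuggingFaceEmbedding(model_name=embed_model)
    return Settings
```

Setting both on the global Settings object means the index and query engine built later pick them up without extra arguments.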

Step 3) Embed data and create an index

We take the loaded documents (docs = loader.load_data()), embed them, and create a vector store index:
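One way to sketch this step, assuming the documents were exported as Markdown so that a Markdown-aware node parser is a sensible way to chunk them. The choice of MarkdownNodeParser and the helper name build_index are our assumptions here.

```python
def build_index(docs):
    """Chunk the documents, embed them, and return an in-memory vector index."""
    if not docs:
        raise ValueError("no documents to index")

    # Imported lazily so the check above works without LlamaIndex installed.
    from llama_index.core import VectorStoreIndex
    from llama_index.core.node_parser import MarkdownNodeParser

    return VectorStoreIndex.from_documents(
        docs,
        transformations=[MarkdownNodeParser()],  # split on Markdown structure
        show_progress=True,
    )
```

The embedding model used here is whichever one is registered in LlamaIndex's global Settings at the time of the call.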

Step 4) Define a query engine and chat

Finally, we define a query_engine and start asking questions about our Excel data:
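A small sketch of the querying step. The helper name ask is ours; it only relies on the index exposing as_query_engine() and the engine exposing query(), which is how a LlamaIndex VectorStoreIndex behaves.

```python
def ask(index, question: str) -> str:
    """Run one question against the index and return the answer as text."""
    if not question.strip():
        raise ValueError("question must not be empty")

    query_engine = index.as_query_engine()
    response = query_engine.query(question)
    return str(response)  # the response object renders as its answer text
```

For a chat-style app you would typically build the query engine once and reuse it across questions rather than recreating it on every call.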

Done!

There's also a Streamlit part that we haven't shown here, but after building it, we get this clear and neat interface:

We hope this was a good place to start with RAG over complex docs!

We'll cover more advanced techniques very soon in our RAG crash course, specifically around Agentic RAG, vision RAG, etc.

Here's what we have covered so far in the first seven parts:

You can find all the code and instructions for today's demo in this GitHub repo: RAG with Docling.

👉 Over to you: What other topics would you like to learn about?



[REMINDER] Only 2 days left to unlock the best of DailyDoseofDS

This is a reminder that lifetime access to Daily Dose of Data Science is available at 30% off.

The offer ends in 3 days. Join here: Lifetime membership.

Become a lifetime member

Here's what you'll get:

It gives you lifetime access to the no-fluff, industry-relevant, and practical DS and ML resources that help you succeed and stay relevant in these roles:

  • Our recent 7-part crash course on building RAG systems.

  • LLM fine-tuning techniques and implementations.

  • Our crash courses on graph neural networks, PySpark, model interpretability, model calibration, causal inference, and more.

  • Scaling ML models with implementations.

  • Building privacy-preserving ML systems.

  • Mathematical deep dives on core DS topics, clustering, etc.

  • From-scratch implementations of several core ML algorithms.

  • Building 100% reproducible ML projects.

  • 50+ more existing industry-relevant topics (each usually a 20+ minute read that covers several details).

  • Also, all weekly deep dives that we will publish in the future are included.

Join below at 30% off: Lifetime membership.

Become a lifetime member

Our next price drop will not happen for at least 8-9 months. If you find value in this work, it is a great time to upgrade to a lifetime membership.

P.S. If you are an existing monthly or yearly member and wish to upgrade to lifetime, please reply to this email.

Thanks and have a good day, and we'll see you tomorrow with our regular newsletter issue!

- Avi and Akshay
