Transform your terminal into an Agent
Atlassian’s new Rovo Dev CLI brings AI right into the command line in a way that actually helps you ship:
Generate, refactor, and review code using plain English.
Debug without context switching.
Auto-generate docs from your commit history.
Pull in Jira tickets + Confluence context, all in one flow.
It’s seamlessly integrated with Atlassian tools, so you can go from idea → code → doc → deployment without leaving your flow.
Thanks to Atlassian for partnering today!
Visual Guide to Bi-encoders, Cross-encoders and ColBERT
So many real-world NLP systems, implicitly or explicitly, rely on pairwise sentence (or context) scoring in one form or another.
QA systems
Duplicate text detection systems, etc.
The visual depicts three popular approaches used in the industry to handle this:
Let’s understand them one-by-one!
1) Cross-encoders
These are conceptually one of the most powerful approaches.
Concatenate the query text and the document text.
Encode it using a BERT-like encoder model.
Apply a transformation (a dense layer) to the [CLS] token representation to get a similarity score.
Since the model attends to both contexts, this produces an incredibly semantically expressive representation.
But it does not scale: if you have 1B documents, you must run 1B forward passes per query to determine the most relevant documents.
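The three steps above can be sketched in a few lines. This is a toy illustration only: the encoder below is a deterministic stand-in (hash-seeded random vectors), not a trained BERT model, and the dense-layer weights are random.

```python
import numpy as np

D = 8  # toy embedding dimension

def encode(tokens):
    # Stand-in for a BERT-like encoder: deterministic per-token vectors.
    # A real encoder would produce *contextual* representations.
    def vec(tok):
        seed = sum(ord(c) for c in tok)
        return np.random.default_rng(seed).standard_normal(D)
    return np.stack([vec(t) for t in tokens])

def cross_encoder_score(query, doc, w):
    # 1) Concatenate the query text and the document text.
    tokens = ["[CLS]"] + query.split() + ["[SEP]"] + doc.split()
    # 2) Encode the joint sequence.
    reps = encode(tokens)
    # 3) Dense layer on the [CLS] representation -> scalar similarity score.
    return float(reps[0] @ w)

w = np.random.default_rng(42).standard_normal(D)  # toy dense-layer weights
score = cross_encoder_score("what is ColBERT", "ColBERT is a retrieval model", w)
```

Because the query and document are encoded jointly, this score cannot be precomputed per document, which is exactly the scaling problem noted above.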
2) Bi-encoders
Encode the query and the documents separately.
Compute the cosine similarity between the [CLS] token embeddings of the query and the document.
This is highly scalable since the document embeddings can be computed offline.
But we lose all token-level interaction and simply “hope” that all the information about the query and the document is well summarized in the [CLS] token.
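Here is a minimal sketch of bi-encoder scoring, assuming the document [CLS] embeddings were already computed offline (toy random vectors stand in for real model outputs):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document [CLS] embeddings, precomputed offline (toy random stand-ins).
rng = np.random.default_rng(0)
doc_embeddings = rng.standard_normal((1000, 8))

# At query time: encode the query once, then score against all documents.
query_embedding = rng.standard_normal(8)
scores = [cosine(query_embedding, d) for d in doc_embeddings]
best_doc = int(np.argmax(scores))
```

Only the query needs a forward pass at query time; the 1000 document embeddings are just looked up, which is why this approach scales.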
3) ColBERT
This brings together the power of cross-encoders and the scalability of bi-encoders.
Encode the query and the documents separately.
Compute a late interaction matrix, which contains similarity scores (dot product) between all query tokens and all document tokens.
For every query token, determine the max score across all document tokens.
Sum these max scores to get a matching score.
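The late-interaction scoring just described reduces to a few lines of numpy; here is a minimal sketch with tiny toy token embeddings (in a real system, Q and Dm would come from the encoder):

```python
import numpy as np

def maxsim_score(Q, Dm):
    # Q:  (num_query_tokens, dim) query token embeddings
    # Dm: (num_doc_tokens, dim) document token embeddings
    sim = Q @ Dm.T  # late interaction matrix: dot products between all token pairs
    return float(sim.max(axis=1).sum())  # max over doc tokens, summed over query tokens

# Toy example: two query tokens, two document tokens.
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
Dm = np.array([[1.0, 0.0], [0.0, 2.0]])
score = maxsim_score(Q, Dm)  # max per query token: [1.0, 2.0] -> 3.0
```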
Advantages:
Like bi-encoders, it is highly scalable since document embeddings can be computed offline.
Like cross-encoders, it maintains cross-interactions between the query and the document tokens (called late interaction).
In fact, ColPali (a method used in vision-driven RAG) is inspired by ColBERT.
We covered these three architectures with implementation here:
Bi-encoders and Cross-encoders for Sentence Pair Similarity Scoring.
A deep dive into ColBERT and ColBERTv2 for improving RAG systems (with implementation).
Over to you: What are some other advantages of ColBERT?
Thanks for reading!
P.S. For those wanting to develop “Industry ML” expertise:
At the end of the day, all businesses care about impact. That’s it!
Can you reduce costs?
Drive revenue?
Can you scale ML models?
Predict trends before they happen?
We have discussed several other topics (with implementations) that align with these goals.
Here are some of them:
Learn how to build Agentic systems in a crash course with 14 parts.
Learn how to build real-world RAG apps and evaluate and scale them in this crash course.
Learn sophisticated graph architectures and how to train them on graph data.
So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
Learn how to run large models on small devices using Quantization techniques.
Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
Learn how to scale and implement ML model training in this practical guide.
Learn techniques to reliably test new models in production.
Learn how to build privacy-first ML systems using Federated Learning.
Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.