The Right Way to Use Multiple Embedding Models

A mistake that often goes unnoticed.

Jul 13, 2024

Imagine you have two different models (or sub-networks) in your ML pipeline. Both generate a representation (embedding) of the input with the same number of dimensions (say, 200).

Two networks generate embeddings of the same dimensions

These could also be pre-trained models used to generate embeddings, such as BERT or XLNet, or any other embedding network for that matter.

Here, many folks get tempted to make them interact. They would:

compare these representations
compute their Euclidean distance
compute their cosine similarity, and more.

The rationale is that as the representations have the same dimensions, they can seamlessly interact.

However, that is NOT true, and you should NEVER do that.

Why?

Even though these embeddings have the same length (or dimensionality), they are not in the same vector space.

Being in different spaces means that their axes are not aligned: coordinate i of one embedding does not measure the same thing as coordinate i of the other.

To simplify, imagine both embeddings were in a 3D space.

Now, assume that their z-axes are aligned, but the x and y axes of the first are at an angle to the x and y axes of the second.

Now, of course, both embeddings have the same number of dimensions: 3.

But can you compare them?

No, right?

Similarly, comparing the embeddings from the two networks above would inherently assume that all axes are perfectly aligned.

But this is highly unlikely because there are infinitely many ways the axes of one space can be oriented relative to the axes of the other.

Thus, the representations can NEVER be compared, unless generated by the same model.
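To make the 3D picture above concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the vectors `cat_a` and `dog_a`, the two "models", and the 90-degree rotation about the z-axis are invented, not taken from any real network. "Model B" is simply model A with its x and y axes rotated, so both encode exactly the same geometry.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rotate_z(v, angle):
    """Rotate a 3D vector about the z-axis by `angle` radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return [c * x - s * y, s * x + c * y, z]


# Hypothetical embeddings from "model A" for two similar inputs.
cat_a = [1.0, 0.0, 0.2]
dog_a = [0.9, 0.1, 0.2]

# Hypothetical "model B": same geometry, but its x/y axes are rotated
# by 90 degrees relative to model A (the z-axes stay aligned).
ANGLE = math.pi / 2
cat_b = rotate_z(cat_a, ANGLE)
dog_b = rotate_z(dog_a, ANGLE)

# Comparisons inside one model's space are meaningful and agree.
within_a = cosine(cat_a, dog_a)
within_b = cosine(cat_b, dog_b)

# Comparing the SAME input across the two spaces is not meaningful.
cross_same_input = cosine(cat_a, cat_b)

print(f"within A: {within_a:.3f}")
print(f"within B: {within_b:.3f}")
print(f"same input, A vs. B: {cross_same_input:.3f}")
```

Inside either space, the two inputs look nearly identical (cosine similarity of roughly 0.99). Yet the same input embedded by A and by B gets a cosine similarity of about 0.04, which says nothing about the input and everything about the misaligned axes.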

I vividly remember making this mistake once, and it caused serious trouble in my ML pipeline.

And I think if you are not aware of this, then it is something that can easily go unnoticed.

Instead, I have always found that concatenation is a much better way to leverage multiple embeddings.

The good thing is that concatenation works even if they have unequal dimensions.
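Here is a minimal sketch of that idea, again in plain Python. The helper name `concatenate_embeddings` and the example vectors are hypothetical. With NumPy arrays, `np.concatenate` would do the same job.

```python
def concatenate_embeddings(*embeddings):
    """Join any number of embeddings into one flat feature vector.

    The inputs may have different lengths. No coordinate of one
    embedding is ever compared with a coordinate of another.
    """
    combined = []
    for emb in embeddings:
        combined.extend(emb)
    return combined


# Hypothetical outputs of two different models for the same input.
emb_a = [0.12, -0.40, 0.33]              # 3 dimensions
emb_b = [0.05, 0.91, -0.27, 0.48, 0.10]  # 5 dimensions

features = concatenate_embeddings(emb_a, emb_b)
print(len(features))  # 8
```

The downstream model then receives `features` as a single input and can learn for itself how to weigh each block of coordinates, so the axes of the two spaces never need to line up.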

👉 Over to you: How do you typically handle embeddings from multiple models?


Ahmed Besbes

Jul 13, 2024 (edited)

you could retrain the two embedding models jointly by minimizing the L2 distance between the embeddings of the same input and maximizing it for different inputs.

this training can be achieved using a contrastive loss.


Eddy Giusepe

Jul 15, 2024

That's an interesting point... Thank you, AVI CHAWLA !


Daily Dose of Data Science
