Re-ranking

Re-ranking in Tiledesk: How It Works and Why It Matters

What is re-ranking (easy explanation)

Re-ranking is a second, smarter selection step applied after an initial search (e.g. vector search, keyword search, hybrid search).

First step – Retrieval: The system retrieves a set of candidate results that are likely relevant to the user’s question (e.g. top 20 chunks from a vector database).

Second step – Re-ranking: A more precise model evaluates each candidate in the context of the actual user query and:
- assigns a relevance score
- sorts results from most to least relevant
- optionally discards low-quality matches

In short:

Retrieval finds “possible answers” → Re-ranking finds the “best answers.”
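
The second step can be sketched in a few lines of Python. This is a minimal illustration, not Tiledesk's implementation: `rerank`, `overlap_score`, `top_k`, and `min_score` are hypothetical names, and the word-overlap scorer is only a stand-in for a real re-ranking model.

```python
from typing import Callable, List, Optional, Tuple


def overlap_score(query: str, chunk: str) -> float:
    """Toy scorer (assumption): fraction of query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
    min_score: Optional[float] = None,
) -> List[Tuple[float, str]]:
    # 1. Assign a relevance score to every candidate.
    scored = [(score_fn(query, chunk), chunk) for chunk in candidates]
    # 2. Sort from most to least relevant.
    scored.sort(key=lambda item: item[0], reverse=True)
    # 3. Optionally discard low-quality matches.
    if min_score is not None:
        scored = [item for item in scored if item[0] >= min_score]
    return scored[:top_k]


candidates = [
    "firmware version history",
    "billing and invoices",
    "reset the device",
]
results = rerank(
    "reset device firmware", candidates, overlap_score, top_k=2, min_score=0.1
)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

In a real pipeline, `score_fn` would call the re-ranking model; the sorting and filtering logic stays the same.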

Why re-ranking is needed

Vector similarity alone is powerful, but it has limitations:

It may retrieve semantically similar but contextually wrong chunks
It treats all candidates as equally good once retrieved
It cannot always understand intent, constraints, or priority

Re-ranking solves this by deeply comparing the user question with each candidate, instead of comparing embeddings only.

How re-ranking works

User Question
     ↓
Initial Retrieval (Vector / Hybrid Search)
     ↓
Top N Candidates (e.g. 20)
     ↓
Re-ranking Model
     ↓
Top K Best-Matching Chunks (e.g. 5)
     ↓
LLM Answer Generation
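
The flow above can be read as three pluggable stages. The sketch below wires them together; `answer_question` and the three `toy_*` functions are hypothetical placeholders for a real vector search, re-ranking model, and LLM call, and the defaults of 20 and 5 simply mirror the example numbers in the diagram.

```python
from typing import Callable, List

Retriever = Callable[[str, int], List[str]]
Reranker = Callable[[str, List[str], int], List[str]]
Generator = Callable[[str, List[str]], str]


def answer_question(
    question: str,
    retrieve: Retriever,
    rerank: Reranker,
    generate: Generator,
    top_n: int = 20,
    top_k: int = 5,
) -> str:
    candidates = retrieve(question, top_n)             # initial retrieval
    best_chunks = rerank(question, candidates, top_k)  # re-ranking
    return generate(question, best_chunks)             # answer generation


KNOWLEDGE_BASE = [
    "Firmware version history",
    "Reset device procedure",
    "Connection issues troubleshooting",
]


def toy_retrieve(question: str, n: int) -> List[str]:
    # Placeholder: a real retriever would run a vector or hybrid search.
    return KNOWLEDGE_BASE[:n]


def toy_rerank(question: str, chunks: List[str], k: int) -> List[str]:
    # Placeholder: rank by the number of words shared with the question.
    words = set(question.lower().split())

    def shared(chunk: str) -> int:
        return len(words & set(chunk.lower().split()))

    return sorted(chunks, key=shared, reverse=True)[:k]


def toy_generate(question: str, chunks: List[str]) -> str:
    # Placeholder: a real generator would prompt an LLM with the chunks.
    return f"Q: {question} | context: {'; '.join(chunks)}"


reply = answer_question(
    "how to reset my device", toy_retrieve, toy_rerank, toy_generate, top_k=1
)
print(reply)
```

Keeping the stages behind simple function signatures makes it easy to swap the re-ranker in or out without touching retrieval or generation.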

What the re-ranking model evaluates

In Tiledesk, re-ranking is used in RAG pipelines to improve answer quality across:

Customer support assistants
Internal knowledge bases
Enterprise document search
Multi-agent workflows

Tiledesk allows re-ranking to be applied:

Automatically in RAG flows
As a configurable step in agent pipelines
In on-premise, hybrid, or cloud deployments

Intuitive use case

Scenario

A company uses Tiledesk to power a support assistant with:

Product manuals
Internal procedures
Troubleshooting guides

User question:

“How can I reset my device if it’s stuck during firmware update?”

Without Re-ranking

Vector search retrieves chunks like:

Firmware update overview
Reset device after factory test
Device troubleshooting – connection issues
Firmware version history
Reset device procedure (correct)

The LLM sees mixed context and may:

Answer partially
Mention irrelevant steps
Hallucinate missing details

With Re-ranking Enabled

The re-ranking model analyzes each chunk against the exact question and produces:

Reset device procedure (firmware recovery mode) ⭐⭐⭐⭐⭐
Troubleshooting – firmware stuck scenarios ⭐⭐⭐⭐
Firmware update overview ⭐⭐
Reset after factory test ⭐
Version history ⭐

Only the top, most relevant content is passed to the LLM.
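
As a worked example of this selection, the snippet below keeps only the chunks rated four stars or higher. The star values come from the ranking shown in this scenario; the `min_stars` cut-off and the function name `select_context` are assumptions made for illustration, not Tiledesk settings.

```python
from typing import List, Tuple

# (chunk title, stars) as listed in the scenario.
RANKED_CHUNKS: List[Tuple[str, int]] = [
    ("Reset device procedure (firmware recovery mode)", 5),
    ("Troubleshooting – firmware stuck scenarios", 4),
    ("Firmware update overview", 2),
    ("Reset after factory test", 1),
    ("Version history", 1),
]


def select_context(ranked: List[Tuple[str, int]], min_stars: int = 4) -> List[str]:
    """Return chunk titles at or above the cut-off, best first."""
    ordered = sorted(ranked, key=lambda item: item[1], reverse=True)
    return [title for title, stars in ordered if stars >= min_stars]


context = select_context(RANKED_CHUNKS)
print(context)
```

With this cut-off, two of the five retrieved chunks reach the LLM.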

Result for the User

More precise answer
Correct steps on the first try
Less confusion
Faster resolution

Benefits for users and organizations

1. Higher Answer Accuracy

Re-ranking significantly reduces:

Irrelevant context
Partial answers
Hallucinations

2. Better Use of Existing Knowledge

Even large or noisy knowledge bases become:

More reliable
Easier to maintain
More scalable

3. Improved User Trust

Users notice when:

Answers are consistent
Instructions are correct
The assistant “understands” intent

This leads to:

Higher adoption
Lower fallback to human support

4. Cost and Performance Optimization

By sending only the best chunks to the LLM:

Fewer tokens are used
Responses are faster
Costs are reduced
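
A rough back-of-the-envelope calculation shows the effect on prompt size. The figures below are assumptions chosen for illustration (20 retrieved chunks versus 5 re-ranked chunks, as in the earlier example, and a hypothetical average of 300 tokens per chunk); real numbers depend on your chunking and model.

```python
def context_tokens(num_chunks: int, tokens_per_chunk: int) -> int:
    """Approximate number of context tokens sent to the LLM."""
    return num_chunks * tokens_per_chunk


def reduction_percent(before: int, after: int) -> float:
    """Relative reduction in context size, as a percentage."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


TOKENS_PER_CHUNK = 300  # hypothetical average

tokens_without = context_tokens(20, TOKENS_PER_CHUNK)  # all retrieved chunks
tokens_with = context_tokens(5, TOKENS_PER_CHUNK)      # re-ranked top chunks
saving = reduction_percent(tokens_without, tokens_with)

print(tokens_without, tokens_with, saving)
```

This only counts the context passed to the LLM; the re-ranking step has its own compute cost, which is discussed in the GPU section.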

5. Enterprise-Grade Control

In Tiledesk, re-ranking supports:

On-premise and GDPR-compliant deployments
Integration with custom retrieval logic

How to enable Re-ranking?

Go to the Knowledge Bases section and press the + New Knowledge Base button, then choose the "Hybrid search" option.

Once the Knowledge Base is created, you can create an AI Agent and connect it directly to the Knowledge Base.

In the AI Agent flow, you can also enable or disable re-ranking for a specific Ask Knowledge Base action.

When should you enable re-ranking?

Re-ranking is especially valuable when:

Knowledge bases are large (hundreds or thousands of documents)
Documents are similar to each other
Precision matters (legal, technical, industrial domains)
Users ask complex or multi-constraint questions

Re-ranking uses GPUs

In Tiledesk, re-ranking is designed for enterprise-grade, real-time precision, which makes it suitable primarily for on-prem GPU installations.

Re-ranking relies on cross-encoder models that must score many (query, chunk) pairs in parallel, a workload that is computationally intensive and latency-sensitive. Running this step on CPUs introduces unpredictable delays.
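
To make the workload concrete, the sketch below builds the (query, chunk) pairs and scores them in batches. The helper names and the `batch_size` value are hypothetical, and `toy_score_batch` is a placeholder for a real cross-encoder; with the sentence-transformers library, for example, a `CrossEncoder` instance's `predict` method accepts a list of such pairs and could take its place.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]


def make_pairs(query: str, chunks: Sequence[str]) -> List[Pair]:
    """One (query, chunk) pair per candidate chunk."""
    return [(query, chunk) for chunk in chunks]


def batched(items: Sequence[Pair], size: int) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of at most `size` pairs."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_pairs(
    pairs: Sequence[Pair],
    score_batch: Callable[[Sequence[Pair]], Sequence[float]],
    batch_size: int = 8,
) -> List[float]:
    """Score all pairs batch by batch, preserving input order."""
    scores: List[float] = []
    for batch in batched(pairs, batch_size):
        scores.extend(score_batch(batch))
    return scores


def toy_score_batch(batch: Sequence[Pair]) -> List[float]:
    # Placeholder (assumption): count words shared by query and chunk.
    return [
        float(len(set(q.lower().split()) & set(c.lower().split())))
        for q, c in batch
    ]


query = "firmware reset"
chunks = [f"chunk {i} about firmware reset" for i in range(20)]
pairs = make_pairs(query, chunks)
scores = score_pairs(pairs, toy_score_batch, batch_size=8)
print(len(pairs), len(scores))
```

Every candidate requires its own forward pass together with the query, so the cost grows with the number of retrieved chunks; batching is what lets a GPU process many pairs at once.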

An on-prem GPU allows Tiledesk to execute re-ranking models locally, with stable low latency, full data sovereignty, and predictable performance under load, making it the only deployment model that consistently meets enterprise SLAs and compliance requirements.

In our SaaS deployment we make extensive use of GPUs for hybrid search and re-ranking.

Re-ranking models

In our production engine we rely on a local cross-encoder/ms-marco-MiniLM-L-6-v2 model by default, optionally switching to the heavier bge-reranker-v2-m3 for specific needs.

We are also evaluating TEI with BAAI/bge-reranker-large.

For Pinecone we opted for cohere-rerank-3.5, bge-reranker-v2-m3, and pinecone-rerank-v0.
