Re-ranking in Tiledesk: How It Works and Why It Matters

What is re-ranking? (A simple explanation)
Re-ranking is a second, smarter selection step applied after an initial search (e.g. vector search, keyword search, hybrid search).
First step (Retrieval): The system retrieves a set of candidate results that are probably relevant to the user’s question (e.g. the top 20 chunks from a vector database).
Second step (Re-ranking): A more precise model evaluates each candidate in the context of the actual user query and:
assigns a relevance score
sorts results from most to least relevant
optionally discards low-quality matches
In short:
Retrieval finds “possible answers” → Re-ranking finds the “best answers.”
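The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tiledesk's implementation: the two-dimensional vectors are made-up stand-ins for real embeddings, and `overlap_score` is a hypothetical keyword-overlap scorer standing in for the more precise re-ranking model. All function names are illustrative.

```python
import math
import re
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: dict[str, Vector], k: int) -> list[str]:
    """Step 1 (retrieval): the k chunks whose vectors are closest to the query."""
    return sorted(index, key=lambda chunk: cosine(query_vec, index[chunk]), reverse=True)[:k]


def overlap_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in for a re-ranking model: the share of the query's
    longer words (more than 3 letters) that also appear in the chunk."""
    q_words = {w for w in re.findall(r"[a-z]+", query.lower()) if len(w) > 3}
    c_words = set(re.findall(r"[a-z]+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], min_score: float) -> list[tuple[str, float]]:
    """Step 2 (re-ranking): score each (query, chunk) pair, sort, drop weak matches."""
    scored = sorted(((c, score(query, c)) for c in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return [(c, s) for c, s in scored if s >= min_score]


# Toy index: chunk text -> made-up 2-D "embedding".
index = {
    "Firmware update overview": [1.0, 0.1],
    "Firmware version history": [0.9, 0.3],
    "Reset device procedure (firmware recovery mode)": [0.7, 0.7],
    "Reset device after factory test": [0.6, 0.8],
    "Device troubleshooting - connection issues": [0.1, 1.0],
}
question = "How can I reset my device if it's stuck during firmware update?"
query_vec = [1.0, 0.2]  # made-up embedding of the question

candidates = retrieve(query_vec, index, k=4)
ranked = rerank(question, candidates, overlap_score, min_score=0.3)
print(candidates[0])  # Firmware update overview
print(ranked[0][0])   # Reset device procedure (firmware recovery mode)
```

With these made-up values, vector search ranks the general firmware overview first, while the re-ranking step moves the reset procedure to the top and discards the version history, which falls below the threshold.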
Why re-ranking is needed
Vector similarity alone is powerful, but it has limitations:
It may retrieve semantically similar but contextually wrong chunks
It treats all candidates as equally good once retrieved
It cannot always understand intent, constraints, or priority
Re-ranking solves this by evaluating the user question and each candidate together, as a pair, instead of only comparing their embeddings.
How re-ranking works
What the re-ranking model evaluates
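A re-ranking model reads the question and a candidate chunk at the same time, whereas vector search compares vectors that were computed for each text separately. The deliberately tiny Python sketch below makes that difference visible. It is a toy illustration, not how real models work internally: `embedding_score` uses a bag-of-words count as its "embedding" (which ignores word order), and `pair_score` simply counts shared adjacent word pairs. Real embedding models and cross-encoders are neural networks, and all names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> Counter:
    """Toy per-text "embedding": a bag-of-words count vector (word order is lost)."""
    return Counter(tokens(text))


def embedding_score(query: str, chunk: str) -> float:
    """Vector-search style: compare two independently computed vectors (cosine)."""
    q, c = embed(query), embed(chunk)
    dot = sum(q[w] * c[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0


def pair_score(query: str, chunk: str) -> float:
    """Re-ranker style (toy): sees both texts at once and counts the query's
    adjacent word pairs that also occur in the chunk, so local order matters."""
    q_tok, c_tok = tokens(query), tokens(chunk)
    q_pairs = set(zip(q_tok, q_tok[1:]))
    c_pairs = set(zip(c_tok, c_tok[1:]))
    return len(q_pairs & c_pairs) / len(q_pairs) if q_pairs else 0.0


query = "reset before firmware update"
chunks = ["Reset before firmware update", "Firmware update before reset"]
for chunk in chunks:
    print(f"{chunk}: vector={embedding_score(query, chunk):.2f} pair={pair_score(query, chunk):.2f}")
# Reset before firmware update: vector=1.00 pair=1.00
# Firmware update before reset: vector=1.00 pair=0.33
```

Both chunks contain the same words, so the separately computed bag-of-words vectors cannot tell them apart, while the pair scorer prefers the chunk whose word order matches the question. Per-text vectors do have a practical advantage: chunk vectors can be stored ahead of time, whereas pair scores must be computed for every new question.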
In Tiledesk, re-ranking is used in RAG pipelines to improve answer quality across:
Customer support assistants
Internal knowledge bases
Enterprise document search
Multi-agent workflows
Tiledesk allows re-ranking to be applied:
Automatically in RAG flows
As a configurable step in agent pipelines
In on-premise, hybrid, or cloud deployments
Intuitive use case
Scenario
A company uses Tiledesk to power a support assistant with:
Product manuals
Internal procedures
Troubleshooting guides
User question:
“How can I reset my device if it’s stuck during firmware update?”
Without Re-ranking
Vector search retrieves chunks like:
Firmware update overview
Reset device after factory test
Device troubleshooting – connection issues
Firmware version history
Reset device procedure (correct)
The LLM sees mixed context and may:
Answer partially
Mention irrelevant steps
Hallucinate missing details
With Re-ranking Enabled
The re-ranking model analyzes each chunk against the exact question and produces:
Reset device procedure (firmware recovery mode) ⭐⭐⭐⭐⭐
Troubleshooting – firmware stuck scenarios ⭐⭐⭐⭐
Firmware update overview ⭐⭐
Reset after factory test ⭐
Version history ⭐
Only the top, most relevant content is passed to the LLM.
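The selection step can be sketched as follows, reusing the star ratings from the example above as illustrative scores. The `top_n` and `min_score` values, the function names, and the prompt wording are assumptions made for this illustration, not Tiledesk defaults.

```python
# Star ratings from the example above, used as illustrative re-ranker scores.
scored_chunks = [
    ("Firmware update overview", 2),
    ("Reset after factory test", 1),
    ("Reset device procedure (firmware recovery mode)", 5),
    ("Version history", 1),
    ("Troubleshooting - firmware stuck scenarios", 4),
]


def select_context(scored: list[tuple[str, int]], top_n: int, min_score: int) -> list[str]:
    """Keep at most top_n chunks, best first, and only those scoring at least min_score."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [chunk for chunk, score in ranked[:top_n] if score >= min_score]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a simple grounded prompt from the selected chunks (hypothetical format)."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


selected = select_context(scored_chunks, top_n=3, min_score=3)
prompt = build_prompt("How can I reset my device if it's stuck during firmware update?", selected)
print(prompt)
```

Here the firmware overview (2 stars) makes the top three but falls below the threshold, so only two chunks reach the LLM.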
Result for the User
More precise answer
Correct steps on the first try
Less confusion
Faster resolution
Benefits for users and organizations
1. Higher Answer Accuracy
Re-ranking significantly reduces:
Irrelevant context
Partial answers
Hallucinations
2. Better Use of Existing Knowledge
Even large or noisy knowledge bases become:
More reliable
Easier to maintain
More scalable
3. Improved User Trust
Users notice when:
Answers are consistent
Instructions are correct
The assistant “understands” intent
This leads to:
Higher adoption
Fewer escalations to human support
4. Cost and Performance Optimization
By sending only the best chunks to the LLM:
Fewer tokens are used
Responses are faster
Costs are reduced
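A back-of-the-envelope calculation shows where the token savings come from. The figures are assumptions for illustration only: 20 retrieved chunks (as in the retrieval example at the top of this page), all of which would be sent to the LLM without re-ranking, 3 chunks kept after re-ranking, and an average of 300 tokens per chunk. Real savings depend on chunk size, on how many chunks are kept, and on model pricing, and the re-ranking step itself has a compute cost that this sketch ignores.

```python
# Illustrative assumptions, not measured values.
RETRIEVED_CHUNKS = 20   # candidates returned by the first-stage search
KEPT_CHUNKS = 3         # chunks passed to the LLM after re-ranking
TOKENS_PER_CHUNK = 300  # assumed average chunk size in tokens


def context_tokens(num_chunks: int, tokens_per_chunk: int) -> int:
    """Approximate prompt tokens spent on retrieved context."""
    return num_chunks * tokens_per_chunk


without_rerank = context_tokens(RETRIEVED_CHUNKS, TOKENS_PER_CHUNK)
with_rerank = context_tokens(KEPT_CHUNKS, TOKENS_PER_CHUNK)
reduction = 1 - with_rerank / without_rerank

print(without_rerank, with_rerank, f"{reduction:.0%}")  # 6000 900 85%
```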
5. Enterprise-Grade Control
In Tiledesk, re-ranking supports:
On-premise and GDPR-compliant deployments
Integration with custom retrieval logic
How do you enable re-ranking?
Go to the Knowledge Bases section, press the + New Knowledge Base button, and choose the "Hybrid search" option.

Once the knowledge base is created, you can create an AI Agent and connect it directly to the knowledge base.

In the AI Agent flow, you can also enable or disable re-ranking for each individual Ask Knowledge Base Action.

When should you enable re-ranking?
Re-ranking is especially valuable when:
Knowledge bases are large (hundreds or thousands of documents)
Documents are similar to each other
Precision matters (legal, technical, industrial domains)
Users ask complex or multi-constraint questions
Re-ranking uses GPUs
In Tiledesk, re-ranking is designed for enterprise-grade, real-time precision, which makes it suitable primarily for on-prem GPU installations.
Re-ranking relies on cross-encoder models that must score many (query, chunk) pairs in parallel, a workload that is computationally intensive and latency-sensitive. Running this step on CPUs introduces unpredictable delays.
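Because a cross-encoder score depends on the question, it cannot be precomputed the way a chunk embedding can: every request needs one model evaluation per (query, chunk) pair, for example 20 evaluations for the top 20 chunks mentioned at the top of this page. A common way to handle this is to send the pairs to the model in batches, which is the kind of parallel work GPUs handle well. The sketch below shows only the batching logic; `score_batch` and `fake_score_batch` are hypothetical stand-ins for a real cross-encoder call, and the batch size is an arbitrary example.

```python
from typing import Callable, Iterator, Sequence

Pair = tuple[str, str]


def make_batches(pairs: Sequence[Pair], batch_size: int) -> Iterator[Sequence[Pair]]:
    """Split the (query, chunk) pairs into consecutive batches."""
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


def score_candidates(query: str, chunks: Sequence[str],
                     score_batch: Callable[[Sequence[Pair]], list[float]],
                     batch_size: int = 16) -> list[float]:
    """Score every chunk against the query, with one model call per batch of pairs."""
    pairs = [(query, chunk) for chunk in chunks]
    scores: list[float] = []
    for batch in make_batches(pairs, batch_size):
        scores.extend(score_batch(batch))
    return scores


# Hypothetical stand-in for a cross-encoder call: it records each batch size
# and returns a dummy score (the chunk length) for every pair.
batch_sizes: list[int] = []


def fake_score_batch(batch: Sequence[Pair]) -> list[float]:
    batch_sizes.append(len(batch))
    return [float(len(chunk)) for _, chunk in batch]


chunks = [f"chunk {i}" for i in range(20)]
scores = score_candidates("reset device", chunks, fake_score_batch, batch_size=8)
print(len(scores), batch_sizes)  # 20 [8, 8, 4]
```

The number of model evaluations grows with the number of candidates, so the size of the first-stage result set is a direct latency lever, whichever hardware runs the model.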
An on-prem GPU allows Tiledesk to execute re-ranking locally, with stable low latency, full data sovereignty, and predictable performance under load, making it the only deployment model that consistently meets enterprise SLAs and compliance requirements.
In our SaaS deployment, we make extensive use of GPUs for hybrid search and re-ranking.