# Re-ranking

<figure><img src="/files/b0QZ2wtZqRmxQBabUGQ3" alt=""><figcaption></figcaption></figure>

## What is re-ranking (easy explanation)

Re-ranking is a second, smarter selection step applied after an initial search (e.g. vector search, keyword search, hybrid search).

1. **First step – Retrieval**\
   The system retrieves a *set of candidate results* that are *probably relevant* to the user’s question\
   (e.g. top 20 chunks from a vector database).
2. **Second step – Re-ranking**\
   A more precise model evaluates each candidate in the context of the actual user query and:
   * assigns a relevance score
   * sorts results from most to least relevant
   * optionally discards low-quality matches

In short:

> **Retrieval finds “possible answers” → Re-ranking finds the “best answers.”**

## Why re-ranking is needed

Vector similarity alone is powerful, but it has limitations:

* It may retrieve **semantically similar but contextually wrong** chunks
* It treats all candidates as equally good once retrieved
* It cannot always understand intent, constraints, or priority

Re-ranking solves this by deeply comparing the user question with each candidate, instead of comparing embeddings only.

## How re-ranking works

```
User Question
     ↓
Initial Retrieval (Vector / Hybrid Search)
     ↓
Top N Candidates (e.g. 20)
     ↓
Re-ranking Model
     ↓
Top K Best Chunks (e.g. 5)
     ↓
LLM Answer Generation
```
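The flow above can be sketched in a few lines of Python. This is a minimal illustration, not Tiledesk's implementation: `toy_relevance` is a hypothetical word-overlap scorer standing in for a real re-ranking model, and the names `rerank`, `top_k`, and `min_score` are assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple


def toy_relevance(query: str, chunk: str) -> float:
    """Hypothetical stand-in for a re-ranking model:
    the share of query words that also appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = toy_relevance,
    top_k: int = 5,
    min_score: Optional[float] = None,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query, optionally drop weak
    matches, sort from most to least relevant, and keep the top K."""
    scored = [(score_fn(query, chunk), chunk) for chunk in candidates]
    if min_score is not None:
        scored = [pair for pair in scored if pair[0] >= min_score]
    # Python's sort is stable, so ties keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Candidates as they might come back from the initial retrieval step.
candidates = [
    "firmware update overview",
    "reset device after factory test",
    "device troubleshooting connection issues",
    "firmware version history",
    "reset device when stuck during firmware update",
]

best = rerank(
    "reset device stuck firmware update", candidates, top_k=3, min_score=0.3
)
for score, chunk in best:
    print(f"{score:.2f}  {chunk}")
```

In a real pipeline, `score_fn` would wrap a re-ranking model rather than count shared words. The rest of the flow (score, filter, sort, truncate) stays the same.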

## Where Tiledesk uses re-ranking

In Tiledesk, re-ranking is used in **RAG pipelines** to improve answer quality across:

* Customer support assistants
* Internal knowledge bases
* Enterprise document search
* Multi-agent workflows

Tiledesk allows re-ranking to be applied:

* Automatically in RAG flows
* As a configurable step in agent pipelines
* In on-premise, hybrid, or cloud deployments

## Intuitive use case

#### Scenario

A company uses Tiledesk to power a support assistant with:

* Product manuals
* Internal procedures
* Troubleshooting guides

**User question:**

> “How can I reset my device if it’s stuck during firmware update?”

***

#### Without Re-ranking

Vector search retrieves chunks like:

1. Firmware update overview
2. Reset device after factory test
3. Device troubleshooting – connection issues
4. Firmware version history
5. Reset device procedure (correct)

The LLM sees mixed context and may:

* Answer partially
* Mention irrelevant steps
* Hallucinate missing details

***

#### With Re-ranking Enabled

The re-ranking model analyzes each chunk **against the exact question** and produces:

1. Reset device procedure (firmware recovery mode) ⭐⭐⭐⭐⭐
2. Troubleshooting – firmware stuck scenarios ⭐⭐⭐⭐
3. Firmware update overview ⭐⭐
4. Reset after factory test ⭐
5. Version history ⭐

Only the top, most relevant content is passed to the LLM.

#### Result for the User

* More precise answer
* Correct steps on the first try
* Less confusion
* Faster resolution

## Benefits for users and organizations

#### 1. Higher Answer Accuracy

Re-ranking significantly reduces:

* Irrelevant context
* Partial answers
* Hallucinations

#### 2. Better Use of Existing Knowledge

Even large or noisy knowledge bases become:

* More reliable
* Easier to maintain
* More scalable

#### 3. Improved User Trust

Users notice when:

* Answers are consistent
* Instructions are correct
* The assistant “understands” intent

This leads to:

* Higher adoption
* Lower fallback to human support

#### 4. Cost and Performance Optimization

By sending **only the best chunks** to the LLM:

* Fewer tokens are used
* Responses are faster
* Costs are reduced
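A rough back-of-the-envelope calculation shows the size of the effect. The candidate counts (20 retrieved, 5 kept) match the example figures used earlier on this page. The average of 300 tokens per chunk is a purely hypothetical value, and the calculation counts only the context sent to the LLM, not the compute spent by the re-ranking model itself.

```python
def context_tokens(num_chunks: int, avg_tokens_per_chunk: int) -> int:
    """Approximate number of context tokens sent to the LLM."""
    return num_chunks * avg_tokens_per_chunk


AVG_TOKENS_PER_CHUNK = 300  # hypothetical average, for illustration only

retrieved_tokens = context_tokens(20, AVG_TOKENS_PER_CHUNK)  # all candidates
reranked_tokens = context_tokens(5, AVG_TOKENS_PER_CHUNK)    # top K only
savings = 1 - reranked_tokens / retrieved_tokens

print(f"Without re-ranking: {retrieved_tokens} context tokens")
print(f"With re-ranking:    {reranked_tokens} context tokens")
print(f"Reduction:          {savings:.0%}")
```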

#### 5. Enterprise-Grade Control

In Tiledesk, re-ranking supports:

* On-premise and GDPR-compliant deployments
* Integration with custom retrieval logic

## How to enable re-ranking

Go to the Knowledge Bases section and press the **+ New Knowledge Base** button, then choose the "Hybrid search" option.

<figure><img src="/files/Y8bxKCAbzwvax5phwnHR" alt=""><figcaption></figcaption></figure>

Once the Knowledge Base is created, you can create an AI Agent and connect it directly to the Knowledge Base.

<figure><img src="/files/o2EHpuW2LZKdP6vjTbc0" alt=""><figcaption></figcaption></figure>

In the AI Agent flow, you can also enable or disable re-ranking for a specific Ask Knowledge Base action.

<figure><img src="/files/tyKXOb3TFw9xStJXDdnk" alt=""><figcaption></figcaption></figure>

## When should you enable re-ranking?

Re-ranking is especially valuable when:

* Knowledge bases are large (hundreds or thousands of documents)
* Documents are similar to each other
* Precision matters (legal, technical, industrial domains)
* Users ask complex or multi-constraint questions

## Re-ranking uses GPUs

In Tiledesk, re-ranking is designed for enterprise-grade, real-time precision, which makes it suitable primarily for **on-prem GPU installations**.

Re-ranking relies on *cross-encoder* models that must score many *(query, chunk)* pairs in parallel, a workload that is computationally intensive and latency-sensitive. Running this step on CPUs introduces unpredictable delays.
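The shape of that workload can be sketched as follows: every candidate chunk is paired with the query, and the pairs are scored in batches. This is an illustrative sketch only. `overlap_predict` is a hypothetical stand-in for a model call, and `score_in_batches` and `batch_size` are names chosen for the example. With the sentence-transformers library, `CrossEncoder(model_name).predict(pairs)` accepts a list of pairs in the same way, although this page does not state which library Tiledesk uses.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, chunk)


def score_in_batches(
    pairs: Sequence[Pair],
    predict: Callable[[Sequence[Pair]], List[float]],
    batch_size: int = 8,
) -> List[float]:
    """Score (query, chunk) pairs batch by batch, preserving input order."""
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        scores.extend(predict(batch))
    return scores


def overlap_predict(batch: Sequence[Pair]) -> List[float]:
    """Hypothetical stand-in for a cross-encoder: counts shared words.
    A real model reads query and chunk together in one forward pass."""
    return [
        float(len(set(q.lower().split()) & set(c.lower().split())))
        for q, c in batch
    ]


query = "reset device"
chunks = [
    "reset device procedure",
    "firmware version history",
    "device troubleshooting",
]
pairs = [(query, chunk) for chunk in chunks]
scores = score_in_batches(pairs, overlap_predict, batch_size=2)
print(scores)
```

Each retrieved candidate adds one pair to score, so the cost of this step grows linearly with the number of candidates. That is why the batch is usually limited to the top N results of the first retrieval step.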

An on-prem GPU allows Tiledesk to execute re-ranking models locally, with stable low latency, full data sovereignty, and predictable performance under load, making it the only deployment model that consistently meets enterprise SLAs and compliance requirements.

In our SaaS deployment, we make extensive use of GPUs for hybrid search and re-ranking.

### Re-ranking models

Our production engine uses the local cross-encoder/ms-marco-MiniLM-L-6-v2 model by default, with the heavier bge-reranker-v2-m3 as an option for specific needs.

We are also evaluating TEI with BAAI/bge-reranker-large.

For Pinecone, we opted for cohere-rerank-3.5, bge-reranker-v2-m3, and pinecone-rerank-v0.

