Hybrid search

Why is Hybrid search needed?

In Retrieval-Augmented Generation (RAG), the most common technique used during the retrieval stage is vector search, which identifies results based on semantic similarity. The process works by breaking down documents from an external knowledge base into semantically meaningful units, such as paragraphs or sentences, and then converting each unit into a numerical representation (multi-dimensional vectors) that can be processed by a computer. The same transformation is applied to the user’s query. This allows the system to recognize nuanced semantic connections between the query and the stored text. For example, the phrases “cats chase mice” and “kittens hunt mice” would be considered more closely related than “cats chase mice” and “I like eating ham.” Once the system finds the most relevant passages, it supplies them as context to the large language model, helping it formulate an accurate response.
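To make the mechanism concrete, here is a minimal sketch of this kind of semantic matching. It assumes the open-source sentence-transformers package and the all-MiniLM-L6-v2 model purely for illustration; they are not part of Tiledesk, and any embedding model would work the same way:

```python
# A minimal sketch of semantic retrieval (illustrative model choice).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "cats chase mice",
    "kittens hunt mice",
    "I like eating ham",
]

# Embed the knowledge-base chunks and the user query into the same vector space.
chunk_vectors = model.encode(chunks, convert_to_tensor=True)
query_vector = model.encode("cats chase mice", convert_to_tensor=True)

# Cosine similarity measures how semantically close each chunk is to the query.
scores = util.cos_sim(query_vector, chunk_vectors)[0]
for chunk, score in sorted(zip(chunks, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {chunk}")
# "kittens hunt mice" scores far higher than "I like eating ham",
# even though it shares no words with the query.
```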

Beyond enabling advanced semantic text retrieval, vector search offers several other benefits:

  • Recognizing related meanings (e.g., car/automobile/vehicle, YouTube/Vimeo/video platform)

  • Cross-language retrieval (e.g., matching English queries to Chinese content)

  • Multimodal matching (e.g., comparing text, images, audio, and video on a similarity basis)

  • Error tolerance (handling typos and imprecise queries)

However, vector search is less effective in certain scenarios, including:

  • Specific names: “Marie Curie”, “Tesla Model 3”

  • Retrieving abbreviations or short phrases: “NLP”, “HTML”

  • Finding exact IDs: “AB-1234-XY”, “ver.2.4.7”

These are precisely the areas where traditional keyword search remains superior, particularly for:

  • Exact matches (e.g., product names, personal names, catalog numbers)

  • Very short queries (vector search struggles with minimal input, yet many users only type a few keywords)

  • Rare or low-frequency terms (often crucial to meaning, such as in “Would you like to have coffee with me?” where “coffee” and “have” are more significant than “you” and “like”)
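The sketch below shows classic keyword (lexical) scoring using the rank_bm25 package. This is only an illustration, not Tiledesk's actual full-text engine; production systems (e.g., Elasticsearch) use the same family of scoring functions:

```python
# A sketch of keyword retrieval with BM25 (rank_bm25 used for illustration).
from rank_bm25 import BM25Okapi

corpus = [
    "Tesla Model 3 delivery schedule for Europe",
    "Electric cars and their charging infrastructure",
    "Order AB-1234-XY shipped on Monday",
]

# BM25 works on exact tokens, so rare terms and IDs match reliably.
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "AB-1234-XY".lower().split()
print(bm25.get_top_n(query, corpus, n=1))
# ['Order AB-1234-XY shipped on Monday'] -- an exact-ID match
# that a purely semantic search could easily miss.
```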

For most text search scenarios, the primary goal is to ensure that the most relevant results appear among the candidates. Vector search and keyword search each have their own strengths in the retrieval field. Hybrid search combines the strengths of both technologies and compensates for their weaknesses. In hybrid search, you establish both vector indexes and keyword indexes in the database in advance. When a user query comes in, the most relevant texts are retrieved from the documents using both retrieval methods.

“Hybrid search” does not have a precise definition. If we adopt new combinations of search algorithms in the future, we will still call the result “hybrid search.” For instance, our roadmap includes knowledge graph techniques for retrieving powerful new entity relationships. Different retrieval systems excel at finding different subtle relationships between texts (paragraphs, sentences, words): exact relationships, semantic relationships, thematic relationships, structural relationships, entity relationships, temporal relationships, event relationships, and so on. No single retrieval mode suits every scenario. Hybrid search achieves complementarity by combining multiple retrieval technologies. For the moment, however, Tiledesk implements only a mixed semantic + full-text approach to search.

Semantic

Definition: Generating query embeddings and querying the text segments most similar to their vector representations.

Chunk limit (TopK): Used to filter the text fragments most similar to the user’s query. The system will dynamically adjust the number of fragments based on the context window size of the selected model. The default value is 3.

Full-text

Definition: Indexing all words in the document, allowing users to query any word and return text fragments containing those words.

Hybrid

Definition: Simultaneously performs full-text search and vector search, then applies a re-ranking step to select the best results matching the user’s query from both result sets.

You can control how much weight is given to keyword search compared to semantic search by adjusting the slider. Moving it towards the left prioritizes keyword search, and at the far left only keyword search will be used. Moving it towards the right prioritizes semantic search, and at the far right only semantic search will be used. This allows you to fine-tune the balance between the two methods to get the most relevant results for your needs.
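The sketch below shows one simple way such a weight can blend the two score lists. The min-max normalization and the alpha parameter are illustrative assumptions, not Tiledesk's exact formula:

```python
def normalize(scores):
    """Scale a score list to [0, 1] so the two methods are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_scores(semantic, keyword, alpha):
    """alpha = 1.0 -> pure semantic search, alpha = 0.0 -> pure keyword search."""
    semantic, keyword = normalize(semantic), normalize(keyword)
    return [alpha * s + (1 - alpha) * k for s, k in zip(semantic, keyword)]

# Scores for the same three chunks from both retrieval methods (toy values).
semantic = [0.82, 0.75, 0.10]
keyword  = [0.10, 2.40, 0.30]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, hybrid_scores(semantic, keyword, alpha))
# At alpha = 0.0 the ranking follows the keyword scores,
# at alpha = 1.0 it follows the semantic scores.
```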

Re-ranking

Enabling re-ranking greatly improves result quality while, at the same time, reducing token usage in the LLM. How?

The more chunks you retrieve from your vector store, the higher the probability of finding the chunks that best answer the user's question. But the algorithm used by the vector search engine only provides a mathematical distance measuring how closely each chunk's embedding matches the embedding of the user's question. This distance computation does not always retrieve the right chunks in relevance order: sometimes the distance ordering puts the wrong chunks at the top of your results, even though other chunks are semantically more relevant.

A complementary problem is that you want to pass as few chunks as possible to your LLM prompt. So you will try to cut your query down to the minimum number of chunks, for example 5. But if you are unlucky, the best chunks may be the 6th and 7th, the ones immediately after the last returned chunk.

The idea behind re-ranking is to ask for as many chunks as possible: instead of 5, ask for 50! Retrieving 50 chunks instead of 5 adds only a few milliseconds to your vector store search. But once you have 50 chunks, you can apply an ML model to find the ones that best match the user's query, with the goal of minimizing the number of chunks used in the LLM prompt.
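As a sketch of that second stage, here is how a cross-encoder model from sentence-transformers could re-score an over-fetched candidate list. The model name, the `vector_store_search` helper, and the over-fetch numbers are illustrative assumptions; Tiledesk's actual re-ranker may differ:

```python
from sentence_transformers import CrossEncoder

# Hypothetical stand-in for the vector store query; in practice this
# would call your vector database with an over-fetched top_k.
def vector_store_search(query: str, top_k: int) -> list[str]:
    return [
        "To reset your password, open Settings > Security.",
        "Billing questions are handled by the support team.",
        "Password resets require a verified email address.",
    ][:top_k]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I reset my password?"
candidates = vector_store_search(query, top_k=50)  # over-fetch on purpose

# The cross-encoder reads query and chunk together, producing a relevance
# score that is usually more reliable than raw embedding distance.
scores = reranker.predict([(query, chunk) for chunk in candidates])

# Keep only the best few chunks for the LLM prompt.
top = sorted(zip(scores, candidates), reverse=True)[:2]
for score, chunk in top:
    print(f"{score:+.2f}  {chunk}")
```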

You can discover more about re-ranking here.
