A Technique to Improve RAG Quality, by an Ex-Google and Microsoft Scientist

Vinodh Kumar Ravindranath is an IISc postgrad who has worked at Google and Microsoft and is now the Head of AI at eightfold.ai.

He presents SOAR (Spilling with Orthogonality-Amplified Residuals), an algorithm for improving the quality of Retrieval-Augmented Generation (RAG) systems. RAG, widely used in generative AI applications, depends on retrieving the documents most relevant to a user query. Retrieval is the traditional bottleneck: when it fails to fetch the right document, the generated answer suffers. SOAR addresses this by adding redundancy to document clustering, which makes the right document more likely to be retrieved.

How RAG Works:
- Retrieve Relevant Documents: Retrieve top-k documents relevant to the user query through vector embedding similarity.
- Generate Results: Forward these documents along with the user query to the LLM to generate the final response.
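
The two steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the three-dimensional vectors stand in for real embeddings, and the names `retrieve_top_k` and `build_prompt` are hypothetical helpers invented for this example. The prompt would be passed to whatever LLM the application uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec, doc_vecs, k):
    """Step 1: indices of the k documents most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, retrieved_docs):
    """Step 2: combine the retrieved documents and the question for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieved_docs)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy data: hand-made vectors used in place of real embeddings (assumption).
docs = [
    "Cats are small domesticated felines.",
    "Paris is the capital of France.",
    "Python is a programming language.",
]
doc_vecs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
query_vec = [0.0, 0.9, 0.1]

top = retrieve_top_k(query_vec, doc_vecs, k=2)
prompt = build_prompt("What is the capital of France?", [docs[i] for i in top])
```

If `retrieve_top_k` leaves out the document that holds the answer, nothing in the generation step can recover it.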

The Issue:
The primary bottleneck in RAG systems is the retrieval step. If the correct document is not retrieved, the quality of the generated response suffers.

SOAR: The Solution
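SOAR introduces redundancy in document clustering to improve retrieval accuracy. The algorithm assigns each document to more than one cluster, choosing the additional cluster so that the document's residual (its offset from the cluster representative) points in a direction roughly orthogonal to its residual in the primary cluster. The two assignments then tend to fail on different queries, so the most relevant documents are more likely to be retrieved.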

How SOAR Works
1. Vector Search: Documents and queries are mapped to embeddings.
2. Clustering: Documents are clustered, and cluster representatives are chosen.
3. Query Phase:
   - Representative Comparison: The query is compared with the cluster representatives to choose the top-k clusters.
   - Cluster-wide Comparison: All document vectors in the top-k clusters are compared with the query to retrieve the top-n documents for response generation.
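4. Redundancy with Orthogonality: At indexing time, each document is also assigned to an additional cluster, chosen so that the document's residual (its offset from the cluster representative) is close to orthogonal to its residual in the original cluster. A query that misses the original cluster is then more likely to hit the additional one, which increases the likelihood of retrieving the most relevant document.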
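
The steps above can be sketched as a toy index. This is an illustration under stated assumptions, not the reference implementation: the cluster representatives are fixed by hand instead of being learned, distances are squared Euclidean, and the backup cluster is chosen by minimizing an assumed loss of the form "squared distance plus `lam` times the squared component of the new residual that is parallel to the primary residual". All function names are hypothetical.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    d = sub(a, b)
    return dot(d, d)

def primary_cluster(x, centroids):
    """Ordinary assignment: the nearest cluster representative."""
    return min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))

def spill_cluster(x, centroids, primary, lam):
    """Backup assignment. Assumed loss: |r2|^2 + lam * (r . r2)^2 / |r|^2,
    where r is the primary residual and r2 the candidate residual.
    lam = 0 reduces to plain second-nearest assignment."""
    r = sub(x, centroids[primary])
    r_norm_sq = dot(r, r)

    def loss(j):
        r2 = sub(x, centroids[j])
        parallel = dot(r, r2) ** 2 / r_norm_sq if r_norm_sq else 0.0
        return dot(r2, r2) + lam * parallel

    others = [j for j in range(len(centroids)) if j != primary]
    return min(others, key=loss)

def build_index(docs, centroids, lam):
    """Map each cluster id to the documents assigned to it (primary + backup)."""
    index = {j: [] for j in range(len(centroids))}
    for i, x in enumerate(docs):
        p = primary_cluster(x, centroids)
        index[p].append(i)
        index[spill_cluster(x, centroids, p, lam)].append(i)
    return index

def search(query, docs, centroids, index, n_probe, top_n):
    """Query phase: pick the closest clusters, then rank only their documents."""
    probed = sorted(range(len(centroids)),
                    key=lambda j: sq_dist(query, centroids[j]))[:n_probe]
    candidates = {i for j in probed for i in index[j]}
    return sorted(candidates, key=lambda i: sq_dist(query, docs[i]))[:top_n]

# Toy 2-D data (hand-picked for illustration).
centroids = [[0.2, 0.0], [2.0, 0.0], [1.0, 1.2]]
docs = [[1.0, 0.0], [0.1, 0.1], [2.1, 0.1], [0.9, 1.3]]
query = [1.0, 0.5]  # the true nearest document is docs[0]

plain = build_index(docs, centroids, lam=0.0)  # second-nearest spilling
soar = build_index(docs, centroids, lam=1.0)   # orthogonality-aware spilling

hit_plain = search(query, docs, centroids, plain, n_probe=1, top_n=1)
hit_soar = search(query, docs, centroids, soar, n_probe=1, top_n=1)
```

In this toy setup the query's closest representative is cluster 2, while document 0 has cluster 0 as its primary cluster. With plain second-nearest spilling, document 0's backup is cluster 1, whose residual is parallel to the primary one, so probing a single cluster misses it. With the orthogonality penalty, the backup becomes cluster 2 and the same single probe finds it.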

Benefits of SOAR
- Improved Retrieval Accuracy: Redundant cluster assignments give each relevant document more than one chance to land in a cluster that the query actually searches.
- Enhanced Efficiency: Because the orthogonal backup assignments cover queries that miss the primary cluster, fewer clusters need to be searched to reach the same retrieval quality.

SOAR is an ingenious use of redundancy to enhance the efficiency and accuracy of retrieval in RAG systems. This simple yet elegant technique leverages orthogonality in document clustering to significantly improve the quality of generated responses in generative AI applications.