Enhancing AI with Context: RAG as a Service
Introduction
In the evolving landscape of artificial intelligence, Retrieval-Augmented Generation (RAG) systems are rapidly gaining prominence as they bridge the gap between general-purpose large language models (LLMs) and domain-specific knowledge. Iris.ai has harnessed this powerful technology to offer a multi-layered RAG as a Service (RAGaaS), empowering organizations to enhance the capabilities of generative AI systems by integrating context-specific document retrieval with advanced language processing.
What is RAG?
RAG combines three cutting-edge techniques: retrieval, augmentation, and generation. While LLMs are great at generating coherent and contextually rich content, they sometimes struggle to produce factually accurate information, especially on niche or recent topics. This is where retrieval-augmented generation shines.
The RAG framework enriches the responses of language models by leveraging external knowledge sources, such as research papers, internal documentation, or other domain-specific content. The core concept of RAG is simple: the process begins by converting both the query and the documents into indexable content, for example embeddings, vector representations that encapsulate their semantic meaning. The Retrieval phase identifies and pulls in relevant documents with essential, context-rich information. In the Augmentation phase, this data provides the model with valuable context, setting the stage for an accurate and informed response. Finally, in the Generation stage, the model produces a response that combines its learned patterns with the retrieved insights, reducing hallucinations and improving traceability by linking back to the original sources.
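The loop above can be sketched in a few lines of code. This is a toy illustration, not Iris.ai's implementation: it uses bag-of-words vectors in place of learned embeddings, and the final generation step (passing the augmented prompt to an LLM) is left as a comment.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words vector. Real systems use learned dense embeddings."""
    return Counter(w.strip("?.,!").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Retrieval: rank documents by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query, context):
    """Augmentation: prepend the retrieved passages to the prompt."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Graphene is a single layer of carbon atoms arranged in a hexagonal lattice.",
    "The quarterly report covers revenue growth in the EMEA region.",
]
prompt = augment("What is graphene?", retrieve("What is graphene?", docs))
# Generation: `prompt` would now be passed to an LLM to produce the grounded answer.
```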
What is Multi-RAG?
Our Multi-RAG system is more than a vector database. Most well-known RAG systems rely on cosine similarity to retrieve documents based on vector embeddings. While these systems retrieve documents accurately in many use cases, they often require significant context and can fall short on short questions or nuanced, deeply domain-specific questions in business applications. To make RAGaaS suitable for such applications, Iris.ai breaks through this limitation by employing optimized, domain-adapted embeddings and a dynamic approach to similarity metrics, significantly improving on the usual approach. Our solution goes even further, employing multiple retrieval methods to provide accurate, context-aware answers.
Enriched Embeddings: A core differentiator of Iris.ai’s RAG system is its use of rich, optimized embeddings that capture subtle distinctions within a given domain. Unlike basic token embeddings common in other systems, our approach creates domain-specific embeddings, which deliver far better retrieval accuracy in specialized fields. This capability allows businesses to deploy RAGaaS in applications where granular differentiation is critical for high-quality responses.
Advanced Similarity Metrics: Unlike standard cosine similarity, Iris.ai employs the RV coefficient, an advanced metric suited to identifying nuanced similarities within domain-specific queries. The RV coefficient offers more precise document differentiation within a pool of many highly relevant documents, ensuring the RAG system provides accurate responses even in dense domain-specific areas; cosine similarity, by contrast, is better at differentiating sparse collections.
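To illustrate the metric itself (not Iris.ai's implementation of it), the RV coefficient can be computed between two documents represented as matrices of token vectors. The representation choice and the omission of column centering are simplifications for this sketch; with single-row matrices the value reduces to the squared cosine of the two vectors.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two representations sharing the embedding dimension d.

    X: (n_tokens_x, d), Y: (n_tokens_y, d). Comparing the d x d configuration
    matrices X'X and Y'Y lets the two documents have different lengths.
    Classical RV centers the columns first; that step is omitted here for brevity.
    """
    Sx = X.T @ X
    Sy = Y.T @ Y
    num = np.trace(Sx @ Sy)
    den = np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
    return num / den

# Two single-vector "documents" at 45 degrees: cosine = 1/sqrt(2), RV = cosine^2 = 0.5
v = np.array([[1.0, 1.0]])
w = np.array([[1.0, 0.0]])
print(rv_coefficient(v, w))  # 0.5
```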
How Iris.ai Multi-RAG as a Service Works
Iris.ai’s RAGaaS combines various retrieval techniques, including vector database search, graph traversal for entity relationships, keyword search, and document fingerprinting. Each retrieval approach is automatically selected based on the nature of the query, enhancing response accuracy and processing speed.
The retrieval process involves embedding both the query and documents into suitable representations, ranking them through the chosen similarity metric, and using advanced document indexing to further filter out irrelevant data. Our system's dynamic approach ensures that the model operates efficiently even at scale, handling extensive data sources such as scientific papers, patents, or internal documentation.
Four-Layered Retrieval Approach
- Vector Database Approach: Standard embedding-based search but enhanced with domain-specific, high-granularity embeddings.
- Graph Traversal: Ideal for entity-based queries, allowing the system to traverse relationships and retrieve structured data.
- Fingerprinting: Each document is “fingerprinted” for fast, domain-specialized semantic retrieval.
- Keyword Search: As a fallback, keyword search ensures that simple, direct queries are answered quickly if other methods are suboptimal.
A Decision-Making System: Our RAGaaS doesn’t rely on a single retrieval approach. Instead, it uses query-type identification and input classification to select the best retrieval method dynamically, adapting seamlessly to short or long questions, entity-specific queries, or general overviews.
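A dispatcher of this kind can be sketched with simple heuristics. The rules below are hypothetical stand-ins for Iris.ai's actual query-type classifier, which is not public; they only show the shape of routing a query to one of the four retrieval layers.

```python
import re

def classify_query(query: str) -> str:
    """Route a query to one of four retrieval layers (illustrative heuristics only)."""
    tokens = query.split()
    # Entity-relationship phrasing suggests graph traversal.
    if re.search(r"\brelated to\b|\bconnected\b|\brelationship\b", query.lower()):
        return "graph_traversal"
    # Short, direct queries are handled fastest by keyword search.
    if len(tokens) <= 3:
        return "keyword_search"
    # Long topical questions benefit from fingerprint-based narrowing.
    if query.endswith("?") and len(tokens) > 10:
        return "fingerprinting"
    # Default: semantic search over the vector database.
    return "vector_database"
```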
Fingerprinting and Advanced Document Retrieval
The fingerprinting process is central to Iris.ai’s RAG system. It involves extracting essential keywords, topics, and concepts from documents during the ingestion phase. These “fingerprints” act as metadata, which the system can quickly search through before delving into the vectorized representations. By first narrowing down the number of documents through fingerprinting, the system ensures that subsequent searches are fast and precise.
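The two-stage idea, narrow by fingerprint first and only then search the survivors, can be sketched as follows. The frequency-based fingerprint here is a toy stand-in for the keyword, topic, and concept extraction described above.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "for", "with"}

def fingerprint(text, size=8):
    """Toy fingerprint: the most frequent non-stopword terms in a document.
    Real systems extract keywords, topics, and concepts with trained models."""
    words = [w.strip(".,").lower() for w in text.split()]
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {w for w, _ in counts.most_common(size)}

def prefilter(query, fingerprints, min_overlap=1):
    """Keep only documents whose fingerprint shares terms with the query,
    so the expensive vector search runs on a much smaller candidate set."""
    q_terms = fingerprint(query)
    return [doc_id for doc_id, fp in fingerprints.items()
            if len(fp & q_terms) >= min_overlap]

# Fingerprints are computed once, at ingestion time.
fps = {
    "doc1": fingerprint("Graphene is a single layer of carbon atoms in a hexagonal lattice."),
    "doc2": fingerprint("The quarterly report covers revenue growth in the EMEA region."),
}
```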
This layered approach allows Iris.ai’s RAG system to handle diverse types of documents, whether they are research articles with complex terminology or business emails with simple language. The adaptability of the system ensures optimal performance across different domains.
Augmentation
RAGaaS delivers comprehensive context augmentation, enriching answers with nuanced processing that ensures factual grounding. Iris.ai's approach involves sophisticated input processing, precise context-building, and careful response post-processing to ensure accuracy and relevance. Key processes include:
- Input Classification and Processing
Inputs are first classified and encoded or indexed based on the type of information and relevance. This initial step leverages hybrid systems to direct inputs appropriately, maintaining alignment between user intent and the information presented.
- Contextual Chunking
Inputs are split into smaller, meaningful units or "chunks" using “semantic slicing”, ensuring the model's context limit is respected without losing semantic integrity.
- Response Post-Processing
After generation, answers undergo post-processing to ensure relevance and accuracy. When the retrieved context lacks the necessary information, the system returns no answer rather than presenting a highly suboptimal one.
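The semantic slicing step can be sketched as sentence-boundary packing under a word budget. This is an illustrative simplification, not the actual slicing algorithm: the key property it demonstrates is that chunks respect the context limit without cutting mid-sentence.

```python
import re

def semantic_chunks(text, max_words=40):
    """Split text at sentence boundaries and pack sentences into chunks
    that stay under a word budget, preserving semantic integrity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```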
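The final grounding check might look like the sketch below, where a simple term-overlap heuristic (an assumption for illustration, not Iris.ai's actual method) decides whether to decline rather than return a poorly supported answer.

```python
def postprocess(answer, context, min_support=0.3):
    """Return the answer only if enough of its terms are supported by the
    retrieved context; otherwise return None (decline to answer).
    The overlap heuristic is a hypothetical stand-in for a real grounding check."""
    a_terms = {w.lower().strip(".,") for w in answer.split() if len(w) > 3}
    c_terms = {w.lower().strip(".,") for w in context.split()}
    if not a_terms:
        return None
    support = len(a_terms & c_terms) / len(a_terms)
    return answer if support >= min_support else None
```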
Why choose Iris.ai
Iris.ai’s Retrieval-Augmented Generation as a Service (RAGaaS) offers several key benefits for businesses and organizations seeking to enhance their AI capabilities, particularly for document processing and information retrieval. Here are the primary benefits:
Adaptability
Iris.ai’s system offers dynamic retrieval methods that are customized to handle varying query types, ensuring precision across diverse use cases. This multi-layered approach outperforms traditional vector-only systems, delivering accurate answers where generic RAG systems fall short.
Contextual Accuracy
By retrieving highly relevant documents before generating a response, the RAG system significantly improves the contextual accuracy of the answers provided by large language models. This ensures that AI outputs are based on real, traceable information from the appropriate sources.
Traceability
One of the standout features of Iris.ai’s RAGaaS is the ability to provide traceable answers. Users can verify the sources of the information used by the model, enhancing trust and reliability in AI-generated responses and reducing the risk of hallucination.
Domain-Specific
Iris.ai’s RAGaaS excels in providing detailed, accurate responses tailored to business-specific applications. This capability extends beyond general LLMs, allowing companies to deploy AI effectively in highly regulated or knowledge-intensive fields.
Flexibility and Customization
The RAGaaS API allows for flexible integrations with different AI models. Clients can either use Iris.ai’s own models or integrate their proprietary models with the RAG system. This flexibility enables organizations to tailor the system to meet their specific needs.
Security and Data Privacy
Iris.ai’s RAG system is designed with stringent security protocols to ensure that client data remains confidential. Unlike many systems that rely on public LLMs like ChatGPT, Iris.ai ensures that sensitive information is never exposed to external networks. Designed with data privacy and deployment flexibility in mind, RAGaaS can be implemented on private cloud, on-premises, or SaaS platforms.
Use Cases: From Internal Documentation to Transcript Analysis
The potential applications of RAGaaS are vast. Companies with extensive internal documentation can leverage RAGaaS to streamline information retrieval, making it easier for employees to find relevant documents and navigate large datasets.
RAGaaS is adept at handling complex client data across multiple formats. With integrated Optical Character Recognition (OCR) capabilities, the system can convert various types of unstructured data from scanned documents and images into searchable and editable text. With other Iris.ai services, users can also easily filter documents and extract table data from documents into spreadsheets.
RAG in Iris.ai Ecosystem
RAG as a Service seamlessly integrates with RSpace™ Core – an all-in-one platform for managing, analyzing, and creating knowledge. With RAG embedded into RSpace™, users can access relevant research directly within their workspace, enhancing productivity and reducing the time spent on repetitive tasks. This integration is featured in our Chat Tool, which allows users to retrieve precise answers with references—read more about how it works in our blog post.
Summary
The market for RAG as a service is growing, with various players offering solutions for retrieval-augmented generation. However, Iris.ai stands out due to its accuracy, security, and multi-layered approach. While many open-source RAG implementations struggle to handle large datasets and lack the traceability required by enterprises, Iris.ai has developed a robust system that excels in both performance and flexibility. The combination of advanced document fingerprinting, context-aware retrieval, and customizable APIs positions Iris.ai’s RAGaaS as a leader in the market.
In conclusion, Iris.ai’s RAG as a Service offers a powerful and secure solution for organizations looking to integrate advanced document retrieval with large language models. With a strong focus on accuracy, traceability, and flexibility, Iris.ai’s approach ensures that clients can harness the full potential of AI while maintaining control over their data and workflows. Whether your organization is managing scientific research, internal documentation, or customer interaction transcripts, RAGaaS is the solution that will take your generative AI capabilities to the next level.