Introduction
In our final post on the challenges of enterprise RAG platforms, we want to step back and look at the high-level assembly of services.
In years past, enterprise search discussions centered on full-text engines, along with extracting and ingesting various forms of text from loose files. Modern platforms must concern themselves with a new landscape of search, machine learning, and more sophisticated engines. This includes combined search and data engines, text embedding models, hybrid search, and cutting-edge LLMs for use in RAG implementations.
We’ll touch on these topics and note how they come together to enable a modern platform. We’ll also look at the fast-evolving LLM and RAG landscape, and bring our series to a close.
Leveraging the right engine
Over the last decade or more, many innovations have emerged to help refresh the enterprise search industry. At the heart of the industry are fundamental improvements in search and data engine technology.
Historically, search engines processed text files to create and manage full-text indexes. These engines returned search results as pointers back to the files, and their job was mostly done. Now we have engines that provide a rich combination of document storage, metadata indexing, full-text indexes for lexical search, vector storage and indexing for semantic search, and an array of ranking implementations.
For modern hybrid search platforms, one needs to account for all of these features across various use cases. Some platforms leverage multiple engines and federate queries across them. A select few engines, with more emerging over time, tightly integrate this functionality into a single system.
Even with the engine selection complete, there’s still the matter of leveraging modern embedding models for hybrid search. Indexing is no longer just tokenizing text and building lexical indexes for BM25. Now we must process text into specialized input for embedding models, then store the resulting vectors and build specialized indexes (e.g., HNSW) for semantic search. Then, since lexical and semantic search each have their own tradeoffs, there’s the matter of combining them at match or ranking time for fully effective hybrid search.
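To make that combination step concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a lexical result list with a semantic one. The document IDs and the k constant are illustrative assumptions, not a description of any particular engine.

```python
# A minimal sketch of reciprocal rank fusion (RRF) for merging a lexical
# (BM25) result list with a semantic (vector) result list.
# Document IDs and the k constant below are illustrative assumptions.

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of doc IDs into a single fused ranking."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1 / (k + rank); documents that rank
            # well in multiple lists accumulate a higher fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical_hits = ["doc-7", "doc-2", "doc-9"]    # e.g. from a BM25 query
semantic_hits = ["doc-2", "doc-5", "doc-7"]   # e.g. from an HNSW nearest-neighbor query
print(reciprocal_rank_fusion([lexical_hits, semantic_hits]))
```

Engines that integrate both retrieval modes can perform this kind of fusion, or more sophisticated ranking, inside the engine itself rather than in application code.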
Few engines bring all these matching and ranking features into one system, but the Vespa.ai engine used in Atolio is a prime example of such innovation. It allows tight integration at match time and flexible ranking options, while preserving the freedom to mix and match approaches as needed. Finally, it has years of production experience behind it, driving the robustness and scalability needed in such a distributed platform.
Integrating models and evolving
Along with underlying engine improvements, enterprise search has expanded in scope over the past few years. Platforms must now provide both search and RAG functionality. Given that RAG starts with retrieval, this is a natural extension of search platforms.
Retrieval Augmented Generation (RAG) is fundamentally the process of taking a user query, searching for relevant content, and then sending that content into a Large Language Model (LLM) for generating a personalized response to the user. We’ve focused a lot on the search platform details, but what about the LLM options?
This is where flexibility comes into play. The LLM industry, relatively speaking, is still quite new. We’re all aware of the fast pace at which it moves. New models emerge almost weekly, and each has a chance to bring broad improvements or gains in specialized domains and use cases. In such a market, it is important that an enterprise doesn’t lock itself in too early and miss out on next quarter’s innovations.
This need for adaptability in an evolving market is why Atolio allows for integrating any of the modern LLMs or APIs into the RAG system. We can recommend and leverage high quality OpenAI models as a solid default option. However, you can also bring your own models or leverage our search API to plug top-notch retrieval into your own ML systems.
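As a rough sketch of what that flexibility can look like in application code, the snippet below defines a small provider interface that a RAG layer could depend on, with concrete models swapped in behind it. The class and method names here are hypothetical, not Atolio's API.

```python
# A minimal sketch of a pluggable LLM interface; all names are hypothetical.
from typing import Protocol

class TextGenerator(Protocol):
    """Anything that can turn a prompt into a generated answer."""
    def generate(self, prompt: str) -> str: ...

class HostedModel:
    """Sketch of an adapter for a hosted API such as OpenAI's."""
    def __init__(self, client, model: str):
        self.client, self.model = client, model
    def generate(self, prompt: str) -> str:
        # Call the provider's completion endpoint here; wiring omitted.
        raise NotImplementedError

class EchoModel:
    """Stand-in model, useful for local testing."""
    def generate(self, prompt: str) -> str:
        return f"[stub answer for] {prompt[:60]}"

def answer(question: str, model: TextGenerator) -> str:
    # The RAG layer depends only on the TextGenerator protocol,
    # so providers can be swapped without touching retrieval code.
    return model.generate(question)

print(answer("What is our parental leave policy?", EchoModel()))
```

Keeping orchestration code coupled only to a narrow interface like this is what makes it practical to adopt a better model next quarter without a rewrite.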
Putting it All Together
Once you have a solid search system and a good LLM at hand, you can bring it all together for your RAG implementations and use cases. Your application will need business logic that queries the search platform, processes the results, and formats them as a prompt for the LLM. It can then call the LLM you’ve selected and wrap everything up with a response back to the user.
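Here is a minimal sketch of that flow, assuming hypothetical search_client and llm_client objects with simple query and generate methods; a production implementation would add permission filtering, citations, and error handling.

```python
# A minimal sketch of the RAG request flow described above.
# `search_client` and `llm_client` are hypothetical placeholders, not a real API.

PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def answer_question(question: str, search_client, llm_client, top_k: int = 5) -> str:
    # 1. Query the search platform (hybrid lexical + semantic retrieval).
    hits = search_client.query(question, limit=top_k)

    # 2. Format the retrieved passages into a prompt for the LLM.
    context = "\n\n".join(hit["text"] for hit in hits)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)

    # 3. Call the selected LLM and return its response to the user.
    return llm_client.generate(prompt)
```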
Mechanically, this is fairly straightforward. However, there is an endlessly deep well of work in search relevance tuning, prompt engineering, and the combination of the two. Like the LLM industry, this RAG orchestration layer is a fast-moving area, and currently a bit more art than science.
Closing
As we’ve covered today, if you’re building your own RAG platform, you’ll need to select a solid search engine, select models for generating embeddings, and integrate a good LLM. Additionally, there will be the work of setting up, configuring, and managing the underlying services. Then there’s the evolving middleware and business logic for various RAG implementations.
This is enough work for several teams. A prototype project inside the enterprise will often find initial success, but soon faces a mountain of engineering and operational challenges. Let Atolio help you find your way to AI readiness. Reach out for a low-risk discussion and trial!