Permissions and Privacy in an Enterprise RAG Platform

Dave Cliffe

Head of RAG (Rendering AI Guidance) at Atolio

Introduction

We’re deep into our series on the challenges of RAG in the enterprise, and now it’s time to discuss the quiet concern lurking in the minds of engineers and executives alike: privacy, permissions, and governance.

If you’re onboarding a new HR intern, and they start searching for HR benefits information, will they accidentally see the restricted spreadsheet where you’re sorting out pay bands?  Can the lead engineer also search across private messages from other teams?  What happens if the internal emails of the executive team are fed into an LLM when someone asks an innocent question in your new RAG system?

Permissions and related topics present a perfect storm for enterprise search and RAG.  It’s where you’ll find the intersection of corporate risk and technical complexity.  This is why so many initial RAG systems focus only on publicly available information like support documents or public web content.  Today, we’ll highlight some of the challenges in this area.

Ensuring adherence to corporate data controls

Search and RAG systems deliver plenty of value when users can search across a variety of sources, such as documents, tickets, email, and chat messages.  What was that customer name my sales rep just sent me in Teams?  What is the executive summary of the recent security incident, described in both tickets and the post-mortem email from the head of support?

So on the engineering side, we need to ensure any given employee can search all the sources to which they have read permission.  At the same time, we need to ensure they cannot see, or infer the existence of, items they do not have permission to read.
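A minimal sketch of both requirements, assuming a toy in-memory index where each document lists the principals (user or group IDs) allowed to read it. The permission check runs before anything is counted or returned, so neither the results nor the hit count reveal restricted items. The names `Doc`, `search`, and the principal strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed: frozenset[str]  # principals (user or group IDs) with read access


def search(docs: list[Doc], query: str, principals: set[str]) -> tuple[int, list[str]]:
    """Return (hit count, matching doc IDs) visible to the caller.

    Filtering happens before counting, so the total never includes
    documents the caller cannot read; a count computed over all matches
    would let a user infer that restricted content exists.
    """
    visible = [
        d for d in docs
        if query.lower() in d.text.lower() and d.allowed & principals
    ]
    return len(visible), [d.doc_id for d in visible]


docs = [
    Doc("benefits-faq", "HR benefits overview", frozenset({"group:all-staff"})),
    Doc("pay-bands", "HR benefits and pay bands", frozenset({"group:hr-comp"})),
]

intern = {"user:intern", "group:all-staff"}
print(search(docs, "benefits", intern))  # (1, ['benefits-faq'])
```

The intern's query matches both documents, but only the one they can read is counted, so the pay-band spreadsheet leaves no trace in the response.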

On top of that, permissions are never simple.  Sometimes a user has explicit access, and other times they have access by virtue of group memberships.  These memberships can be surprisingly complex, for example when groups are nested inside other groups.  These read permissions are also rarely static; they evolve constantly as administrators and managers make changes in the applications.
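A minimal sketch of resolving a user's effective principals through nested groups with a breadth-first walk, assuming memberships are available as a mapping from each member (user or group) to the groups it directly belongs to. The `member_of` mapping and the IDs are hypothetical; the `seen` set guards against membership cycles, which real directories can contain.

```python
from collections import deque


def effective_principals(user: str, member_of: dict[str, set[str]]) -> set[str]:
    """Return the user plus every group reachable through direct or nested membership."""
    seen = {user}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        for group in member_of.get(current, set()):
            if group not in seen:  # skip groups already visited (handles cycles)
                seen.add(group)
                queue.append(group)
    return seen


member_of = {
    "user:sally": {"group:frontend"},
    "group:frontend": {"group:engineering"},
    "group:engineering": {"group:all-staff"},
    "group:all-staff": {"group:engineering"},  # a cycle, tolerated by `seen`
}

print(sorted(effective_principals("user:sally", member_of)))
# ['group:all-staff', 'group:engineering', 'group:frontend', 'user:sally']
```

Sally never appears in the `all-staff` group directly, yet she inherits its access through two levels of nesting.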

How can we ensure our enterprise data platform ingests and aggregates all these sources, while respecting the corresponding landscape of permissions?

Connectors, Users, and Groups

In most enterprises, there’s a central identity management system.  These systems are the authoritative source for defining and managing the company users and groups.  Common examples include Okta, Microsoft Active Directory variants, and the user administration in Google Workspace.  Any solid platform will need to anchor its permission system on these services, knowing how to read the users, groups, and ongoing changes.

Next, for each given source (e.g., GitHub, Microsoft Teams, Gmail), the connectors must be able to ingest those source-level users and groups.  The real complexity begins here, as some systems will link back to the central identity service explicitly.  Some will link back implicitly, such as through common email addresses.  Finally, some will be isolated completely, while still referencing the same underlying employees.
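The three linking cases above can be sketched as a resolver that maps a source-level account to a canonical identity: first an explicit link the source provides, then an implicit match on a normalized email address, and otherwise an unlinked identity scoped to its source. All names here (`SourceAccount`, `resolve`, and the ID formats) are hypothetical, and real connectors would also need to handle email aliases and renamed accounts, which this sketch does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceAccount:
    source: str                          # e.g. "github", "jira"
    account_id: str                      # the source's own user ID
    email: Optional[str] = None          # may be missing or unverified
    linked_idp_id: Optional[str] = None  # explicit link to the central IdP, if any


def resolve(account: SourceAccount, idp_by_email: dict[str, str]) -> str:
    """Map a source account to a canonical principal ID."""
    # 1. Explicit link back to the central identity provider.
    if account.linked_idp_id:
        return f"idp:{account.linked_idp_id}"
    # 2. Implicit link through a shared (case-insensitive) email address.
    if account.email:
        idp_id = idp_by_email.get(account.email.strip().lower())
        if idp_id:
            return f"idp:{idp_id}"
    # 3. Isolated account: keep it scoped to its source so IDs never collide.
    return f"{account.source}:{account.account_id}"


idp_by_email = {"sally@example.com": "u-1001"}

print(resolve(SourceAccount("jira", "557058:abc", linked_idp_id="u-1001"), idp_by_email))  # idp:u-1001
print(resolve(SourceAccount("github", "sally-dev", email="Sally@Example.com"), idp_by_email))  # idp:u-1001
print(resolve(SourceAccount("github", "contractor-7"), idp_by_email))  # github:contractor-7
```

Scoping isolated accounts by source keeps them distinct, rather than guessing which employee they belong to.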

This is a nuanced and complicated area, with many challenges for the engineering team that is building your connectors and ingestion system.  When a new ticket note comes in from the Jira connector, it must be mapped to the Atlassian user as well as the Okta user where possible.  Then some representation of that user must be carried along with the content.  When that content is ingested by your search platform, it must always have some form of association back to the user.
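A minimal sketch of carrying permissions along with content, assuming the connector emits each item with its source-level ACL and the ingestion step rewrites every entry to a canonical principal before indexing. Storing the original source principals next to the canonical ones makes it possible to re-resolve later if identity mappings change. `IngestedItem`, `to_index_record`, and the ID formats are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestedItem:
    source: str
    item_id: str
    body: str
    source_acl: tuple[str, ...]  # principals as the source names them


def to_index_record(item: IngestedItem, canonical: dict[str, str]) -> dict:
    """Build an index record whose ACL uses canonical principals.

    Source principals with no known mapping are kept verbatim rather than
    dropped, and the original ACL is stored for later re-resolution.
    """
    acl = sorted({canonical.get(p, p) for p in item.source_acl})
    return {
        "id": f"{item.source}:{item.item_id}",
        "body": item.body,
        "acl": acl,
        "source_acl": list(item.source_acl),
    }


canonical = {"jira:557058:abc": "idp:u-1001", "jira-group:support": "idp-group:support"}
note = IngestedItem(
    "jira", "SUP-42", "Customer reports login failure",
    ("jira:557058:abc", "jira-group:support", "jira:unmapped-bot"),
)
record = to_index_record(note, canonical)
print(record["acl"])
```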

Last but not least, any ongoing changes must flow through this complex system.  If you mistakenly add Sally to that Confluence Pages group and then correct the mistake, the correction should also be reflected in her search results downstream.  The same holds true when employees change departments, leave the company, or get promoted.  Your connectors and ingestion code must all coordinate so that the underlying system keeps adhering to these permissions as they change.
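Propagating changes can be sketched as applying upstream membership events to a mapping of direct memberships, then resolving nested groups from current state at query time so the next search reflects the change. This assumes hypothetical event types (`add`, `remove`, `deprovision`) delivered in order; production systems also need to cope with missed or reordered events, for example via periodic full resyncs, which this sketch omits.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MembershipEvent:
    kind: str        # "add", "remove", or "deprovision" (hypothetical event types)
    member: str      # user or group ID
    group: str = ""  # unused for "deprovision"


def apply_event(member_of: dict[str, set[str]], event: MembershipEvent) -> None:
    """Update direct memberships in place from one upstream change."""
    if event.kind == "add":
        member_of.setdefault(event.member, set()).add(event.group)
    elif event.kind == "remove":
        member_of.get(event.member, set()).discard(event.group)
    elif event.kind == "deprovision":
        # Removes all direct group memberships; explicit per-document grants
        # and active sessions would need separate handling.
        member_of.pop(event.member, None)
    else:
        raise ValueError(f"unknown event kind: {event.kind}")


def effective_principals(user: str, member_of: dict[str, set[str]]) -> set[str]:
    """Resolve nested groups from current state, so each query sees the latest changes."""
    seen, queue = {user}, deque([user])
    while queue:
        for group in member_of.get(queue.popleft(), set()):
            if group not in seen:
                seen.add(group)
                queue.append(group)
    return seen


member_of = {"user:sally": {"group:confluence-finance"}}  # added by mistake

apply_event(member_of, MembershipEvent("remove", "user:sally", "group:confluence-finance"))
apply_event(member_of, MembershipEvent("add", "user:sally", "group:confluence-eng"))
print(effective_principals("user:sally", member_of))
```

Resolving at query time is one design choice; a system that precomputes readers per document would instead have to find and reindex every document affected by the change.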

Ensuring fine-grained permissions at search time

OK, your data model, connectors, and ingestion system are in place.  Now it’s time to bring it all together at search time.

For this engineering challenge, there are several broad categories of solutions.  Some search systems might pre-compute which users can read which pieces of content, and write each possibility directly into the engine.  Some might take a hybrid approach and join the content to the users and groups at search matching time.  Others might leverage a filtering system downstream of search results.  No approach is perfect, and it takes deep experience to understand the various tradeoffs for performance, resource usage, and adherence to corporate requirements.
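To make the three categories concrete, here is a toy sketch over an in-memory corpus, assuming each document carries an ACL of user and group IDs and that group memberships are already flattened into a hypothetical `groups_of_user` mapping. Early binding expands ACLs into reader lists at index time, query-time binding expands the user into principals and matches them against ACLs, and post-filtering searches first and checks each hit afterwards. It illustrates the categories only and is not Atolio's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    acl: frozenset[str]  # user and group IDs granted read access


groups_of_user = {"alice": {"eng"}, "bob": {"sales"}}  # flattened memberships (hypothetical)
docs = [
    Doc("d1", "incident summary", frozenset({"eng"})),
    Doc("d2", "incident customer email", frozenset({"sales", "alice"})),
    Doc("d3", "incident pay review", frozenset({"hr"})),
]


def principals(user: str) -> set[str]:
    return {user} | groups_of_user.get(user, set())


def matches(doc: Doc, query: str) -> bool:
    return query in doc.text


# 1. Early binding: precompute readers per document at index time.
#    Any group change means recomputing this for every affected document.
readers = {d.doc_id: {u for u in groups_of_user if d.acl & principals(u)} for d in docs}


def search_early(user: str, query: str, k: int) -> list[str]:
    return [d.doc_id for d in docs if matches(d, query) and user in readers[d.doc_id]][:k]


# 2. Query-time binding: expand the user once, then filter while matching.
def search_query_time(user: str, query: str, k: int) -> list[str]:
    p = principals(user)
    return [d.doc_id for d in docs if matches(d, query) and d.acl & p][:k]


# 3. Post-filtering: take the top-k matches first, then drop unreadable ones.
def search_post_filter(user: str, query: str, k: int) -> list[str]:
    top_k = [d for d in docs if matches(d, query)][:k]
    p = principals(user)
    return [d.doc_id for d in top_k if d.acl & p]
```

With a large `k`, all three return the same documents. With `k=1`, `search_post_filter("bob", "incident", 1)` returns an empty list, because the single top match is one Bob cannot read, while the other two strategies return `['d2']`. That shortfall is one of the tradeoffs post-filtering has to manage, just as early binding pays at reindex time and query-time binding pays on every query.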

At Atolio we’ve spent significant amounts of time and effort baking this functionality into our platform.  It retains performance at scale while also ensuring adherence to the complex array of permissions in sources and identity systems.  Finally, it respects updates as they flow from upstream systems into the search and RAG platform.

Closing

While the intersection of permissions and enterprise search can be complex, it’s not an intractable problem.  The benefits of solving it are significant, too: you can search across sources, time, and your network, all while respecting corporate controls.  This unlocks complex use cases like synthesis and summarization in RAG platforms, and enables AI efficiencies for your workforce.

If you’d like to take advantage of this work, reach out to discuss a low-risk trial and advance your AI readiness plan!

Dave Cliffe is the Head of RAG (Rendering AI Guidance) at Atolio. Atolio helps enterprises use Large Language Models (LLMs) to find answers privately and securely.
