Search is more than just Google?
In his recent post, “Enterprise Search is Better than You Imagine”, David talked about some of the changes which have unlocked a new set of business opportunities in this domain. I want to talk a little about the exciting technical opportunities that are also a part of this resurgence. We’ll touch on what makes enterprise search different, the emerging tech driving the new wave, and some of the fun challenges we’re working on at Atolio.
Most folks are aware of consumer web search. You probably used AltaVista or AskJeeves, moved to Google or Bing, and you might even be trying out Neeva, Kagi, or any of the newer web search players. Fewer folks know that there’s a whole world of commercial search too! Search is woven throughout e-commerce shops, the legal world of e-discovery, and of course various forms of enterprise search for your corporate documents and communications.
Why is enterprise search more interesting?
For the search engineers reading, let me offer that enterprise search is a far more interesting place to be working when it comes to commercial search right now. While e-discovery faces some really interesting problems of scale, the bulk of search relevance there is focused on maximizing recall and competing in a low-margin segment where the lawyer’s voice always trumps the algorithms. E-commerce is seeing some interesting innovations with multi-modal image and text search. However, in the end, it boils down to the single use case of optimizing for conversion to sales.
In enterprise search, we face an exciting multitude of user scenarios and information-seeking challenges. At the same time, we must meet rising customer expectations shaped by the consumer web search experience. Finally, emerging innovations in machine learning and NLP are really shaking up the field. There’s a ton to learn and implement across search, exploration, recommendations, question answering, and more.
What’s the New Wave?
Enterprise search is facing a sea change. Most content created inside companies now has multitudes of metadata alongside the core text. Advancements in Natural Language Processing (NLP) are allowing for new and better ways of grappling with all the text created in the enterprise. Together, these changes are unlocking new solutions for real world challenges faced by the modern knowledge worker in a sea of data.
Metadata is kickstarting the New Wave
Almost all corporate content and communications have moved into sophisticated systems which store not only the core text but tons of metadata around the text. We know the who, what, and when of almost every document, slide deck, email, and chat message created by corporate staff.
This metadata unlocks the ability to weave a layer of permissions over enterprise search, which addresses the long-standing gap of ensuring employees can search and view only the things to which they have been granted access. This metadata also unlocks powerful query and filter abilities, which can be combined with full-text search via modern search platforms such as Vespa.ai.
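To make that concrete, here is a minimal sketch of a permission check and metadata filters layered over a naive full-text match. It is plain Python for illustration only, not how Vespa.ai or any particular platform implements it. The Document fields, the search function, and the sample data are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    # Hypothetical schema: core text plus "who" and "when" metadata.
    doc_id: str
    text: str
    author: str
    created: date
    allowed_groups: set = field(default_factory=set)


def search(docs, query, user_groups, author=None, created_after=None):
    """Return ids of documents the user may see that match the filters and query."""
    terms = query.lower().split()
    results = []
    for doc in docs:
        # Permission check first: skip anything outside the user's groups.
        if not (doc.allowed_groups & user_groups):
            continue
        # Optional metadata filters on author and creation date.
        if author is not None and doc.author != author:
            continue
        if created_after is not None and doc.created <= created_after:
            continue
        # Naive full-text score: count occurrences of each query term.
        words = doc.text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            results.append((score, doc.doc_id))
    # Highest score first; ties broken alphabetically by document id.
    results.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in results]


# Toy corpus, invented for this sketch.
DOCS = [
    Document("roadmap", "Q3 search roadmap and search relevance plan",
             "dana", date(2023, 3, 1), {"eng"}),
    Document("salaries", "salary bands and search team budget",
             "lee", date(2023, 2, 1), {"hr"}),
    Document("notes", "meeting notes on search latency",
             "dana", date(2022, 11, 5), {"eng", "hr"}),
]

print(search(DOCS, "search", user_groups={"eng"}))
print(search(DOCS, "search", user_groups={"eng"}, created_after=date(2023, 1, 1)))
```

The point of the sketch is the ordering: the permission check runs before any matching, so a document the user cannot access never reaches the scoring step, let alone the result list. A real engine would push these checks into the index as filters rather than loop over documents.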
Deep Learning is adding momentum to the New Wave
Second, the academic side of machine learning is unlocking whole new applications as researchers make continued advances in deep learning for natural language processing (NLP).
If you’ll forgive the simplification, the introduction of the Transformer architecture, coupled with Google’s release of pre-trained BERT-style language models, opened the door to representing sequences of text as dense embeddings. These semantically rich embeddings are at the core of a renewal on the industry side of search. We’re seeing significant innovation in semantic search, as well as in hybrid search, which combines traditional lexical search with semantic search.
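As a rough illustration of hybrid search, the sketch below blends a simple lexical overlap score with cosine similarity between embeddings. Everything here is an assumption made for the example: the two-dimensional vectors are hand-written stand-ins for what a real embedding model would produce, and the linear blend with an alpha weight is just one common way to combine the two signals.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_score(query, text):
    """Fraction of distinct query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank (doc_id, text, vector) tuples by a blend of both scores.

    alpha=1.0 is purely lexical, alpha=0.0 is purely semantic.
    """
    scored = []
    for doc_id, text, vec in docs:
        score = (alpha * lexical_score(query, text)
                 + (1 - alpha) * cosine(query_vec, vec))
        scored.append((score, doc_id))
    # Highest score first; ties broken alphabetically by document id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored]


# Toy corpus with made-up embeddings (first axis ~ "time off", second ~ "IT").
CORPUS = [
    ("pto", "paid time off policy", [1.0, 0.0]),
    ("vpn", "vpn access policy", [0.0, 1.0]),
    ("leave", "requesting annual leave", [0.8, 0.6]),
]
QUERY = "vacation policy"
QUERY_VEC = [1.0, 0.0]  # hypothetical embedding of the query

print(hybrid_rank(QUERY, QUERY_VEC, CORPUS, alpha=1.0))  # lexical only
print(hybrid_rank(QUERY, QUERY_VEC, CORPUS, alpha=0.5))  # hybrid
```

With lexical matching alone, the "leave" document scores zero because it shares no words with the query, while the unrelated VPN document matches on "policy". Bringing in the embedding similarity lifts "leave" above "vpn", which is the kind of gain hybrid search is after.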
Additionally, language models have been steadily progressing. They are being used for an array of tasks including classification, named entity recognition, and summarization (BERT, BART, etc.). More recent large language models (LLMs) are getting better at those tasks and unlocking further capabilities (T5, GPT-3, etc.). Finally, we’re seeing search and language models combined into single systems for question answering (RAG et al.).
Where’s the wave going?
Combining an understanding of these changes into a well-formed implementation is leading to improvements in a traditionally stale search market. The metadata supports permission enforcement that delivers enterprise-grade security, as well as new understandings of how workers are collaborating. The NLP advancements are leading to real improvements in search relevance and quality while introducing new functionality in question answering and insights.
As we look ahead, continued research into embedding creation and usage is helping to unlock the long-standing challenge of adapting search to each unique domain presented by the lexicon and topics inside a given business. The frontier of fine-tuned models and domain adaptation is particularly intriguing.
It’s an exciting time for the industry, and it’s literally changing week by week. We’re having a ton of fun working on these challenges and opportunities at Atolio!
Cody Collier leads search and machine learning at Atolio - workplace search for the modern company.