
How to navigate the risks of enterprise AI deployment

Generative AI and large language models (LLMs) are going to transform the way that businesses operate forever. This is a paradigm shift that will permanently alter the way we access, manage, and share knowledge. 

Anyone can now request information from anywhere across the business and have a precise response almost instantly. This avoids endless hours spent searching through tools and apps. But it also opens up many new possibilities to drive revenue growth: better decision making, greater output, and an optimized customer experience, to name just a few.

However, there are different approaches to using LLMs and generative AI, and some come with significant risks attached. Here we outline the four major risks to consider and how best to mitigate them.

1. Erosion of trust 

One of the structural features of LLMs is a phenomenon known as "hallucination". This is where the model produces outputs that, while appearing plausible, are factually incorrect. It occurs because LLMs generate responses based on patterns observed in their training data. This means that even if the LLM cannot resolve a particular query, it will simply make something up that sounds plausible instead.

Given the known risk of hallucination, it’s inevitable that trust in the system gradually erodes. And if users find themselves constantly second-guessing the outputs, the efficiency gains that the technology promises could be largely wiped out. 

The Solution: To mitigate this, it’s critical that responses are rooted in real-time business data, not just the LLM’s training data. Retrieval Augmented Generation (RAG) is often a good way to do this. It uses a two-step method: first, an external database or source of information is searched and the relevant information is retrieved; then the LLM uses that information to craft a coherent and precise answer to the question.
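
To make the two-step flow concrete, here’s a minimal sketch in Python. The tiny in-memory document list and keyword retriever are illustrative stand-ins, not a real retrieval stack; in practice the retrieval step would query your own data sources and the prompt would be sent to whichever LLM API you use.

```python
# A minimal, self-contained sketch of the two-step RAG flow described above.
# The in-memory "knowledge base" and keyword retriever are illustrative stand-ins.

DOCS = [
    "Q3 revenue grew 12% year on year, driven by the enterprise tier.",
    "The refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-6pm GMT, Monday to Friday.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Step 1: find the passages most relevant to the question."""
    terms = set(question.lower().split())
    scored = sorted(DOCS, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:top_k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Step 2: ground the LLM in the retrieved passages, not its training data."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    q = "What is the refund policy?"
    print(build_prompt(q, retrieve(q)))  # this grounded prompt is what gets sent to the LLM
```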

2. Inaccurate results 

The vast majority of AI-powered knowledge retrieval tools use vector databases to create an index of all enterprise data. This approach is popular because it’s seen to be more efficient and it allows for semantic search, meaning exact keyword matches are not required. Indexes like this store information as vectors in a high-dimensional space and use 'nearest-neighbor' algorithms to pinpoint relevant data.
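
The sketch below shows the basic idea: every document is represented as a vector, and a query is answered by finding the vectors closest to it. The random embeddings are stand-ins for what a real embedding model would produce.

```python
# Illustrative sketch of how a vector index answers a semantic query:
# documents and the query are embedded as vectors, and the nearest vectors
# (here by cosine similarity) are returned as the most relevant results.

import numpy as np

rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(10_000, 768))      # 10k documents, 768-dim embeddings
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)

query = rng.normal(size=768)
query /= np.linalg.norm(query)

# Exact nearest-neighbor search: score every document, keep the top 5.
scores = doc_vectors @ query
top_5 = np.argsort(scores)[-5:][::-1]
print("closest documents:", top_5, "similarities:", scores[top_5].round(3))
```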

However, this approach quickly runs into problems for larger data-sets. As the number of dimensions grows, the space expands rapidly, and data points start to appear roughly equidistant from one another in relative terms, which makes it increasingly difficult to discern genuine close matches. The result is a significant deterioration in both accuracy and precision in enterprise-scale indexes, with an increasing prevalence of false positives. This means an answer can be generated from internal business data (unlike the hallucination problem) but still be wrong; for example, information might be sourced from a similar-looking but ultimately incorrect document. If this isn’t recognized, the mistake could also trigger regulatory issues if inaccurate information is stored and shared.
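
A quick numerical illustration of this crowding effect, using random points as a stand-in for real embeddings: as the number of dimensions grows, the gap between the nearest and farthest points shrinks, so everything starts to look roughly equally "close".

```python
# Distance concentration: relative contrast between nearest and farthest
# points collapses as dimensionality increases, which is what erodes
# precision in very large, high-dimensional indexes.

import numpy as np

rng = np.random.default_rng(1)
for dim in (2, 32, 256, 2048):
    points = rng.random((2_000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:5d}  relative contrast={contrast:.2f}")
# The printed contrast shrinks toward zero as dim grows, meaning the
# nearest neighbor is barely closer than everything else.
```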

The Solution: Look for alternative knowledge retrieval methods that don’t require data-indexing. Indexing works well at a smaller scale, but its precision is poor in very large data-sets. Federated search via API can be a good alternative when combined with other methods, such as RAG.
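
As a rough sketch of what federated search looks like in practice: instead of querying a central index, each source system is queried live through its own API at question time, and the combined results feed the RAG step. The two search functions below are hypothetical stand-ins for real connectors (e.g. a wiki or a CRM API).

```python
# Hedged sketch of federated search via API: fan the question out to each
# connected system in parallel and merge the live results, with no copy of
# the data held in a central index.

from concurrent.futures import ThreadPoolExecutor

def search_wiki(question: str) -> list[str]:
    # stand-in: a real connector would call the wiki's search API here
    return [f"[wiki] page matching '{question}'"]

def search_crm(question: str) -> list[str]:
    # stand-in: a real connector would call the CRM's search API here
    return [f"[crm] record matching '{question}'"]

def federated_search(question: str) -> list[str]:
    """Query every connected source live, in parallel, and merge the hits."""
    connectors = (search_wiki, search_crm)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda fn: fn(question), connectors)
    return [hit for hits in results for hit in hits]

print(federated_search("refund policy"))
```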

3. Weak data security 

Precision is not the only problem with data-indexing. Data security is also a major risk: building the index requires duplicating all enterprise data on the provider’s servers, with near-constant data transfer to keep it up to date. Naturally, this increases the risk of data leakage and expands the surface area for potential cyber attacks.

The Solution: Again, avoid tools that require data-indexing, in order to minimize additional data transfer and duplication. Some data inevitably has to pass through the system you’re using, but it should be transient wherever possible, meaning it’s discarded immediately after use.

4. Sensitive information

Enterprise companies have vast repositories of data in all sorts of different tools. Connecting this data to a powerful AI model means it can be made easily accessible to everyone and used to generate valuable strategic insights. However, this also poses risks, as employees could use the AI to reveal sensitive information, whether intentionally or not. For example, it could be used to generate a comparative performance assessment across a workforce, or to analyze the sentiment of communication between certain individuals, potentially revealing bias. Clearly, information like this could have serious consequences for company culture and safety.

The Solution: Guardrails need to be in place to block certain query types. These can be set by the provider, which will likely have some defaults. Further controls can then be determined by the customer admin, who will have a better understanding of the company’s data and any specific areas of risk. Token-level access is also important, ensuring that each user is only shown information they already have access to. This should be standard for any system using AI-powered knowledge retrieval.
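
A minimal sketch of those two guardrails, assuming a simple blocklist and group-based permissions (both hypothetical stand-ins for whatever your provider and admin configure): risky query types are refused outright, and only documents the requesting user could already open are ever passed to the model.

```python
# Illustrative guardrails: an admin-defined blocklist of query types, plus
# per-user access filtering so the model only sees documents the requesting
# user already has permission to view. Data structures are hypothetical.

BLOCKED_TOPICS = ("compare employee performance", "individual salary")

DOCS = [
    {"text": "Q3 board minutes", "allowed_groups": {"leadership"}},
    {"text": "Public holiday calendar", "allowed_groups": {"everyone"}},
]

def guarded_retrieve(question: str, user_groups: set[str]) -> list[str]:
    # Guardrail 1: refuse query types the admin has blocked outright.
    if any(topic in question.lower() for topic in BLOCKED_TOPICS):
        raise PermissionError("This type of query is not permitted.")
    # Guardrail 2: only pass along documents the user already has access to.
    return [
        d["text"] for d in DOCS
        if d["allowed_groups"] & (user_groups | {"everyone"})
    ]

print(guarded_retrieve("When is the next public holiday?", {"engineering"}))
```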

Qatalog is purpose-built for safe and secure AI deployment. There’s no data-indexing required and it’s highly resistant to hallucination. Book a demo to learn more.

WRITTEN BY
Sam Ferris, Contributor
Sam was previously the Qatalog Comms Lead and is now a Communications Consultant, working on behalf of other prominent technology companies.