Part 2: Can a Data Fabric and RAG Clean up LLMs? On-Demand Enterprise Data Querying



Generative AI has opened up new possibilities for organizations that want to be creative and fearless with their data. In my first post in this series, I introduced ways that data fabric and retrieval augmented generation (RAG) can support large language models (LLMs). In this second part, I want to share how data fabric and LLMs can enable on-demand enterprise data querying in the form of sophisticated applications, chatbots, and virtual assistants. This capability can provide instant access to relevant information, with all the associated benefits.

What is On-Demand Enterprise Data Querying?

On-demand enterprise data querying, also known as text-to-SQL-generation, is a very popular use of generative AI that empowers users to ask questions of their data using natural language. Users can perform a myriad of analyses without having to build reports, dashboards, or datasets, and because no technical expertise is required, anyone can participate. In “ChatGPT and Data Fabric are Streamlining the Field of Business Data,” my colleague Anastasio Molano talks about how natural this capability can be, and he provides a peek into how we have incorporated it into the Denodo Platform. In this post, I’ll provide some ideas about how to leverage the Denodo Platform’s data fabric to bring this text-to-SQL-generation feature into your own applications and products.

Requirements for Effective Text-to-SQL-Generation

To support text-to-SQL-generation effectively, organizations need access to their corporate data as well as a comprehensive understanding of their database schema. The latter helps LLMs to generate accurate SQL queries. Business users typically draw on many databases and tables for effective decision making. This, however, can be a double-edged sword for LLMs: the more schema information they are given, the more confused and error-prone they can become. To achieve the best results from an LLM, you need to understand the options that are available for increasing the odds of generating a good query. This is typically managed through prompt engineering, either on its own or in combination with RAG. “Prompt engineering” is the process of discovering what information and format work best to get the desired results. While prompt engineering is the more straightforward method, it can struggle with schema complexity and unclear user intent. RAG can address these issues by dynamically retrieving the context most relevant to each question, which improves the accuracy and relevance of generated queries. Either way, the best results depend on how effectively you can simplify the information that the LLM has to work with.
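One way to picture "simplifying the information" is to trim the schema before it ever reaches the model. The sketch below is illustrative only: the catalog, the table and column names, and the word-overlap scoring are all assumptions made for this example (a real RAG setup would more likely rank tables with embeddings), and no LLM is actually called.

```python
import re

# Hypothetical catalog: table name -> (description, column names).
CATALOG = {
    "customer_orders": ("orders placed by customers",
                        ["order_id", "customer_id", "region", "order_date", "status"]),
    "deliveries": ("shipment and delivery dates for orders",
                   ["delivery_id", "order_id", "shipped_date", "delivered_date"]),
    "employee_payroll": ("salary and payroll runs for employees",
                         ["employee_id", "pay_date", "gross_amount"]),
    "web_clicks": ("raw website click events",
                   ["session_id", "url", "clicked_at"]),
}


def tokens(text):
    """Lowercase word set used for the toy relevance score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tables(question, catalog, k=2):
    """Return the k tables that share the most words with the question."""
    question_words = tokens(question)

    def score(name):
        description, columns = catalog[name]
        return len(question_words & tokens(" ".join([name, description, *columns])))

    return sorted(catalog, key=score, reverse=True)[:k]


def build_prompt(question, catalog, k=2):
    """Build a prompt that exposes only the selected tables to the model."""
    schema_lines = []
    for name in select_tables(question, catalog, k):
        _, columns = catalog[name]
        schema_lines.append(f"TABLE {name} ({', '.join(columns)})")
    schema_text = "\n".join(schema_lines)
    return (
        "Write one SQL SELECT statement that answers the question.\n"
        "Use only these tables and columns:\n"
        f"{schema_text}\n"
        f"Question: {question}\n"
        "SQL:"
    )


question = "Which orders for customer 42 have a delivery that is still pending?"
print(build_prompt(question, CATALOG))
```

The unrelated payroll and clickstream tables never appear in the prompt, which is the same effect a curated schema aims for: less for the model to be confused by.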

The Data Fabric Factor

Most organizations manage data of many types that comes from many sources, which can be a challenge. A multiplicity of data sources introduces additional complexity, as organizations have to work out how to query each of them independently. A data fabric significantly enhances text-to-SQL-generation by providing a consistent SQL interface that enables real-time data access. It simplifies the information required by LLMs to generate queries, reducing potential confusion and errors. Without a logical data fabric, managing and processing data for SQL generation would be considerably more challenging. The Denodo Platform enhances data fabric capability by enabling organizations to carve out a portion of the fabric for specific applications, provide dynamic schema access, and deploy REST services. This tackles common challenges such as data security, access restrictions, and the availability of the most current data. This means that it can streamline the achievement of many of the necessary data management goals for working with LLMs, including:

Restricting access for specific applications
Addressing domain-specific naming conventions
Securing information based on user access
Providing access to all necessary enterprise data
Providing access to the most current information
Ensuring good performance
Supporting maintainability and extensibility

Let’s illustrate how data fabric can overcome common data management challenges that organizations typically face.

An Example Scenario

Let’s imagine that we are working on a proof-of-concept (POC) with a potential customer. The customer’s account representatives shared with us that they are having trouble answering customer questions about orders. They must access several systems to find answers, and it takes a lot of time. Each account representative services specific customer regions and should therefore only have access to data for those regions.

Account representatives should only use information from sources approved for this activity. The customer would like to start by exposing customer orders and deliveries but would like the ability to add more sources over time without having to add any code or changes to the project.

We would like to determine if it is feasible to employ GenAI to help them out. If it works out well, we may consider enabling customers to ask their questions directly.

Sounds like a tall order! But let’s see how a data fabric can help.

Implementing a Controlled Data Schema

A data fabric enables us to carve out a piece of the fabric to build specific schemas for handling customer orders and deliveries, connected through well-defined associations. This schema simplifies the task for the AI in generating SQL queries and ensures that data access is governed by strict security policies. This setup is designed for scalability and adaptability, allowing for the integration of new data sources without system overhauls.
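To make the idea of a small, purpose-built schema concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the carved-out portion of the fabric. The table names, columns, and the `describe_schema` helper are hypothetical and chosen only for this example; they are not Denodo objects or APIs. The point is that the schema, including the association between orders and deliveries, can be read from the live catalog rather than hard-coded.

```python
import sqlite3


def create_poc_schema(conn):
    """Create the two hypothetical POC tables and the association between them."""
    conn.executescript(
        """
        CREATE TABLE customer_orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            region      TEXT    NOT NULL,
            order_date  TEXT    NOT NULL,
            status      TEXT    NOT NULL
        );
        CREATE TABLE deliveries (
            delivery_id    INTEGER PRIMARY KEY,
            order_id       INTEGER NOT NULL REFERENCES customer_orders(order_id),
            shipped_date   TEXT,
            delivered_date TEXT
        );
        """
    )


def describe_schema(conn):
    """Read tables, columns, and associations from the catalog as prompt-ready text."""
    lines = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Table names come from the catalog itself, not from user input.
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        column_text = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"TABLE {table} ({column_text})")
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})").fetchall():
            # fk[3] is the local column, fk[2] the parent table, fk[4] the parent column.
            lines.append(f"  {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
create_poc_schema(conn)
print(describe_schema(conn))
```

Because the description is generated from the catalog, a table added later shows up in the text without any change to the helper.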

Security and Compliance

To ensure data security and compliance, we can leverage the Denodo Platform’s capabilities to enforce access restrictions based on user roles and regions. This approach enables us to tailor data access to the needs of individual representatives, ensuring they receive only the data pertinent to their specific customers.
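The effect of region-based restrictions can be illustrated with a small stand-in. In the scenario the platform would enforce these rules as policy; the sketch below only imitates the outcome in SQLite, and every name in it (`my_orders`, `rep_regions`, `active_rep`, `run_as`) is an assumption made for illustration. Representatives query a filtered view, so even a generated query cannot reach rows outside their regions.

```python
import sqlite3


def build_demo(conn):
    """Create hypothetical order data plus a view filtered by the active representative."""
    conn.executescript(
        """
        CREATE TABLE customer_orders (order_id INTEGER PRIMARY KEY, region TEXT, status TEXT);
        CREATE TABLE rep_regions (rep TEXT, region TEXT);
        CREATE TABLE active_rep (rep TEXT);

        -- Representatives query this view, never the base table.
        CREATE VIEW my_orders AS
            SELECT o.order_id, o.region, o.status
            FROM customer_orders AS o
            JOIN rep_regions AS r ON r.region = o.region
            JOIN active_rep AS a ON a.rep = r.rep;

        INSERT INTO customer_orders VALUES
            (1, 'West', 'shipped'), (2, 'East', 'pending'), (3, 'West', 'pending');
        INSERT INTO rep_regions VALUES ('alice', 'West'), ('bob', 'East');
        """
    )


def run_as(conn, rep, sql):
    """Run a query with the row-level filter of the given representative."""
    conn.execute("DELETE FROM active_rep")
    conn.execute("INSERT INTO active_rep VALUES (?)", (rep,))
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
build_demo(conn)
query = "SELECT order_id FROM my_orders ORDER BY order_id"
print(run_as(conn, "alice", query))  # only West orders
print(run_as(conn, "bob", query))    # only East orders
```

The same query text returns different rows for different representatives, which is what lets the query generator stay unaware of security rules.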

Performance and Accessibility

Utilizing a data fabric enables us to connect to both historical and real-time data sources, so representatives have access to the most current data and, if needed, the flexibility to leverage data from multiple sources via a single point of entry. Keep in mind that this information is modeled in one place, while the data can originate from any source in the enterprise that is connected to the data fabric. Leveraging information via a logical data fabric also offers inherent performance benefits, as query-optimization features ensure that your queries, especially the generated ones, execute as efficiently as possible. This not only improves query performance and reliability but also enhances the user experience and operational efficiency.

Now You Are Ready for the LLMs

Thanks to our data fabric, we have dynamic access to the schema as well as access to the underlying corporate data stores. We can now use an LLM to generate queries from account representatives’ questions and send each query into the data fabric with the privileges of the representative who asked. Since we have restricted the information in the schemas and modeled them specifically for this task, we have the potential for great success with prompt engineering alone; if a RAG approach is needed, the same information from the data fabric can feed it.
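The flow described here (read the schema, ask the model for a query, run it) can be sketched end to end. This is a minimal sketch under stated assumptions: SQLite stands in for the fabric's SQL interface, `fake_llm` is a canned placeholder rather than a real model call, and the `answer` function, its guardrail, and the sample data are all hypothetical.

```python
import sqlite3


def fetch_schema(conn):
    """Read the current table definitions from the catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def answer(conn, question, llm):
    """Generate SQL for the question with `llm`, check it, and run it."""
    prompt = (
        "Write one SQL SELECT statement for the question, using only this schema.\n"
        f"{fetch_schema(conn)}\n"
        f"Question: {question}\n"
        "SQL:"
    )
    sql = llm(prompt).strip().rstrip(";").strip()
    # Crude guardrail: accept a single read-only statement and nothing else.
    if not sql.lower().startswith("select") or ";" in sql:
        raise ValueError(f"refusing to run generated statement: {sql!r}")
    return sql, conn.execute(sql).fetchall()


def fake_llm(prompt):
    """Placeholder for a real model call; always returns the same query."""
    return (
        "SELECT o.order_id, d.delivered_date FROM customer_orders AS o "
        "JOIN deliveries AS d ON d.order_id = o.order_id WHERE o.customer_id = 42;"
    )


def demo_connection():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
        CREATE TABLE deliveries (delivery_id INTEGER PRIMARY KEY, order_id INTEGER, delivered_date TEXT);
        INSERT INTO customer_orders VALUES (1, 42, 'delivered'), (2, 7, 'pending');
        INSERT INTO deliveries VALUES (10, 1, '2024-05-02');
        """
    )
    return conn


conn = demo_connection()
sql, rows = answer(conn, "When was customer 42's order delivered?", fake_llm)
print(sql)
print(rows)
```

In the scenario, `conn` would be a connection opened with the representative's own credentials so that the data layer applies that user's restrictions; that part is assumed here and not shown.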

The Value in Leveraging a Logical Data Fabric

In our example, you can see that changes to each component (the schema, the security rules, and the connected sources) remain under your control despite the volatility of your enterprise data landscape. There are quite a few advantages to this approach for our use case and as a pattern for future applications:

The ability to leverage a consistent AI Agent framework
Distributed management of data and database schemas
Distributed security and compliance
The ability to extend new applications via new virtual databases
Consistent approach and processes
Simplification of the data model
Separation of concerns and decoupling
Maintainability and extensibility

In practice, data fabric enables the decoupling of AI agents from the data sources, which means that specific data management tasks can be handled independently from the AI applications. This separation of concerns not only simplifies the overall data architecture but also enhances maintainability and extensibility. Additionally, the fabric’s ability to dynamically adjust to changes in the data environment helps maintain high performance levels and up-to-date data access, crucial for maintaining competitive advantage.
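The decoupling described above can be pictured as an agent that depends only on a narrow interface. The following is a sketch, not a reference design: `SqlEndpoint`, `SqliteEndpoint`, and `Agent` are hypothetical names, SQLite again stands in for the data layer, and the "LLM" is a fixed lambda. Adding a table on the data side changes what the agent sees without changing any agent code.

```python
import sqlite3
from typing import Callable, List, Protocol, Tuple


class SqlEndpoint(Protocol):
    """What the agent needs from the data layer, and nothing more."""

    def schema(self) -> str: ...

    def query(self, sql: str) -> List[Tuple]: ...


class SqliteEndpoint:
    """Stand-in data layer; its tables can change without touching the agent."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")

    def schema(self) -> str:
        rows = self.conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return "\n".join(row[0] for row in rows)

    def query(self, sql: str) -> List[Tuple]:
        return self.conn.execute(sql).fetchall()


class Agent:
    """Turns questions into SQL using whatever schema the endpoint reports."""

    def __init__(self, endpoint: SqlEndpoint, llm: Callable[[str], str]) -> None:
        self.endpoint = endpoint
        self.llm = llm

    def prompt_for(self, question: str) -> str:
        # The schema is read at question time, so new tables appear automatically.
        return f"{self.endpoint.schema()}\nQuestion: {question}\nSQL:"

    def ask(self, question: str) -> List[Tuple]:
        return self.endpoint.query(self.llm(self.prompt_for(question)))


endpoint = SqliteEndpoint()
endpoint.conn.execute("CREATE TABLE customer_orders (order_id INTEGER, status TEXT)")
agent = Agent(endpoint, llm=lambda prompt: "SELECT COUNT(*) FROM customer_orders")

before = agent.prompt_for("How many orders are open?")
# A new source is added on the data side only; the agent is untouched.
endpoint.conn.execute("CREATE TABLE order_returns (return_id INTEGER, order_id INTEGER)")
after = agent.prompt_for("How many orders were returned?")
print("order_returns" in before, "order_returns" in after)
```

Swapping `SqliteEndpoint` for another implementation of the same two methods would leave `Agent` unchanged, which is the separation of concerns the paragraph describes.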

Summing It All Up

Text-to-SQL-generation is a valuable capability that can empower users to ask questions and get answers in real time. It enables organizations to reduce the amount of work needed to model and transform data. If organizations leverage a logical data fabric enabled by the Denodo Platform, they can jumpstart their generative AI journey. A logical data fabric enables organizations to work fast and easily distribute responsibilities, while remaining insulated from changes in the data landscape. They also benefit from fast query execution and model simplification over the entire enterprise data landscape. A logical data fabric can help manage the complexity of a data landscape just as Felix Liao explains in his blog post entitled “Unlocking the Power of Generative AI: Integrating Large Language Models and Organizational Knowledge.”

I hope that this example has given you some inspiration and food for thought as to how you can leverage a data fabric to get ahead, using generative AI to turn natural language questions into insights. If you want to know more about this topic, please let us know.

In my next post, I’ll cover ways to leverage data fabric with semantic search approaches to GenAI.

Author

Terry Dorsey

Sr. Data Architect/Evangelist North America at Denodo
