The current data landscape is fragmented, not just in location but also in terms of shape and processing paradigms: data lakes, IoT architectures, NoSQL and graph data stores, SaaS vendors, etc. are found coexisting with relational databases to fuel the needs of modern analytics, ML and AI.
In the past, traditional data warehouses were the tool of reference for consolidating data in a single location for analytics. But although they still play a pivotal role in a company's strategy, the data available in other systems can’t be ignored. Physical re-consolidation of that data back to a single location, although possible, becomes less appealing by the day:
- Data volumes are too high. You don’t want to replicate your entire EDW in your data lake, or vice versa, for instance
- Built-for-purpose systems (e.g. Spark for ML) lose their purpose if their data ends up somewhere else. Data should live in the system that is best suited to process it
- Stricter regulations like GDPR favor better-governed architectures, and “data swamps” don’t help
End users who continue to use traditional methods in the modern data landscape pay the price in the form of extended time to market (or more accurately, “time to data”).
Does a logical approach make more sense?
Quite simply, yes it does. A logical approach involves a virtual layer that connects different systems and exposes them as one, while simultaneously hiding the underlying complexity of the back-end systems from the business user. This centralized approach means security, governance and auditing are all under control.
Data virtualization software provides a metadata catalog and an execution engine, based on the ideas of the original relational database, the main difference being that a virtual layer is focused on data delivery, not on storage. Replication of data is still an option, and in some cases the preferred one. But a logical architecture adds the flexibility to have replication as an option, not a necessity.
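To make the idea of a catalog plus an execution engine more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular data virtualization product is implemented: the `VirtualLayer` class, the `customers` and `events` views, and the sample rows are all hypothetical. An in-memory SQLite database stands in for a warehouse and a plain Python list stands in for a data lake.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class VirtualLayer:
    """Toy virtual layer (hypothetical): a metadata catalog plus a tiny execution engine."""

    def __init__(self) -> None:
        # Catalog: logical view name -> function that fetches rows on demand.
        self._catalog: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, view: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self._catalog[view] = fetch

    def views(self) -> List[str]:
        return sorted(self._catalog)

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Hash join executed at query time; the layer itself stores no data.
        index: Dict[object, List[Row]] = {}
        for row in self._catalog[right]():
            index.setdefault(row[key], []).append(row)
        return [
            {**l_row, **r_row}
            for l_row in self._catalog[left]()
            for r_row in index.get(l_row[key], [])
        ]


# Source 1: a relational database (in-memory SQLite as a stand-in for a warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)


def fetch_customers() -> List[Row]:
    cur = warehouse.execute(
        "SELECT customer_id, name FROM customers ORDER BY customer_id"
    )
    return [{"customer_id": cid, "name": name} for cid, name in cur]


# Source 2: semi-structured events (a list as a stand-in for a data lake).
def fetch_events() -> List[Row]:
    return [
        {"customer_id": 1, "event": "login"},
        {"customer_id": 1, "event": "purchase"},
        {"customer_id": 3, "event": "login"},
    ]


layer = VirtualLayer()
layer.register("customers", fetch_customers)
layer.register("events", fetch_events)

# Consumers see one logical place to query; the data stays in its source systems.
result = layer.join("customers", "events", key="customer_id")
for row in result:
    print(row)
```

The consumer only deals with view names, and the join runs against the sources at request time. A real engine would also push work down to the sources, optimize queries and enforce security, all of which this sketch leaves out.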
What’s the value in it for me?
The benefits of the logical data architecture go beyond the possibilities of data warehousing and reporting, and can also be applied to other scenarios such as logical data lakes for data scientists and business users.
- With one single location to get all your data, it’s much easier and quicker to generate the analytics you need to make data-driven decisions
- The need to replicate data is significantly reduced, meaning operational costs are also lower
- Rather than physically consolidating the data, it is logically consolidated, traced to the source and secured, massively boosting data governance and security efforts
In this data-driven world, delays and inflexibility caused by outdated data management systems are no longer acceptable, especially with the growing need to incorporate new information. This is why a logical data warehouse is the only logical choice.
Learn more about logical data warehouses
Below is a list of resources that I believe will help you explore this topic in more detail:
- [Blog] Why the Enterprise Data Warehouse Will No Longer Suffice in Our Data-Driven World
- [Webinar] Performance in a Logical Data Warehouse
- [Whitepaper] Developing a Bimodal Logical Data Warehouse Architecture Using Data Virtualization by Rick van der Lans