At the heart of every organization lies a data architecture, which determines how data is accessed, organized, and used. For this reason, organizations must periodically revisit their data architectures to ensure that they remain aligned with current business goals. Here, as a guide to the differences between some of today’s key data architectures, I’ll explain the concepts of data warehousing, data lakes, and data mesh, and how a technology called data virtualization can take all of them to the next level.
Data Warehousing
Data warehousing became widespread in the 1990s as a way to centralize structured data for business intelligence (BI). Data was (and still is) delivered from source systems to a central data warehouse via extract, transform, and load (ETL) processes. Because these processes follow highly scripted procedures, data warehouses tend to promote consistency and data quality. And because they centralize data, they also facilitate historical analysis.
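To make the three ETL steps concrete, here is a minimal sketch in Python that uses the standard library’s in-memory SQLite database as a stand-in for the warehouse. The source records, the `sales_fact` table, and the validation rule are all hypothetical, chosen only to show where the scripted “transform” step enforces consistency before anything reaches the warehouse.

```python
import sqlite3

# Hypothetical source-system records: amounts arrive as strings,
# country codes are inconsistently cased, and one amount is invalid.
SOURCE_ROWS = [
    {"order_id": "1001", "country": "fr", "amount": "120.50", "date": "2023-01-15"},
    {"order_id": "1002", "country": "FR", "amount": "80.00", "date": "2023-01-16"},
    {"order_id": "1003", "country": "be", "amount": "n/a", "date": "2023-01-16"},
]


def extract():
    """Extract: read raw records from the (hypothetical) source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: enforce types and consistent codes; reject rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # quality rule: drop records whose amount is not numeric
        clean.append((int(row["order_id"]), row["country"].upper(), amount, row["date"]))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the central warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, country TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_etl(warehouse)
    query = "SELECT country, SUM(amount) FROM sales_fact GROUP BY country"
    for row in warehouse.execute(query):
        print(row)
```

Because every row passes through the same `transform` function, the warehouse only ever sees data that meets the rule, which is the consistency benefit described above; the cost is that the data is physically copied.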
Data Lakes
Data warehousing remained the dominant paradigm until the beginning of the new millennium, when big data entered the picture. Data was no longer purely structured: organizations needed to work with unstructured data such as email and even images. They also needed to capture this unstructured data as quickly as possible to support new real-time use cases.
To accommodate big data, organizations turned from data warehouses to data lakes, which can store any data in any format. Storing data became simple and affordable, but accessing it was another matter: before it could be used, the data often had to be transformed.
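This trade-off is often described as “schema-on-read.” The minimal sketch below assumes a temporary local directory stands in for the lake, and all file names are hypothetical. Raw files are written exactly as they arrive, with no schema enforced, and structure is imposed only when a consumer reads them, which is where the transformation work moves.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake: Path, name: str, payload: str) -> None:
    """Store raw data as-is: nothing is validated or reshaped at write time."""
    (lake / name).write_text(payload, encoding="utf-8")


def read_orders(lake: Path) -> list[dict]:
    """Apply a schema at read time, normalizing JSON-lines and CSV files into one shape."""
    records = []
    for path in sorted(lake.iterdir()):
        text = path.read_text(encoding="utf-8")
        if path.suffix == ".json":
            rows = [json.loads(line) for line in text.splitlines() if line.strip()]
        elif path.suffix == ".csv":
            rows = list(csv.DictReader(io.StringIO(text)))
        else:
            continue  # other formats (free text, images, email) are stored but skipped here
        for row in rows:
            records.append({"order_id": int(row["order_id"]), "amount": float(row["amount"])})
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        ingest(lake, "web_orders.json", '{"order_id": 1, "amount": 9.5}\n')
        ingest(lake, "store_orders.csv", "order_id,amount\n2,20.0\n")
        ingest(lake, "receipt_scan.txt", "unstructured text")
        print(read_orders(lake))
```

Writing is trivial here, while `read_orders` has to know every format it might encounter; that asymmetry is the “accessing the data was another matter” problem.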
Data lakehouses gained prominence in the early 2020s. As the name implies, a data lakehouse seeks to combine the flexibility and affordability of a data lake with the structure of a data warehouse.
The Central Question
Data warehouses, data lakes, and data lakehouses are all examples of centralized architectures. In the last few years, however, analysts have found that centralized architectures are often at odds with the needs of today’s organizations, which are made up of different domains, each with its own data needs. Shouldn’t data ownership and management be decentralized, distributed across the organization?
This is the question that led to the development of data mesh. In a data mesh, data is managed by distinct “data domains” within the organization. Each domain delivers its data to the users or groups that need it as “data products,” designed to meet particular needs.
Data Mesh and Data Virtualization
Unlike ETL processes, data virtualization integrates data in real time without physically moving it or copying it into a centralized repository. This makes it a key enabling technology for a data mesh.
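The core idea can be illustrated with a toy “virtual view.” This is a minimal sketch, not how any particular virtualization product works: it assumes two hypothetical sources (a SQLite product catalog and an in-memory inventory system), and the `VirtualView` class is invented for illustration. It stores only how to fetch and combine the data; each call to `query()` reads the live sources and joins them at request time, so nothing is copied into a central store.

```python
import sqlite3
from collections.abc import Callable, Iterable


class VirtualView:
    """Toy virtual view: keeps the recipe for fetching and joining data, never the data."""

    def __init__(self, fetch_left: Callable[[], Iterable[dict]],
                 fetch_right: Callable[[], Iterable[dict]], key: str):
        self.fetch_left = fetch_left
        self.fetch_right = fetch_right
        self.key = key

    def query(self) -> list[dict]:
        # Both sources are read at query time, so results reflect their current state.
        right_index = {row[self.key]: row for row in self.fetch_right()}
        joined = []
        for left in self.fetch_left():
            match = right_index.get(left[self.key])
            if match is not None:
                joined.append({**left, **match})
        return joined


# Source 1 (hypothetical): a relational product catalog.
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
catalog.executemany("INSERT INTO products VALUES (?, ?)", [("A1", "Laptop"), ("B2", "Monitor")])


def fetch_products():
    return [{"sku": sku, "name": name} for sku, name in catalog.execute("SELECT sku, name FROM products")]


# Source 2 (hypothetical): a live operational inventory system, modeled as a dict.
inventory = {"A1": 5, "B2": 0}


def fetch_stock():
    return [{"sku": sku, "in_stock": qty} for sku, qty in inventory.items()]


view = VirtualView(fetch_products, fetch_stock, key="sku")

if __name__ == "__main__":
    print(view.query())
    inventory["B2"] = 12  # the source system changes...
    print(view.query())   # ...and the next query sees it, with no reload step
```

Real platforms add query optimization, pushing filters down to the sources, caching, and security, none of which is shown here.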
In addition, data virtualization enables organizations to build any number of semantic models in a layer above the data sources, without affecting the underlying source data. In a data mesh, this allows data domains to manage their data and deliver it as data products, again without changing the sources, so they can work with the available data iteratively and flexibly.
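A semantic model of this kind can be sketched as a set of read-time rules over a source. In the minimal Python sketch below, the raw table, the `DataProduct` class, and both domain products are hypothetical. Each product renames fields, derives new measures, and filters rows every time it is read, while the source rows themselves are never modified, so a domain can revise its model without touching the underlying data.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical raw source table; the semantic models below only read it.
RAW_ORDERS = [
    {"ord_no": 1, "cust": "C1", "net": 100.0, "vat_rate": 0.2, "status": "shipped"},
    {"ord_no": 2, "cust": "C2", "net": 50.0, "vat_rate": 0.2, "status": "cancelled"},
    {"ord_no": 3, "cust": "C1", "net": 30.0, "vat_rate": 0.1, "status": "shipped"},
]


@dataclass
class DataProduct:
    """Toy semantic model: business-friendly fields derived from a source at read time."""
    name: str
    owner_domain: str
    fields: dict[str, Callable[[dict], object]]
    row_filter: Callable[[dict], bool] = lambda row: True  # default: keep every row

    def read(self, source: list[dict]) -> list[dict]:
        # New dicts are built on every read; source rows are never mutated.
        return [
            {out_name: derive(row) for out_name, derive in self.fields.items()}
            for row in source
            if self.row_filter(row)
        ]


# Hypothetical finance-domain product: revenue from shipped orders, VAT included.
shipped_revenue = DataProduct(
    name="shipped_revenue",
    owner_domain="finance",
    fields={
        "order_id": lambda r: r["ord_no"],
        "customer_id": lambda r: r["cust"],
        "gross_amount": lambda r: round(r["net"] * (1 + r["vat_rate"]), 2),
    },
    row_filter=lambda r: r["status"] == "shipped",
)

# Hypothetical customer-service product over the same source: status of every order.
order_status = DataProduct(
    name="order_status",
    owner_domain="customer_service",
    fields={
        "customer_id": lambda r: r["cust"],
        "order_id": lambda r: r["ord_no"],
        "status": lambda r: r["status"],
    },
)

if __name__ == "__main__":
    for product in (shipped_revenue, order_status):
        print(product.owner_domain, product.name, product.read(RAW_ORDERS))
```

Two domains publish different products from the same source without coordinating a shared schema change, which is the flexibility the data mesh approach aims for.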
The Modern Data Warehouse, and Beyond
Data virtualization also works with existing data warehouses, turning them into logical data warehouses. A logical data warehouse can do everything a traditional data warehouse can do, but it can also capture and process unstructured data alongside the usual structured data, accommodating most, if not all, modern data types. It also provides real-time access, regardless of where the source data resides or what type it is.
For this reason, data virtualization can bring real-time data access, along with support for modern use cases, to any existing data architecture, whether it is built on a traditional data warehouse, a data lake, or a modern data mesh.
Here on the Business & Decision blog, you’ll learn much more about the latest trends in Data Management and Digital Transformation. Keep following along, and keep an eye out for all our articles!