Uber owns no fleet, and Airbnb owns no real estate. These and other companies have grown very quickly, with relatively little investment, by establishing thin customer-interface layers above complex systems of goods or services owned by third parties. The point is to keep a thin semantic layer for faster data access and to reduce infrastructure complexity while harnessing next-generation connected data. This is made possible through data minimization, a key concept in the digital transformation world.
By minimizing data, chiefly by reducing redundancy at the data-access layer, businesses can rapidly gain value. Data minimization enables them not only to migrate seamlessly from legacy OLTP and OLAP platforms, but to do so incrementally, evolving toward their desired, optimal architecture while the thin data-access layer continues to provide, and even accelerate, access to the data.
Steven Hall once famously said, “Every single cell in the human body replaces itself over a period of seven years. That means there’s not even the smallest part of you now that was part of you seven years ago.”
The same is true of digital transformation and technology roadmaps: the pace at which technology changes and evolves is staggering. Digital efficiency requires a technology layer that decouples the presentation and consumption layer from underlying technology complexity and data changes.
The Data Virtualization Factor
This type of data minimization, in turn, is made possible through data virtualization, working in tandem with cloud-based data-sharing technologies such as Snowflake, AWS Redshift, and various data-as-a-service (DaaS) offerings. Such a solution would not replace an existing data warehouse, data lake, data mesh, or Databricks Delta Lake; rather, it would extend these implementations, augmenting them with self-service capabilities as part of a logical data fabric.
Data virtualization enables real-time data access without replication. As the name implies, data virtualization is a virtual approach to data management: it logically enables data consumers and applications to retrieve and manipulate data without needing to know the technical details of where the data is stored or how it is formatted. In effect, it provides consumers and applications with a single view over the entire enterprise data infrastructure.
Data virtualization establishes a logical data layer that seamlessly integrates all the data stored across the various sources, centrally managing it for security and data governance, and delivering it to business users in a format that they can easily consume. While performing all of these functions, the data virtualization layer pushes query workloads down to the underlying systems, but it can also cache query results for reuse.
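To make this concrete, here is a minimal sketch of what consuming a virtualized view can look like from Python over ODBC. The DSN (dv_layer), the credentials, and the logical view name (customer_360) are hypothetical stand-ins rather than any specific product's objects; the point is simply that the consumer connects as if to a single database.

```python
import pyodbc

# Connect to the virtual layer as if it were one ordinary database;
# the consumer never sees where the underlying data physically lives.
# "dv_layer" is a hypothetical ODBC DSN for the virtualization server.
conn = pyodbc.connect("DSN=dv_layer;UID=analyst;PWD=secret")
cursor = conn.cursor()

# One logical query against an assumed view; the virtualization engine
# decides which fragments to push down to the warehouse, the lake, or
# the operational store that actually back the view.
cursor.execute(
    """
    SELECT customer_id, region, lifetime_value
    FROM customer_360
    WHERE region = ?
    """,
    "EMEA",
)

for row in cursor.fetchall():
    print(row.customer_id, row.region, row.lifetime_value)

conn.close()
```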
Typical Data Virtualization Features:
- Support for a wide variety of systems, such as RDBMS, MPP, NoSQL, Excel files, Web services, SaaS APIs, and queues.
- Advanced query optimizers supporting aggregation push-down, query rewriting, complex queries over big data volumes, and massively parallel processing (MPP)
- Kerberos authentication, row- and column-level security, data masking, and role-based access control
- Full and partial caching modes
- Support for JDBC, ODBC, and REST web service interfaces (see the REST sketch after this list)
- Data catalogs that offer Google-like search capabilities
- The ability to run stand-alone
- Multiple deployment patterns, such as on-premises, cloud, and hybrid
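As a companion to the ODBC sketch above, here is what the REST interface from the feature list can look like. The endpoint URL, the OData-style $filter parameter, the bearer token, and the response shape are all assumptions for illustration, not a specific product's API.

```python
import requests

# Hypothetical REST endpoint that the virtualization layer publishes
# for the same logical view used in the ODBC sketch.
response = requests.get(
    "https://dv.example.com/rest/views/customer_360",
    params={"$filter": "region eq 'EMEA'"},        # OData-style filter (assumed)
    headers={"Authorization": "Bearer <token>"},   # placeholder credential
    timeout=30,
)
response.raise_for_status()

# The JSON shape below ("elements") is assumed for the sketch.
for record in response.json()["elements"]:
    print(record["customer_id"], record["lifetime_value"])
```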
12 Tips and Tricks for Implementing Data Virtualization
In my own experience with data virtualization, I’ve learned many ways to gain the maximum benefit from this important technology. Here are 12 of the most important tips and tricks to keep in mind when performing your own data virtualization implementation:
- Establish a single unified data-access point for all data: structured data, unstructured data, data spread across file systems, and data from SQL and NoSQL databases. Build a bridge between big data sources and relational database sources.
- Unify data security, and enable federated/heterogeneous joins of data residing in disparate locations and sources (see the federated-join sketch after this list).
- Encourage the agility of the development team by focusing on incremental value with quick access to data.
- Decouple applications and analytics from physical data structures, enabling data infrastructure changes while minimizing the impact on users.
- Accelerate data delivery without physically duplicating data, and leverage data discovery.
- Enable and accelerate system integration testing and user acceptance testing for migration projects
- Compare and reconcile data through the common semantic layer for hybrid cloud configurations
- Combine data-lake data with operational data to enable pseudo-real-time data feeds.
- Reduce data replication to enable data minimization and retain control over the entire data lifecycle.
- Redeploy personnel away from performing operational controls over data integration and from monitoring extract, transform, and load (ETL) processes, improving digital efficiency.
- Increase flexibility and self-service by allowing multiple consumption patterns once data has been provisioned.
- Enable awareness and discovery of data through a data catalog.
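To ground the federated-join tip above, here is a minimal sketch reusing the hypothetical dv_layer DSN from earlier. The view names (customers_crm, orders_lake) are illustrative: one is assumed to sit over an operational CRM database and the other over the data lake, yet the consumer writes a single join and lets the engine federate it.

```python
import pyodbc

conn = pyodbc.connect("DSN=dv_layer;UID=analyst;PWD=secret")
cursor = conn.cursor()

# A single heterogeneous join: customers_crm is assumed to live in an
# operational RDBMS and orders_lake in the data lake. Nothing is
# replicated; the engine splits the query, pushes each fragment down
# to its source, and joins the results in the logical layer.
cursor.execute(
    """
    SELECT c.customer_id,
           c.segment,
           SUM(o.amount) AS total_spend
    FROM customers_crm AS c
    JOIN orders_lake AS o
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
    """
)

for row in cursor.fetchall():
    print(row.customer_id, row.segment, row.total_spend)

conn.close()
```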
Getting Started
Before you begin, seek to understand all of the possible considerations relating to data sources, consumption use cases, non-functional requirements, necessary transformations, and implementation. Plan out how the data virtualization solution will fit in with the existing environment, and how it will add value in the target, or “nirvana,” environment. After taking these steps, stakeholders will be surprised at just how rapidly their organizations begin to gain business value from data virtualization.