One sign that a concept or technology is gaining currency is that many people start interpreting it in their own way, sometimes in very different ways. Data virtualization is a case in point.
What is data virtualization, really? (Pardon the pun). Let’s start with virtualization. Wikipedia says, “In computing, virtualization refers to the act of creating a virtual (rather than actual) version of something, including (but not limited to) …<list> … resources.”
The market gets confused when the resource being virtualized is misrepresented. So, let’s look at some variants.
Data virtualization, according to Wikipedia, “… is the presentation of data as an abstract layer, independent of underlying database systems, structures and storage.” That is a pretty good start, but it needs additional capabilities to complete the description:
- Create a data abstraction and data access layer independent of underlying data sources.
- Integrate disparate structured and unstructured data semantically to create canonical views of key business data entities.
- Provide real-time or near-real-time data access using federation, caching, and/or selective batch movement (but without wholesale replication of the data, as in previous-generation data integration technologies).
- Deliver data services in various formats to various consumers, with differentiated security, service levels and monitoring, and allowing multiple data interaction paradigms, such as search, browse, query, etc.
- Allow discovery and governance of data and metadata, lineage, change impact and other data management capabilities through the virtual data layer.
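To make the first three capabilities concrete, here is a minimal Python sketch of a virtual view. It is an illustration only, not how Denodo or any other product implements data virtualization: the `CustomerView` class, the `cust` table, the `orders_feed` list and the 60-second cache lifetime are all hypothetical names and values chosen for the example. The view joins a relational source and a non-relational one at query time and caches the result, rather than replicating either source wholesale.

```python
import sqlite3
import time

# Source 1: a relational database (an in-memory SQLite stand-in, hypothetical schema).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE cust (cust_id INTEGER, full_name TEXT)")
crm.executemany("INSERT INTO cust VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Source 2: a non-relational feed (a list of dicts standing in for a JSON API).
orders_feed = [
    {"customer": 1, "total": 120.0},
    {"customer": 1, "total": 30.0},
    {"customer": 2, "total": 75.5},
]


class CustomerView:
    """Hypothetical canonical 'customer' view federated over both sources."""

    def __init__(self, db, feed, ttl_seconds=60.0):
        self.db = db
        self.feed = feed
        self.ttl = ttl_seconds
        self._cache = {}       # customer id -> (timestamp, record)
        self.source_hits = 0   # how often the underlying sources were queried

    def get(self, customer_id):
        # Serve from the cache while the entry is still fresh.
        cached = self._cache.get(customer_id)
        if cached and time.monotonic() - cached[0] < self.ttl:
            return cached[1]

        # Otherwise federate: query each source in place and combine the results.
        self.source_hits += 1
        row = self.db.execute(
            "SELECT full_name FROM cust WHERE cust_id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        spend = sum(o["total"] for o in self.feed if o["customer"] == customer_id)

        # Consumers see canonical field names, not the source-specific ones.
        record = {"id": customer_id, "name": row[0], "total_spend": spend}
        self._cache[customer_id] = (time.monotonic(), record)
        return record


view = CustomerView(crm, orders_feed)
print(view.get(1))  # {'id': 1, 'name': 'Ada', 'total_spend': 150.0}
```

A consumer calling `view.get(1)` never sees `cust_id`, `full_name` or the shape of the order feed, which is the abstraction the first bullet describes. A second call within the cache lifetime is answered without touching either source.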
This is the Real Mr. Data Virtualization – what Denodo does best and has been entirely focused on for longer than anyone else.
But, to further distinguish data virtualization, let’s also consider a few other concepts and how they are often misunderstood.
Database virtualization lets one database server appear as many, or many servers appear as one. This enables sharing a single server's resources for multi-tenancy, as well as pooling server resources into a single logical database or cluster (e.g. Delphix, ScaleDB, Oracle since v.12c). Actifio does something similar to decouple data from infrastructure but calls it by yet another name – “copy data virtualization.” As Forrester Research’s Noel Yuhanna puts it, “You virtualize and federate heterogeneous data into a common semantic or meta layer, which Forrester calls the information fabric. Below this data virtualization layer is the database virtualization layer.”
However, database virtualization is not the same as running a DBMS inside a virtual machine; that is simply server virtualization, as we’ll see next.
Virtualization, on its own, often refers to hardware resources, such as server, memory, storage or network virtualization. VMware and Microsoft Hyper-V are prime examples. The concept of hardware resource virtualization is closely tied to cloud computing, as it enables massive scalability and sharing while reducing costs and simplifying administration.
Finally, we come to data visualization. Many people confuse it with data virtualization because the names sound similar, yet it is completely different: it refers to the analysis and presentation layer. Microsoft Power BI, Tableau, QlikView, etc. are very useful for telling a good data story visually, but while they do some data gathering and blending, they are complemented by easy access to good, integrated data provided by data virtualization.
In summary, all of these technologies are valid and useful, but it is important to remember that while their capabilities may overlap a little, they are truly complementary and one does not substitute for another. From an ROI and business benefit perspective, it could be argued that data virtualization delivers more than the others. While database and server virtualization primarily deliver IT cost savings and flexibility, data virtualization also impacts business agility, time to market, customer satisfaction, etc., increasing potential value. Additionally, while data visualization primarily helps the users of such tools, data virtualization provides agile access to more meaningful information that can be leveraged across the enterprise by both end users and applications. Thus, data virtualization has the potential to transform the business from top (revenue) to bottom (cost and scale).
The Real Mr. Data Virtualization is ready to help you tackle many data challenges. Download Denodo Express and you’ll find a willing companion ready to roll.