Some still see data virtualization as an alternative technology to ETL. For them it’s nothing more than another style of data integration. However, data virtualization is much more. Data virtualization is about abstraction. It’s about data consumers being decoupled from the data stores and data producers. It’s about being able to develop applications that are not tied directly to a specific database technology. And it’s this capability that makes it valuable for all kinds of use cases, and one of them is database migration.
In recent years, many powerful, scalable, and fast new database platforms have been released, such as the Hadoop patform; some of the NoSQL products; the analytical SQL products, such as SnowflakeDB, Google BigQuery, Amazon RedShift, and memSQL; the GPU-based databases, such as Kinetica and MapD; and the list goes on. To exploit this new technology, a migration is required. The need for migration may come from the desire to switch to a cheaper platform, or to a faster and more scalable platform, or to move data to the cloud and run a platform designed and optimized for the cloud.
But we all know that database migration can be a tedious and time-consuming process. The problem is not so much in lifting and shifting the data to the new platform, but a new platform probably speaks a different SQL dialect or maybe even a completely different language. This means that existing code for accessing data must be rewritten. This can be such a time-consuming exercise that organizations don’t even want to consider a migration. However, this does not apply when the abstraction strength of a data virtualization server is used.
To explain this in detail, the concept of query pushdown must be explained first. Every query consists of a number of operations, such as a filter, join, aggregation, calculation, or statistical function. A data virtualization server always tries to push as many of the query to the underlying database platform. In other words, it tries to let the database platform do most of the query processing. Operations that can’t be executed by the database platform are executed by the data virtualization server itself. This query pushdown mechanism guarantees that the query processing is done as close as possible to where the data is stored. To be able to do this, a data virtualization server knows for each database platform precisely which query operations can and which can’t be pushed down. For example, more query operations can be pushed to the Oracle database server than to a flat file system with no query processing capabilities.
Which operations are pushed down is determined live by the data virtualization server. So, if we move a table from one platform to another, the data virtualization server adjusts itself automatically and will push another set of query operations down to the new database platform. This is all part of the abstraction concept. Users are not aware of this or don’t have to concern themselves over this.
This query pushdown feature helps to ease a migration process. Data stored in a specific platform accessed by reports through a data virtualization server can be migrated easily to another platform. After the data has been copied to the new platform, the virtual tables in the data virtualization server must be redirected to the new platform, and the reports will continue to work. The data virtualization server determines which query operations can be executed by the new database platform. Report developers don’t have to concern themselves with that aspect.
In fact, this also allows for a low-risk form of database migration. Technically it’s possible to migrate table for table instead of the entire database in one go. Even if it’s done table by table, the data virtualization server knows where each table is. Evidently, this may have some influence on performance, because in this case some queries are (temporarily) executed as distributed queries.
This is really the abstraction capability at work. It allows organizations to effortlessly adopt and exploit all those new database technologies that have been introduced lately. Data virtualization can really make database migration easy.
- Metadata, the Neglected Stepchild of IT - December 8, 2022
- Zero-Copy Architectures Are the Future - October 27, 2022
- The Data Lakehouse: Blending Data Warehouses and Data Lakes - April 21, 2022