Data Virtualization platforms shape entire development and operations ecosystems in the organizations where they are deployed. They are not typical pieces of enterprise software, but key points of interconnection and cooperation among multiple systems, tools and teams. Data Virtualization depends on many other parts of the organization, and many other parts of the organization depend on Data Virtualization.
This is why, in a world where most teams already understand the value that automated testing brings to their development and maintenance processes, bringing such testing to Data Virtualization has become an industry mandate. We need automated testing to give our Data Virtualization systems the same degree of robustness, quality assurance and peace of mind that testing has been providing to so many other software scenarios for years.
(Note that, although manual testing may still play a role in specific scenarios where it is the only available option, large-scale automated testing is the only way to achieve high standards of quality in the development and maintenance of any software system, which is why this article focuses on this type of testing.)
In order to examine the impact that automated testing (just “testing” from now on) can have on Data Virtualization, let’s first have a look at the “when”, and then we will examine the “how”.
When to test
There are a number of important scenarios in which testing Data Virtualization can be useful:
- During development, as a way to express the requirements of a specific part of the Data Virtualization system before it is built, so that the system is considered “according to specification” once all tests pass. We could call this “Test-Driven Data Virtualization Development” (see the sketch after this list).
- During maintenance of Data Virtualization systems, so that we can be sure that no change performed on the system breaks any important functionality. A good repository of tests covering the relevant use cases of our environment allows maintainers to quickly determine whether new changes bring any unpleasant surprises, either for the systems they affect or for any others, well before the changes are rolled into production.
- During software replacement or update processes, in order to easily check that newer versions of data sources, software dependencies or even the Data Virtualization platform itself don’t negatively affect running environments, giving the Operations team the peace of mind they need to proceed with these updates confident that nothing will break.
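As a minimal sketch of what “Test-Driven Data Virtualization Development” could look like, the test below is written before the virtual view it targets exists, and only passes once development matches the specification. All names here (the `customer_sales_summary` view, its columns, the ODBC data source) are hypothetical, and the connection mechanism will depend on the interfaces your specific platform exposes:

```python
# test_customer_sales_summary.py
# A test written *before* the virtual view exists: it fails until the
# Data Virtualization development work matches the specification.
# Connection details and all object names are hypothetical.

import pyodbc   # any DB-API driver exposed by the DV platform would work similarly
import pytest

DV_DSN = "DSN=dv_server;UID=tester;PWD=secret"   # hypothetical ODBC data source


@pytest.fixture
def dv_cursor():
    # Open a connection to the Data Virtualization server for each test.
    conn = pyodbc.connect(DV_DSN)
    try:
        yield conn.cursor()
    finally:
        conn.close()


def test_summary_view_returns_one_row_per_customer(dv_cursor):
    # Requirement: the (not yet developed) customer_sales_summary view must
    # expose exactly one row per customer, with a non-negative total_revenue.
    dv_cursor.execute(
        "SELECT customer_id, total_revenue "
        "FROM customer_sales_summary WHERE customer_id = ?",
        ("C-001",),
    )
    rows = dv_cursor.fetchall()
    assert len(rows) == 1
    assert rows[0].total_revenue >= 0
```

Such tests double as executable specifications: they state, in terms any developer can run, what the virtual schema must deliver before a single view has been created.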
How to test
Testing tools must offer a way to easily specify large numbers of tests which can be executed, verified and reported in a fully automated manner. Specifically, a good testing tool for Data Virtualization should offer:
- Automated execution of large sets of tests, and easy integration into existing continuous integration and quality assurance processes.
- Support for testing through the different interfaces and/or APIs offered by Data Virtualization systems, which means being able to extract data not only by executing SQL queries but also by calling REST web services or examining RSS feeds. The testing system should also be able to check such extracted data in a variety of ways: against pre-established result sets, against the results of directly querying the data sources, and so on.
- Advanced result reporting capabilities that allow users both to visually understand the results of their manually triggered executions and to receive notifications (e.g. by email) when a scheduled execution finds any test errors.
- A flexible and simple format for test definition, with as few dependencies on external tools as possible. Text-based formats are also advisable for tests, so that test sets can be managed with the same Version Control Systems used for virtual schema metadata and application code.
- Ideally, support for writing tests by users with little or no coding ability. End users themselves will often want to define their own tests, so removing the need for any programming language is advisable. Writing a couple of SELECT statements and a CSV file with the expected results should be all the technical ability required (the sketch after this list illustrates this).
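To illustrate how little tooling such text-based tests could require, the sketch below pairs a SQL file with a CSV of expected results and uses a short runner to compare them. The file names, directory layout, view and column names, and connection string are all assumptions for the example, not a prescribed format:

```sql
-- tests/top_customers.sql  (the test itself: a plain SELECT against a virtual view)
SELECT customer_id, total_revenue
FROM customer_sales_summary
ORDER BY total_revenue DESC
```

```
customer_id,total_revenue
C-001,120000
C-017,98500
C-042,91200
```

```python
# run_tests.py -- minimal runner (a sketch, not a full tool): executes every
# tests/*.sql file against the DV server and compares the result set with the
# expected tests/*.csv file of the same name. Connection details are hypothetical.

import csv
import pathlib

import pyodbc

DV_DSN = "DSN=dv_server;UID=tester;PWD=secret"   # hypothetical ODBC data source


def run_test(sql_path: pathlib.Path, conn) -> bool:
    # Load the expected results (everything is compared as strings for simplicity).
    with open(sql_path.with_suffix(".csv"), newline="") as f:
        reader = csv.reader(f)
        next(reader)                       # skip the header row
        expected = [tuple(row) for row in reader]

    # Execute the query and normalize the actual results to strings.
    cursor = conn.cursor()
    cursor.execute(sql_path.read_text())
    actual = [tuple(str(value) for value in row) for row in cursor.fetchall()]
    return actual == expected


if __name__ == "__main__":
    conn = pyodbc.connect(DV_DSN)
    for sql_file in sorted(pathlib.Path("tests").glob("*.sql")):
        status = "PASS" if run_test(sql_file, conn) else "FAIL"
        print(f"{status}  {sql_file.name}")
    conn.close()
```

A real testing tool would add many refinements on top of this (type-aware comparison, ordering options, comparison against live data sources, reporting, scheduling and notifications), but the key point stands: each test remains two small text files that anyone can write and that fit naturally in the same Version Control System as the rest of the metadata.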
With all this in place, Data Virtualization systems can offer a much more robust environment for both developers and operators, increasing the confidence of the whole ecosystem in its dependencies on this central part of the enterprise architecture.