Complex Questions that Need Simple … but not Simplistic … Answers
Update: make sure you also read the first part of this content. Data Virtualization Performance and Source System Impact (Part 1)
In part 1 of this article, we looked at the question that is most often asked about data virtualization: “What about DV Performance?” This is an important question that differentiates a true data virtualization platform from the rest.
Our focus in this article is Source System Impact and Resource Management. We will now explore in more detail the advanced features found in Denodo 6.0 in this respect:
Data View Parameters – Limit the type of Queries Sent to the Data Source
Denodo views can be configured (through what we call ‘view parameters’) to allow only queries following a certain template, which you know will perform well in the source. For instance, suppose the data source exposes a view with customer information. You can decide to allow only the queries that specify a customer_id. This means that all the queries on the view will retrieve data of a single customer, so the data source will never be forced to handle large result sets.
This strategy can be very effective because even slow data sources are typically designed and optimized for certain types of operations and queries, for example by adding indexes on the relevant fields. As long as you do not deviate from those query types, the source will be able to support a fair level of concurrency. The real problem is unexpected ad-hoc queries that the system is not optimized to deal with, and view parameters avoid that risk.
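The idea of a query template can be illustrated with a minimal Python sketch. This is not Denodo's API or syntax: `check_view_parameters`, `query_customer_view` and `MissingParameterError` are hypothetical names, and the "data source" is just a list of dictionaries. It only shows the principle that a query without a `customer_id` is rejected before it ever reaches the source.

```python
class MissingParameterError(ValueError):
    """Raised when a query omits a parameter the view requires."""


def check_view_parameters(filters, required=("customer_id",)):
    """Accept a query only if every required parameter has a value."""
    missing = [name for name in required if filters.get(name) is None]
    if missing:
        raise MissingParameterError("query must specify: " + ", ".join(missing))
    return filters


def query_customer_view(source_rows, filters):
    """Apply the filters only after the template check has passed."""
    check_view_parameters(filters)
    return [
        row for row in source_rows
        if all(row.get(key) == value for key, value in filters.items())
    ]
```

A call such as `query_customer_view(rows, {"customer_id": 1})` goes through, while a filter on `name` alone raises `MissingParameterError`, so the source never sees an unbounded scan.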
Source-Aware Query Optimizer – Design Queries to Leverage Source Capabilities Best
One of the key underlying capabilities of Denodo connectors is a detailed understanding of the query capabilities of each source. When this is combined with statistics on query execution, the optimizer is able to determine which types of queries the source can and cannot handle well, and to dynamically push down some queries but not others.
A few DBMSs now support both row-store (transactional) and column-store (analytical) options. Others use in-memory architectures to minimize disk access, or parallel or distributed processing. Denodo can exploit these capabilities, on top of whatever routing the underlying system already does, to send each query to the appropriate structure. This is the case with data platforms such as Oracle Exadata or SAP HANA and with data warehouse appliances, which can handle operational and analytical queries simultaneously with acceptable performance. On the other hand, Denodo can be configured to avoid pushing down costly operations (e.g. group by operations) to less capable data sources.
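As a rough illustration of the push-down decision, the Python sketch below splits the operations of a query into those delegated to the source and those kept in the virtualization layer. `SourceProfile` and `plan_pushdown` are hypothetical names, not part of Denodo. The sketch treats operations as independent and ignores execution statistics, both of which a real optimizer would take into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical description of what a data source can do."""
    name: str
    supported_ops: frozenset
    # Operations the source supports but an administrator chose not to delegate.
    avoid_pushdown: frozenset = frozenset()


def plan_pushdown(query_ops, source):
    """Return (pushed_down, executed_locally) for the given operations."""
    pushed, local = [], []
    for op in query_ops:
        if op in source.supported_ops and op not in source.avoid_pushdown:
            pushed.append(op)
        else:
            local.append(op)
    return pushed, local
```

With a profile that supports `filter` and `group_by` but lists `group_by` in `avoid_pushdown`, the filter is delegated and the aggregation stays in the virtual layer.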
Caching Strategies – Cache Frequent and/or Costly Views and Queries
Denodo offers many options and granular strategies for triggering and storing cached views of data. The cache itself can be on disk, in memory, or on elastic storage networks. For example, you can start with a batch cache load during off-peak hours for source systems and incrementally cache during the day. You can use partial caching to reuse the results of recent queries and/or to cache frequently used data. You can also cache the results of queries that you know are costly. Finally, you can enforce full caching of certain views so that queries on those views never reach the data source.
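Reusing the results of recent queries can be sketched as a small time-limited cache. This is an assumption-laden illustration in Python, not how the Denodo cache is implemented: `QueryCache`, its `fetch` callback and the `ttl_seconds` setting are all hypothetical, and the cache lives in process memory only.

```python
import time


class QueryCache:
    """Keep each query result for ttl_seconds before going back to the source."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that actually hits the source
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry can be tested
        self._entries = {}           # query_key -> (stored_at, result)

    def get(self, query_key):
        now = self._clock()
        entry = self._entries.get(query_key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh enough: the source is not touched
        result = self._fetch(query_key)
        self._entries[query_key] = (now, result)
        return result
```

Repeated calls with the same key inside the time window cost the source a single query; once the window has passed, the next call refreshes the entry.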
Denodo Scheduler – Selective ETL / Batch Operations
Rather than forcing an “ETL-first” mindset, Denodo lets customers adopt data virtualization knowing that any view can be scheduled to run in batch mode at any time. The Denodo platform includes a built-in ETL / batch tool called Scheduler that allows hybrid execution combining real-time, cache and batch execution strategies.
Resource Throttling – Managing Resources and Service Levels
Denodo can establish resource throttling at every layer in the architecture:
- Data source throttling: limiting the number of concurrent requests sent to a data source, avoiding data source performance problems at peak times. This aspect of resource throttling directly manages source impact, while the other two below manage the overall DV environment service levels.
- Server throttling: limiting the number of sessions each Denodo server should accept concurrently. When that limit is reached, new requests are queued and executed in order of arrival. Server throttling is useful in high-load environments, since it avoids performance degradation at peak load times. Memory can also be managed and allocated so that no single query can disrupt the system.
- Client throttling: the client application can maintain a pool of connections to the Denodo Platform, limiting the number of connections to the Denodo server that can be used simultaneously.
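Data source throttling, the first item in the list above, boils down to capping the number of requests in flight. The following Python sketch shows the principle with a semaphore. `ThrottledSource` and `run_query` are hypothetical names, not Denodo components, and a plain semaphore does not guarantee that waiting requests are served in arrival order.

```python
import threading


class ThrottledSource:
    """Allow at most max_concurrent queries to reach the source at once."""

    def __init__(self, run_query, max_concurrent):
        self._run_query = run_query
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def execute(self, query):
        # Blocks while all slots are taken; the slot is released on exit,
        # even if run_query raises.
        with self._slots:
            return self._run_query(query)
```

Callers beyond the limit simply wait for a free slot, so the source sees a bounded load even when many clients arrive at the same moment.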
Resource Manager in Denodo 6.0 – Dynamic and Custom Policy Decisions
This new capability further extends and simplifies resource management capabilities by introducing dynamic workload management decisions as a function of various parameters such as the views involved, type of query, user or role executing the query, time of day, monitoring status, and many more. Fully customizable policies then take action to ensure resource and service level management in a variety of data virtualization scenarios. For instance, you could define a policy that sets a maximum number of concurrent queries for a certain application and the rest would be queued. You could also specify that the queries from users with the ‘executive’ role should have higher priority than queries from other users. Another example is prioritizing some applications over others (see client throttling above).
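The two example policies, a cap on concurrent queries and higher priority for the ‘executive’ role, can be sketched together as a small priority queue. This Python code is only an illustration of the concept: `PolicyQueue`, `ROLE_PRIORITY` and the priority values are assumptions and do not reflect how the Denodo Resource Manager is configured or implemented.

```python
import heapq
import itertools

# Hypothetical priorities: lower number means served first.
ROLE_PRIORITY = {"executive": 0}
DEFAULT_PRIORITY = 10


class PolicyQueue:
    """Queue queries by role priority and cap how many run at once."""

    def __init__(self, max_running):
        self._max_running = max_running
        self._running = 0
        self._waiting = []                  # heap of (priority, arrival, query_id)
        self._arrival = itertools.count()   # tie-breaker keeps arrival order

    def submit(self, query_id, role):
        priority = ROLE_PRIORITY.get(role, DEFAULT_PRIORITY)
        heapq.heappush(self._waiting, (priority, next(self._arrival), query_id))

    def start_next(self):
        """Return the next query to run, or None if the cap is reached."""
        if self._running >= self._max_running or not self._waiting:
            return None
        _, _, query_id = heapq.heappop(self._waiting)
        self._running += 1
        return query_id

    def finish(self):
        """Signal that one running query has completed."""
        self._running = max(0, self._running - 1)
```

With a cap of one, an executive query submitted after an analyst query still starts first, and queries of equal priority run in the order they arrived.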
Monitoring – Managing Through Real-Time and Historical Reports
Denodo offers a wide range of real-time monitoring solutions with which users can measure the response times delivered to clients, validate the effect of any resource throttling mechanism or caching strategy they have applied, and tune the parameters above accordingly. A historical view of usage, performance, and service levels also helps plan the future growth of data virtualization and enterprise data hub strategies.
Combining Denodo Capabilities with Other Tools for Adaptive Management
All of the Denodo capabilities mentioned above can be adopted as the need arises without affecting data consumers' view of information. In addition, these capabilities can be exposed via APIs to various systems – security and policy management, metadata and API management, operations monitoring, etc. – so that clients can adopt a unified and adaptive approach to scaling data virtualization in their enterprise.
Denodo’s approach to data virtualization does not lock you into a specific data management architecture through an all-or-nothing design. Unlike the integrated data warehousing or SOA projects of the past, with their massive scope, complexity and high risk of failure, DV projects start with minimal overhead and evolve from tactical to strategic solutions with much lower resource footprints.
In summary, look for data virtualization performance and source system impact, along with ease of implementation, as critical differentiators for the best data virtualization platform. Because data virtualization involves new paradigms and agile data integration strategies, customers should not assume that a vendor with high performance or a long list of features in one realm (such as databases, network switches, or ETL) will automatically offer the same in data virtualization.
Hopefully the contents of this article will encourage you to ask the complex performance questions and demand complete answers.