How does data virtualization guarantee quality of service (QoS) in your business apps? This is something I am frequently asked, so I’ve decided to provide some insights.
Data virtualization provides an abstraction layer over disparate heterogeneous data sources and offers a unified point to access, extract and combine information, serving both operational (transactional) and informational (BI reporting/analytics) applications in the enterprise.
When deployed as a common data layer at an enterprise scale, data virtualization supports multiple business applications, each of which has its own QoS requirements for data volumes, concurrency and latency.
We can distinguish between two main types of workloads with different QoS needs:
- Informational workload: common in reporting and analytical applications where data virtualization acts as a semantic layer. It is characterized by high data volumes, usually with a high level of summarization, medium to large dataset sizes, and low to medium concurrency. Low latency is normally not required, as users are not typically waiting online for results.
- Operational/Transactional workload: typical of operational applications (e.g., a single-customer-view application) and associated with high concurrency and low-latency requirements. Queries handle small to medium dataset sizes: although the underlying data sources can be large, queries are highly selective and return small result sets. Latency is critical here, as the user or operational process needs an immediate response.
In an enterprise-wide deployment of data virtualization, both workloads coexist, as the data virtualization layer is used across many different use cases, integrating multiple sources and serving multiple corporate applications.
To guarantee QoS for every application, we first have to assign sufficient computing resources, based on a proper sizing of the solution. It is also important to ensure that no application exceeds its assigned resources, as this could hinder the QoS guarantees of other applications. The approach is similar to that of operating systems, which limit the resources each process can consume to guarantee isolation and prevent processes from interfering with one another.
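To make the idea concrete, here is a minimal Python sketch of per-application resource quotas, analogous to the per-process limits an operating system enforces. The quota fields, application names, and values are illustrative assumptions, not Denodo configuration.

```python
# Illustrative per-application quotas; names and values are assumptions,
# not Denodo configuration. Sizing would come from the capacity plan.
from dataclasses import dataclass

@dataclass
class ResourceQuota:
    max_concurrent_queries: int  # concurrency ceiling for the application
    max_memory_mb: int           # memory budget per running query
    max_rows: int                # cap on rows a single query may return

QUOTAS = {
    "bi_reporting": ResourceQuota(max_concurrent_queries=20,
                                  max_memory_mb=4096,
                                  max_rows=10_000_000),
    "customer_360": ResourceQuota(max_concurrent_queries=200,
                                  max_memory_mb=256,
                                  max_rows=10_000),
}
```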
Advanced data virtualization tools, such as the Denodo Platform, provide mechanisms to achieve this isolation. In the Denodo Platform, workload management policies can be specified:
- Globally, for all the applications running on the server,
- On a per-view basis, or
- On a per-session basis, classifying server sessions into groups depending on the user role or the application where the query originated. A minimal sketch of such a classification follows this list.
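Here is that sketch in Python. The session attributes, role names, and application names are hypothetical; in practice this classification would be configured in the platform rather than hand-coded.

```python
# Hypothetical session-to-group classification; role and application
# names are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Session:
    user_role: str    # e.g. "analyst" or "operations"
    application: str  # e.g. "bi_reporting" or "customer_360"

def workload_group(session: Session) -> str:
    """Map a session to the group whose policies will govern its queries."""
    if session.application == "customer_360" or session.user_role == "operations":
        return "transactional"   # low latency, high concurrency
    return "informational"       # large scans, latency-tolerant

print(workload_group(Session("operations", "customer_360")))  # transactional
```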
Workload management policies can be defined at design time and adjusted dynamically based on server or data source workloads. Some of the policies that can be defined by the user include (a simplified enforcement sketch follows the list):
- Maximum memory consumption for a given query. Data retrieval from the source temporarily pauses if memory consumption exceeds the established limit, and resumes once memory has been freed. Alternatively, disk swapping can be enabled so that intermediate data is moved to secondary storage.
- Maximum number of concurrent requests from a session group (i.e., users with a given role or queries from certain business applications); requests exceeding the limit are queued for later processing.
- Priority of one session group’s threads over others. For instance, to execute a transactional and an analytical workload simultaneously, you could assign a higher priority to the transactional sessions that require lower latency, shielding them from interference by the informational workload.
- Maximum execution time for a query, or maximum number of rows it can return, stopping the query if it exceeds either limit.
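The sketch below illustrates, in plain Python rather than actual Denodo mechanisms, how three of these policies could be enforced for a session group: a concurrency cap that queues excess queries, a maximum execution time, and a maximum row count. Group names and limit values are assumptions.

```python
# Simplified enforcement of a concurrency cap (with queueing), an
# execution-time limit, and a row limit. This is an illustrative model,
# not how the Denodo Platform implements these policies internally.
import threading
import time

class GroupPolicy:
    def __init__(self, max_concurrent: int, max_seconds: float, max_rows: int):
        self._slots = threading.Semaphore(max_concurrent)  # excess queries wait here
        self.max_seconds = max_seconds
        self.max_rows = max_rows

    def run(self, query):
        # Acquiring a slot blocks when the group is at its concurrency
        # limit, which is exactly the queueing behavior described above.
        with self._slots:
            deadline = time.monotonic() + self.max_seconds
            rows = []
            for row in query():  # `query` is a callable yielding rows
                if time.monotonic() > deadline:
                    raise TimeoutError("query exceeded its execution-time limit")
                rows.append(row)
                if len(rows) > self.max_rows:
                    raise RuntimeError("query exceeded its row limit")
            return rows

POLICIES = {
    "transactional": GroupPolicy(max_concurrent=200, max_seconds=2.0, max_rows=10_000),
    "informational": GroupPolicy(max_concurrent=20, max_seconds=600.0, max_rows=10_000_000),
}
```

A higher-priority group would simply get more slots and tighter latency limits; thread-priority adjustment and the memory pause/swap policy are omitted from this sketch.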
At runtime, the data virtualization server can also apply intelligent workload management techniques, monitoring server performance in real time and allowing or denying queries according to the current server or data source load.
Denodo’s policy-based workload management allows the enforcement of custom policies, such as rejecting queries from certain users at certain times of day and allowing them at others.
If desired, the statically defined policies described above can also be activated only when the server is under heavy load.
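As a concrete illustration, a custom admission policy of this kind could look like the following Python sketch. The load threshold, business-hours window, and group names are assumptions; in the Denodo Platform such logic would be expressed through its policy mechanisms rather than hand-written code.

```python
# Hypothetical custom admission policy: reject informational queries during
# business hours when the server is already heavily loaded. Thresholds and
# group names are illustrative assumptions.
from datetime import datetime

def admit(group: str, current_load: float, now: datetime | None = None) -> bool:
    """Return True to run the query now; False to reject (or queue) it."""
    now = now or datetime.now()
    business_hours = 8 <= now.hour < 18
    if group == "informational" and business_hours and current_load > 0.8:
        return False  # protect latency-sensitive transactional traffic
    return True
```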
Used in combination, these capabilities guarantee that the QoS requirements of critical business applications are met.