When creating an enterprise data layer using data virtualization, two questions usually come up: what are the options for metadata management, and how can data governance be provided now that everything is accessed from a new unified layer? First, let’s differentiate the two concepts, although they should be combined for an effective approach:
- The metadata describes and gives information about every data object.
- One of the data governance definitions is “a control that ensures that the data entry by an operations team member or by an automated process meets precise standards, such as a business rule, a data definition and data integrity constraints in the data model.”
This post describes the features that a data virtualization layer should provide to comply with the concept of data governance, without forgetting the metadata management capabilities that make it possible. We will classify these capabilities in the same areas defined by Forrester in its “Forrester Wave: Data Governance Tools”:
- Metadata Management.
- Security.
- MDM/reference.
- Data quality.
Metadata Management
A data virtualization layer should provide full metadata management capabilities from the data source import stage to the publishing stage. In the case of Denodo, those metadata management capabilities include:
- Import Metadata. A graphical process guided by different wizards depending on the type of source. Once the connectivity parameters are provided, Denodo will introspect the elements selected by the user, retrieving not only the field names and types but also other information such as the available indexes, primary/foreign keys, and certain profiling data and statistics that will help the Denodo Optimizer.
- Automatic Metadata Documentation. Every element also includes information about its owner, its last modifier, a description and the creation/modification dates. This documentation makes it possible to create a business glossary based on the descriptions, and a document containing all this metadata can be generated automatically.
- Metadata catalog and self-service Global Search. The metadata catalog can be accessed from the Denodo Platform Administration Tool but also via API. This catalog lets you search the data or metadata by any filter such as description, owner, type… The Denodo Information Self-Service Tool provides an additional functionality named Global Search that searches not only the metadata but also the previously indexed data using a keyword-type search.
Business users benefit from this feature as it also includes discovery capabilities. Once an element is found by a keyword search, the data can be drilled down using the different associations between the views.
This is a really important benefit in your enterprise data layer as it allows your business users to discover their available data from an intuitive web-based tool.
- Data Lineage. Lineage is available in both directions: from data sources to all dependent business views, and from business views back to the data sources they depend on. It can be displayed graphically in an interactive tree format in the Denodo Administration and Development Client Tool, as well as in a web browser using Denodo’s Information Self-Service Tool. The lineage includes not only the source of the data but also any transformation that was applied before the data is delivered.
- Source Refresh and Impact Analysis. Denodo keeps track of dependencies and schemas so that any change to the underlying schemas, at the base view or at any point in the tree, will invoke an impact analysis. The impact analysis will inform the user which upper-level views will be affected and how. In some instances, such as the addition of a new field to the underlying schema, impact analysis will ask the user if the field should be propagated to upper-level views. The user can choose how far up the tree to propagate. In other instances, such as when a field is removed, the impact analysis will identify which upper-level views will become corrupted. If changes are committed, the upper-level views will turn red and the GUI will guide the user through correcting the issue. The checks on source changes can be scheduled for automatic execution (e.g. daily/weekly) and will send an email alert to the owner of the solution.
- Integration with Metadata Tools – Top Down modeling. It is possible to define the models of your enterprise data layer in an external tool such as Erwin or ER/Studio and import those models and relationships as Denodo views.
- Integration with Version Control System (VCS). In order to manage all the deployment lifecycle stages (e.g. development, staging, production) and keep track of all the metadata changes, the Denodo Platform integrates with version control systems such as Subversion, MS Team Foundation Server or Git.
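The lineage and impact analysis ideas described in the list above can be illustrated with a small dependency graph. The following is only a minimal sketch in Python, not Denodo's implementation or API: the view names, the `DEPENDS_ON` map and both functions are hypothetical and exist purely to show how "which upper-level views are affected" and "where does this view's data come from" reduce to graph traversals.

```python
from collections import deque

# Hypothetical dependency map: each view lists the views or sources it reads from.
DEPENDS_ON = {
    "bv_customer": ["ds_crm"],
    "bv_orders": ["ds_erp"],
    "customer_orders": ["bv_customer", "bv_orders"],
    "sales_report": ["customer_orders"],
}


def affected_views(changed, depends_on):
    """Impact analysis: views that directly or indirectly read from `changed`."""
    # Invert the map so we can walk from a source up to its consumers.
    consumers = {}
    for view, parents in depends_on.items():
        for parent in parents:
            consumers.setdefault(parent, []).append(view)

    seen, order, queue = set(), [], deque([changed])
    while queue:
        node = queue.popleft()
        for view in consumers.get(node, []):
            if view not in seen:
                seen.add(view)
                order.append(view)  # breadth-first: nearest views first
                queue.append(view)
    return order


def upstream_lineage(view, depends_on):
    """Lineage: every view or source that `view` ultimately depends on."""
    found, stack = set(), [view]
    while stack:
        node = stack.pop()
        for parent in depends_on.get(node, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    print(affected_views("ds_crm", DEPENDS_ON))
    print(sorted(upstream_lineage("sales_report", DEPENDS_ON)))
```

In this toy model, a change in `ds_crm` reaches `bv_customer` first and then the views built on top of it, which mirrors the idea of choosing how far up the tree a change should propagate.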
Security
A data virtualization platform should include fine-grained security options for your enterprise data layer, so that access is secured and can be audited at any point. Denodo adds the following features to your enterprise data layer, complying with the data governance control:
- Unified security management offering a single point to control the access to any piece of information.
- Role-based security access with fine granularity. It is possible to assign privileges at the schema, element, column or row level, or even to add column masking to certain values based on filters.
- Authentication credentials can be stored either internally in the Denodo Platform, in a built-in repository, or externally in a corporate entitlements server such as an LDAP / AD repository. It is also possible to use ‘custom policies’ to connect to any other corporate security system.
- Standard security protocols used to connect to data sources (X509v3, SSL, WS-Security, OAuth…) or to publish data to consuming applications (SSL, HTTPS, Kerberos…).
- The Denodo Platform provides an audit trail of all the information about the queries and other actions executed on the system. Denodo will generate an event for each executed statement that causes any change in the Denodo Catalog (the store of all the server metadata). With this information it is possible to check who has accessed specific resources, what changes have been made or what queries have been executed.
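To make row-level restrictions and column masking more concrete, here is a minimal sketch in Python of how a role-based policy could be applied to a result set. It is an illustration under assumptions, not Denodo syntax: the roles, the `ROLE_POLICIES` structure, the field names and the `apply_policy` function are all hypothetical.

```python
# Hypothetical policies: each role has a row filter and a set of masked columns.
ROLE_POLICIES = {
    "analyst": {
        "row_filter": lambda row: row["region"] == "EU",  # row-level restriction
        "masked": {"ssn"},                                # column masking
    },
    "auditor": {
        "row_filter": lambda row: True,                   # sees every row
        "masked": set(),                                  # sees every value
    },
}


def apply_policy(rows, role, policies=ROLE_POLICIES):
    """Return only the rows the role may see, with masked columns hidden."""
    policy = policies[role]
    visible = []
    for row in rows:
        if not policy["row_filter"](row):
            continue  # the role is not allowed to see this row
        visible.append(
            {key: ("***" if key in policy["masked"] else value)
             for key, value in row.items()}
        )
    return visible


if __name__ == "__main__":
    data = [
        {"name": "Ana", "region": "EU", "ssn": "123"},
        {"name": "Bob", "region": "US", "ssn": "456"},
    ]
    print(apply_policy(data, "analyst"))
    print(apply_policy(data, "auditor"))
```

Because the policy is applied in one place, every consuming application gets the same restrictions, which is the point of a unified security layer.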
Data Quality and MDM/Reference
In order to ensure that the data delivered by your enterprise data layer is correct, your data virtualization platform should include data validations and on-the-fly transformations in an agile and flexible way. All your applications will benefit from this single point of access, as the validations will apply to any application connecting to your enterprise data layer.
The Denodo Platform enables a set of best practices to deal with incorrect data:
- Filtering incorrect rows/values after applying data validation logic, so the data consumer only receives rows with correct data (although not all the rows in the data source).
- Flagging incorrect values. This is done by adding extra columns indicating that particular values are wrong with a flag on those columns.
- Restoring. With this approach, the originally incorrect data is replaced by correct values through some transformation logic.
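The three approaches above (filtering, flagging and restoring) can be sketched side by side. The following Python example is only an illustration under assumptions: the validation rule (a small set of country codes), the `ALIASES` mapping and the function names are hypothetical and do not correspond to Denodo functions.

```python
# Hypothetical validation rule: a country code is correct if it is in this set.
VALID_COUNTRIES = {"ES", "GB", "US"}
# Hypothetical correction table used by the "restoring" approach.
ALIASES = {"UK": "GB", "USA": "US"}


def is_valid(row):
    return row["country"] in VALID_COUNTRIES


def filter_rows(rows):
    """Filtering: deliver only the rows that pass validation."""
    return [row for row in rows if is_valid(row)]


def flag_rows(rows):
    """Flagging: deliver every row plus an extra column marking bad values."""
    return [{**row, "country_ok": is_valid(row)} for row in rows]


def restore_rows(rows):
    """Restoring: replace incorrect values through transformation logic."""
    restored = []
    for row in rows:
        code = row["country"].strip().upper()
        code = ALIASES.get(code, code)  # unknown codes are left unchanged
        restored.append({**row, "country": code})
    return restored


if __name__ == "__main__":
    data = [
        {"id": 1, "country": "ES"},
        {"id": 2, "country": "uk"},
        {"id": 3, "country": "XX"},
    ]
    print(filter_rows(data))
    print(flag_rows(data))
    print(restore_rows(data))
```

The approaches can be combined: for example, restoring first and then flagging whatever could not be corrected, so the consumer still receives every row.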
Denodo provides a vast library of transformation, filter and matching functions and quality rules for validating, cleansing, enriching, standardizing, matching and merging data. These include conditional processing, partitioning, fuzzy match algorithms for de-duplication and cleansing, syntax-based, thesaurus-like or semantic mappings, and advanced view combinations such as MINUS and INTERSECT.
Additionally, the Denodo Platform can easily be extended to make use of external tools such as DataFlux, Trillium, Lingpipe, Schober and Google Geocoding. That means that you can reuse any tool that you already own with your enterprise data layer.
Your enterprise data layer can also benefit from the virtual MDM capabilities of the Denodo Platform, which offer the following advantages over traditional MDM solutions:
- Access to any type of source: structured, unstructured, internal, external…
- Virtual Golden Record: by applying transformations and matching rules in real time.
- Prototyping of MDM: minimizing the implementation effort, data discovery, requirement gathering…
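As a rough illustration of the virtual golden record idea, the sketch below merges records from two sources at query time using a fuzzy name match. It is a minimal Python example under assumptions, not how Denodo implements matching: it uses the standard library's `difflib.SequenceMatcher`, and the source records, the 0.6 threshold and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop punctuation so that 'Acme Corp.' compares cleanly."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()


def similar(a, b, threshold=0.6):
    """Fuzzy match on normalized names; the threshold is an arbitrary assumption."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def golden_records(primary, secondary):
    """Build one merged record per primary row, filling gaps from the first match."""
    merged = []
    for p in primary:
        record = dict(p)
        for s in secondary:
            if similar(p["name"], s["name"]):
                for field, value in s.items():
                    if not record.get(field):  # keep primary values when present
                        record[field] = value
                break  # use only the first matching secondary record
        merged.append(record)
    return merged


if __name__ == "__main__":
    crm = [{"name": "Acme Corp.", "phone": ""}]
    billing = [
        {"name": "Globex Inc", "phone": "555-0000", "vat": "G1"},
        {"name": "ACME Corporation", "phone": "555-0100", "vat": "A1"},
    ]
    print(golden_records(crm, billing))
```

Nothing is persisted here: the merged record is computed when it is requested, which is what distinguishes a virtual golden record from one stored in a traditional MDM hub.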
Conclusion
If you have already created an enterprise data layer using data virtualization, or you have an initiative to research the different options, take into account the features described above, as they are crucial for the success of your metadata management and data governance strategy.
In summary, your enterprise data layer will be able to take advantage of the following benefits:
- Flexibility and agility in the metadata management lifecycle: importing, managing, publishing.
- Definition of a unified point of access where centralized security policies can be applied and audited.
- Centralized data quality validations to ensure that the data delivered is correct for every client application.
- High level of data and metadata integration, enabling discovery of data and relationships regardless of the types of data sources involved: structured, semi-structured, unstructured.
What other features or approaches do you use for data governance in your enterprise data layer?