Today, many applications call themselves “data catalogs.” The idea seems, on the face of it, easy to understand: a data catalog is simply a centralized inventory of the data assets within an organization. Data catalogs also seek to be the one-stop shop for discovering, contextualizing, and extracting value from enterprise data. Therefore, all data catalog products should be more or less the same. Right?
As it turns out, there are important differences between data catalogs. Different products address different challenges and can even be complementary within an organization’s data architecture. To grasp what’s different about the Denodo approach, it’s important to understand the difference between data catalogs that contextualize and document data (but don’t deliver data) and those that deliver data (but contextualize and document data only within the delivery scope).
Of Maps and Telephones
If you’re like me, you probably have several apps on your phone that display a map on the first page. If I open either Uber or Google Maps, I immediately know which street I’m on. However, beyond this superficial similarity, the two applications serve very different purposes. I use Uber to request a car to take me somewhere. The application, and the platform behind the application, uses a map to help me use that service more effectively.
Google Maps tells me both where I am and which services are nearby – which restaurants, shops, and museums are within a given radius, for example – and also can help me navigate to these services, on foot, by public transportation, or by car. However, Google does not run any of the restaurants, or operate public transportation, and can’t send a car to get me to my destination.
The difference between the Denodo data catalog and others, such as Collibra, Alation, or DataGalaxy, is similar: The Denodo Platform is an integration platform for data, and the Denodo data catalog is the “map” that helps users to understand where the data is and how the Denodo Platform can deliver it. In this sense, Denodo is like Uber. Other data catalogs are more similar to Google Maps: they show users where data is and how it can be used, but they don’t deliver the data itself; they leave that task to other tools.
Linking Data Discovery, Data Modeling, and Data Delivery
To understand the difference from a technical standpoint, it is important to understand that what you see in the Denodo data catalog – the views, data services, and data sources – are exactly those that are modeled in the Denodo Platform: no more, no less.
In some ways, the Denodo data catalog is the “human” interface of the Denodo Platform, enabling business users to browse, discover, and use Denodo-modeled data assets. Once a view or a data service has been identified, the business user not only can understand the data via its documentation and lineage, but can also access the data itself, even opening the data directly in other tools such as Tableau or Power BI.
The positive impact of the Denodo data catalog grows as more views and data services are built with the Denodo Platform: more data assets become discoverable, and useful, to an ever-wider self-service audience. Ideally, the set of views and data services will eventually constitute a universal semantic layer that delivers data from all of an organization’s data sources, ensures standardized calculations and transformations, and delivers the data in different formats and protocols.
Another advantage to this approach is that the Denodo data catalog is up-to-date by design: if the model changes – if new views are added, if transformation logic is modified, or if the underlying data sources are migrated – the representation of the model in the Denodo data catalog is updated automatically. This simplifies the tasks of data stewards and data engineers and helps them to collaborate more effectively.
Rationalizing the Data Landscape
Just as Google Maps and Uber are complementary and coexist on our telephones, the Denodo data catalog and other data catalogs are often complementary and co-deployed within an organization. Coexistence doesn’t mean that they should replicate each other’s work, however.
Take a close look at Uber and you may notice that the integrated map is actually provided by Google. Google has become a reference for geographic data, so it makes sense for an application like Uber to use its map as a trusted, up-to-date reference.
Similarly, the services that Google Maps directs us to come via external data from a variety of sources, including public transportation operators, airlines, hotels, and restaurants. Google doesn’t create this data, but it integrates the data from trusted sources.
External data catalogs and the Denodo data catalog can work the same way. Metadata about Denodo views and data services can be integrated into other data catalogs. This gives these outside tools visibility over what has been defined in the Denodo Platform, including the structure and sourcing (or lineage) of the data. If any of this metadata changes, the changes can be reflected automatically. Similarly, other metadata, such as view descriptions or tags, can be defined and managed in an outside tool and imported into Denodo. The goal is to implement metadata management once, only once, and in the right tool.
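The "once, only once, and in the right tool" principle can be sketched as a merge in which every metadata field has exactly one authoritative system. This is a minimal illustration, not the Denodo or Collibra integration itself: the field names, the `FIELD_OWNER` map, and the `merge_metadata` helper are all hypothetical.

```python
# Hypothetical ownership map: which system is authoritative for each field.
# "delivery" stands for the data delivery platform, "governance" for the
# external data catalog. These names are assumptions for illustration only.
FIELD_OWNER = {
    "columns": "delivery",        # structure comes from the modeled view
    "lineage": "delivery",        # sourcing comes from the modeled view
    "description": "governance",  # business documentation is managed externally
    "tags": "governance",         # classification is managed externally
}


def merge_metadata(delivery_meta, governance_meta):
    """Build one metadata record, taking each field only from its owner."""
    sources = {"delivery": delivery_meta, "governance": governance_meta}
    merged = {}
    for field_name, owner in FIELD_OWNER.items():
        if field_name in sources[owner]:
            merged[field_name] = sources[owner][field_name]
    return merged


# The delivery side carries a stale description; the governance side carries
# a stale column list. Neither stale value survives the merge.
delivery_meta = {
    "columns": ["customer_id", "email", "margin"],
    "lineage": ["crm.customers", "erp.orders"],
    "description": "old text",
}
governance_meta = {
    "columns": ["customer_id"],
    "description": "Customer profitability by account",
    "tags": ["pii"],
}

merged = merge_metadata(delivery_meta, governance_meta)
```

Because ownership is declared per field, a conflicting value in the non-owning system is simply ignored rather than reconciled by hand.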
Defining Governance and Implementing Security
It is critical not just to deliver data but to deliver data securely. The best data catalogs on the market help define a security model, first by giving governance visibility to all of an organization’s metadata and data processes, and then by defining business rules to control access to the data. However, these tools do not implement the security rules themselves: They can’t control who accesses the data because they don’t actually provide access to the data in the first place.
This is where tight integration with a tool like the Denodo Platform can be critical. A good example of this is tag integration with Collibra. Tags can be defined in Collibra to describe, for example, the sensitive nature of certain columns in Denodo Platform views, indicating the columns that contain corporate confidential information, for example, or personal identifying information in a GDPR-regulated context. The Denodo Platform can read these tags from Collibra, and then associate them with appropriate masking and access restrictions that are applied to all data access requests from any tool, via any access protocol.
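To make the idea concrete, here is a minimal sketch of tag-driven masking applied at the point of delivery. It does not use the actual Collibra or Denodo APIs: the tag names, the `MASKING_RULES` table, the `Column` class, and `apply_policies` are hypothetical stand-ins for tags defined in a governance tool and enforced by the delivery layer.

```python
from dataclasses import dataclass, field

# Hypothetical tag -> masking rule table. In practice the tags would be
# defined in the governance catalog and read by the delivery platform.
MASKING_RULES = {
    "pii": lambda value: "***",          # redact personal data
    "confidential": lambda value: None,  # suppress confidential values
}


@dataclass
class Column:
    name: str
    tags: set = field(default_factory=set)


def apply_policies(row, columns, user_clearances):
    """Return a copy of row, masking each tagged column the user is not cleared for."""
    masked = {}
    for col in columns:
        value = row[col.name]
        for tag in sorted(col.tags):
            if tag in MASKING_RULES and tag not in user_clearances:
                value = MASKING_RULES[tag](value)
                break  # one applicable rule is enough to mask the column
        masked[col.name] = value
    return masked


columns = [
    Column("customer_id"),
    Column("email", {"pii"}),
    Column("margin", {"confidential"}),
]
row = {"customer_id": 42, "email": "a@example.com", "margin": 0.31}

analyst_view = apply_policies(row, columns, user_clearances=set())
steward_view = apply_policies(row, columns, user_clearances={"pii", "confidential"})
```

The point of the design is that the rule lives with the tag, not with the consuming tool, so every request passing through the delivery layer gets the same treatment.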
Where to Start?
Once you’ve understood the difference between the Denodo data catalog and other data catalogs, your next questions may be: Do I need both? And which one should I deploy first in my organization’s data strategy?
To answer, you should first ask what your current data challenges are. Do you need to increase data self-service? Do you need to ensure standardized, reliable, and documented implementations of KPIs? Do you need to accelerate the adoption of a new data platform, such as a data lake or a move-to-the-cloud initiative? In short, do you need to connect users with reliable, easy-to-discover data, while ensuring that data can be migrated or centralized on a new technology?
If you answered Yes to any of these questions, the Denodo data catalog is the perfect place to start. The first step is to build governed, standardized Denodo Platform views that combine data from existing data sources and implement KPIs according to validated business rules. These views are documented in the Denodo Platform as part of the modeling step, and then automatically made available to end users via the Denodo data catalog. Migrating to the new storage platform is subsequently much easier: Even when underlying data sources are changed, the views are updated without impacting end users.
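The migration claim rests on a simple decoupling: consumers depend on a view's logical schema, not on the physical source behind it. The sketch below illustrates that idea in plain Python under stated assumptions; the `View` class, the two source functions, and the column names are hypothetical and do not represent how the Denodo Platform is implemented.

```python
# Two hypothetical physical sources holding the same data under
# different column names (a legacy warehouse and a new cloud data lake).
def legacy_warehouse():
    return [{"REV": 120.0, "CST": 80.0}, {"REV": 200.0, "CST": 150.0}]


def cloud_lake():
    return [{"revenue": 120.0, "cost": 80.0}, {"revenue": 200.0, "cost": 150.0}]


class View:
    """A logical view: a fixed output schema plus a swappable source binding."""

    def __init__(self, name, source, mapping):
        self.name = name
        self.bind(source, mapping)

    def bind(self, source, mapping):
        # mapping: logical column name -> physical column name
        self._source = source
        self._mapping = mapping

    def rows(self):
        return [
            {logical: raw[physical] for logical, physical in self._mapping.items()}
            for raw in self._source()
        ]


def gross_margin(view):
    """A KPI written once against the logical schema of the view."""
    rows = view.rows()
    revenue = sum(r["revenue"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return (revenue - cost) / revenue


sales = View("sales", legacy_warehouse, {"revenue": "REV", "cost": "CST"})
before = gross_margin(sales)

# Migration: only the binding changes; the KPI code is untouched.
sales.bind(cloud_lake, {"revenue": "revenue", "cost": "cost"})
after = gross_margin(sales)
```

Here `gross_margin` never learns that the source changed, which is the property the paragraph above describes for end users.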
If, on the other hand, your principal need is to document, govern, and define data, data processes and assets, then you may look to a traditional data catalog.
Remember that both may eventually be part of your strategy. The Denodo Platform’s role is to deliver the right data, in the right format, to the right people and applications, and the Denodo data catalog’s job is to make this easier.
- Choosing a Data Catalog: Data Map or Data Delivery App? - November 17, 2022
It’s crowded in the data management space, and the same term is often used with different meanings which is why I love this blog, it really helps the reader understand the value of Denodo and the value of Collibra et al. Thanks Emily !