While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). It was written by L. Delport. Nowadays, we no longer use the term DD/DS; we say “data catalog” or simply “metadata system” instead. But at that time, DD/DS was the fashionable term for systems that store, describe, and manage metadata.
Has Metadata Management Improved over the Years?
I decided to read the article because I wondered how the state of metadata had improved in all these years. Bottom line, the article was a long lament about how poorly organizations managed their metadata. What struck me was that almost all the author’s criticisms still apply to many organizations today. So, my conclusion after reading the article was that metadata management hasn’t changed much in 46 years. (By the way, I still have the wicked plan to take that article, replace some terms with contemporary ones, and republish it. I bet if I do, most readers won’t notice that the text was originally written 46 years ago.)
How is it possible that metadata management hasn’t changed? Few will say that metadata is unimportant. In fact, most will argue the opposite. So why don’t we manage our metadata better and make it available to our organizations more wisely? In other words, why is metadata still the stepchild of IT? That is still a mystery to me.
My general recommendation is to design a metadata architecture in the same way as we would design a data architecture. This involves studying the requirements of the users and applications that need access to metadata. Let me describe a few examples.
Analyze Requirements
In the old days, metadata was mainly kept for IT specialists. They needed detailed descriptions of what all the files, tables, and columns meant. That need still exists, but there are new users interested in metadata. Today, business users, especially those who develop their own dashboards and create data science models, need access to metadata. They need to be able to see descriptions, categorizations, lineage, and so on, so that they know exactly what they are looking at. They must be able to search metadata to find the right data for a report or data science model, and they need metadata to clarify how to interpret that data. So, we need to analyze in detail what types of metadata business users need.
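To make the search requirement concrete, here is a minimal Python sketch of a searchable catalog. Everything in it is an assumption made for illustration: the CatalogEntry fields, the search_catalog function, and the two sample entries are hypothetical and do not come from any particular metadata product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One described data object. All field names are illustrative."""
    name: str
    description: str
    category: str
    lineage: list = field(default_factory=list)  # names of upstream sources


def search_catalog(entries, term):
    """Return entries whose name, description, or category mentions the term."""
    needle = term.lower()
    return [
        entry for entry in entries
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or needle in entry.category.lower()
    ]


# Hypothetical sample entries.
catalog = [
    CatalogEntry(
        name="net_revenue",
        description="Revenue after discounts and returns, in EUR.",
        category="finance",
        lineage=["orders.amount", "returns.amount"],
    ),
    CatalogEntry(
        name="customer_segment",
        description="Marketing segment assigned to each customer.",
        category="marketing",
        lineage=["crm.customers"],
    ),
]

for entry in search_catalog(catalog, "revenue"):
    print(entry.name, "-", entry.description, "| lineage:", ", ".join(entry.lineage))
```

A real catalog would likely add ranking, synonyms, and access control, but even this simple shape shows the point: the description, category, and lineage have to be captured as data before a business user can search them.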
It is also important to study how they want to access the metadata. For example, for a business user working with a dashboard full of financial data, descriptions of what all those values mean should be readily available. They may want to see that metadata automatically when they hover their mouse over a specific value for two seconds. They may not want to start a separate system to get those descriptions, because that would be inconvenient and take too much time. It is important to know how users want to use metadata.
Metadata can also become a part of the operational systems. For example, if different security levels are assigned to different data objects, the relevant data privacy and security rules can refer to those levels. User groups without the proper credentials will not see columns with a high security level, for example.
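As a rough sketch of how such a rule could work, the Python below filters a row based on security levels stored as metadata. The level numbers, column names, group names, and both functions are hypothetical; a real system would normally enforce this inside the database or access layer rather than in application code.

```python
# Hypothetical metadata: a security level per column (higher = more sensitive).
COLUMN_SECURITY_LEVEL = {
    "customer_id": 1,
    "city": 1,
    "salary": 3,
    "national_id": 3,
}

# Hypothetical clearance per user group.
GROUP_CLEARANCE = {"analysts": 1, "hr_managers": 3}


def visible_columns(group, columns):
    """Keep only the columns whose security level is within the group's clearance."""
    clearance = GROUP_CLEARANCE.get(group, 0)  # unknown groups see nothing
    # Columns missing from the metadata are treated as most sensitive.
    return [
        column for column in columns
        if COLUMN_SECURITY_LEVEL.get(column, float("inf")) <= clearance
    ]


def mask_row(group, row):
    """Drop the columns of a row that the group is not allowed to see."""
    allowed = set(visible_columns(group, row.keys()))
    return {key: value for key, value in row.items() if key in allowed}


row = {"customer_id": 42, "city": "Utrecht", "salary": 50000, "national_id": "X123"}
print(mask_row("analysts", row))
print(mask_row("hr_managers", row))
```

The design choice worth noting is that the rule never names a column. It refers only to levels, so reclassifying a column in the metadata changes what users see without touching the rule itself.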
Today, metadata must be managed properly for data privacy, security, auditability, and governability reasons, which may require robust forms of version control.
Business users may want to add metadata descriptions themselves, in the form of annotations. Again, to design the right metadata architecture, this would be an important requirement.
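One way to combine user annotations with the version control mentioned above is an append-only log in which every edit becomes a new version. The sketch below assumes this design; the Annotation and AnnotationLog names are hypothetical and stand in for whatever a real metadata tool would provide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    """One immutable version of a user-supplied description."""
    author: str
    text: str
    version: int
    created_at: datetime


class AnnotationLog:
    """Append-only annotation history per data object (illustrative only)."""

    def __init__(self):
        self._history = {}

    def annotate(self, object_name, author, text):
        """Store a new version; earlier versions are never overwritten."""
        versions = self._history.setdefault(object_name, [])
        note = Annotation(author, text, len(versions) + 1, datetime.now(timezone.utc))
        versions.append(note)
        return note

    def current(self, object_name):
        """Return the latest annotation, or None if there is none."""
        versions = self._history.get(object_name, [])
        return versions[-1] if versions else None

    def history(self, object_name):
        """Return a copy of all versions, oldest first, for auditing."""
        return list(self._history.get(object_name, []))


log = AnnotationLog()
log.annotate("sales.net_revenue", "alice", "Excludes VAT.")
log.annotate("sales.net_revenue", "bob", "Excludes VAT and shipping costs.")
print(log.current("sales.net_revenue").text)
print([note.version for note in log.history("sales.net_revenue")])
```

Because nothing is overwritten, an auditor can see who described a data object, when, and how that description changed over time.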
Also, metadata must be entered into and extracted from other systems. With respect to extraction, an ETL-like solution may be required to extract metadata from source systems and to schedule these extraction processes periodically, as metadata can change. Note that ETL for metadata does not always mean extracting metadata from a structured data source. It can also mean extracting metadata descriptions from ETL programs, database stored procedures, and the semantic layers of reporting tools. The source for metadata ETL is not always a database; sometimes it is code and specifications.
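The simplest case, extracting metadata from a structured source, can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it needs no setup. The extract_table_metadata function and the invoices table are hypothetical, and scheduling the extraction or parsing code-based sources is outside the scope of this sketch.

```python
import sqlite3


def extract_table_metadata(conn):
    """Return {table: [{'column', 'type', 'nullable'}, ...]} from the SQLite catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [
            {"column": name, "type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in columns
        ]
    return metadata


# Hypothetical source system: an in-memory database with one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL NOT NULL, note TEXT)"
)

snapshot = extract_table_metadata(conn)
print(snapshot)
```

Running the extraction periodically and comparing each snapshot with the previous one is a straightforward way to detect that metadata has changed in the source system.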
In Future Architectures, Metadata is Indispensable
All these requirements do not magically appear; they must be analyzed, designed, and developed. Architects should therefore be aware of them. The right tools are needed to make this possible, and metadata must be managed in such a way that it can be accessed properly.
In future data architectures, be it a data mesh, data fabric, data warehouse, or data lakehouse architecture, metadata is indispensable, especially for modern business users who develop their own dashboards and reports. They need frictionless access to metadata.
I know all this may sound like knocking down an open door, but then why is metadata still the neglected stepchild of IT, after more than 40 years? It shouldn’t be.
- Metadata, the Neglected Stepchild of IT - December 8, 2022