Is Data Virtualization the Rosetta Stone for Data Representation?
Reading Time: 4 minutes

A deliberately rhetorical question, given that my position in this regard is a firm and convinced yes: I see in data virtualization the ideal place to realize that syntactic, semantically invariant transformation which allows a company's information assets to be shared and disseminated in the spirit of data democratization, so that anyone who needs the data can use it simply and quickly, with little or none of the interpretative effort otherwise needed to understand what data is available, what it represents, and how it can be accessed.

A good data virtualization solution is, in fact, a place of translation, where the plethora of available data, in both its meaning and its occurrences, passes through the mesh of a syntactic normalizer, which preserves the meaning (it could not be otherwise) and presents the data to potential consumers, be they people or applications, so that they can make the best use of it.

The original syntax, therefore, changes and bends to the undeniable need for maximum sharing and understanding, becoming a reasonably widespread format (often the relational, or tabular, one), given that the more interpretively accessible the data is, the more of its intrinsic value can be expressed.
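
To make this idea of a syntactic normalizer slightly more concrete, here is a minimal sketch in Python (using pandas; the source names and fields are invented for illustration, not taken from any particular product): two sources describe the same customers with different syntaxes, and a thin layer presents both in a single tabular form, leaving the meaning untouched.

```python
# A minimal sketch of a "syntactic normalizer": two hypothetical sources hold
# the same kind of information in different syntaxes (nested JSON vs. CSV),
# and one view exposes both in a common relational shape.
import io
import json

import pandas as pd

# Source 1: a document store returning nested JSON.
json_payload = json.dumps([
    {"id": 1, "name": {"first": "Ada", "last": "Lovelace"}, "country": "UK"},
    {"id": 2, "name": {"first": "Alan", "last": "Turing"}, "country": "UK"},
])

# Source 2: a flat CSV extract with its own column naming.
csv_payload = "customer_id,full_name,country\n3,Grace Hopper,US\n"

def customers_view() -> pd.DataFrame:
    """Present both sources as one tabular view: same meaning, one syntax."""
    docs = pd.json_normalize(json.loads(json_payload))
    docs = pd.DataFrame({
        "customer_id": docs["id"],
        "full_name": docs["name.first"] + " " + docs["name.last"],
        "country": docs["country"],
    })
    flat = pd.read_csv(io.StringIO(csv_payload))
    return pd.concat([docs, flat], ignore_index=True)

print(customers_view())
```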

A data virtualization solution, therefore, makes it possible to represent the same meaning in different forms without any loss of information, allowing those who do not know the original forms to access the meaning they convey and, as a further and not secondary effect, to understand those original forms themselves, where desired, since the translation leaves us with a single meaning and multiple forms that represent it. With a certain narrative simplification, this brings us back to the contribution the Rosetta Stone made to the difficult task of deciphering Egyptian hieroglyphs, which finally became accessible thanks to the presence, on the stele, of the same text in Egyptian hieroglyphic, Egyptian Demotic, and Ancient Greek[1].

This characteristic, often underestimated, if not ignored outright, is in fact fundamental: the professed ability to simplify access to data should not be read merely as a technical element, but as part of an overall interpretation in which the simplification also extends to easier access to the data's meaning, so that users are not required to become familiar with formats (syntaxes) unknown to them for the sole purpose of understanding what they otherwise could not. We could also say, recalling John Searle[2], that if the deep meaning[3] of data is ontologically subjective, in the sense that it cannot exist without a perceiving subject, we should at least try to make its superficial meaning[4] epistemically objective, by giving it a prescriptive representation that captures how that data comes to life within the company and how it serves its business model. To give an example, in the hope of clarifying the point, we can create, starting from elementary data, an aggregate datum (let's call it a concept, for clarity) that combines some elementary data in a specific way, indicative of how that concept is experienced inside the company. The representation of this concept, in the adopted formalism, will then be its superficial meaning, which will be epistemically objective because it is definitional (definitional in the sense that the company imposes its adoption). The deep meaning of this concept, however, that is, the way it becomes part of everyday life, will also depend on the individual subjects, who will contextualize it in what they are asked to do and what they will do, making it ontologically subjective and realizing that intentionality, which is always individual and "here and now".
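
As a purely hypothetical illustration of such a concept (the name, the 90-day window, and the rule itself are invented, not a recommendation), the following Python sketch builds an "active customer" concept from elementary order data; the definition itself plays the role of the superficial meaning, prescriptive and shared, while how each person then puts the resulting set to work remains their own, subjective matter.

```python
# A minimal sketch of an aggregate "concept" built from elementary data.
# Hypothetical rule: the company decrees that a customer is "active" if they
# placed at least one order in the last 90 days. This definition is the
# concept's superficial meaning: readable, prescriptive, imposed company-wide.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Order:
    customer_id: int
    order_date: date

ACTIVITY_WINDOW = timedelta(days=90)  # the company-imposed parameter

def active_customers(orders: list[Order], today: date) -> set[int]:
    """Aggregate elementary order data into the 'active customer' concept."""
    cutoff = today - ACTIVITY_WINDOW
    return {o.customer_id for o in orders if o.order_date >= cutoff}

orders = [
    Order(customer_id=1, order_date=date(2024, 5, 20)),
    Order(customer_id=2, order_date=date(2023, 11, 2)),
]
print(active_customers(orders, today=date(2024, 6, 1)))  # {1}
```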

These two capabilities, the technical and the representative, are fundamental and cannot be separated without the risk of an incomplete solution: if the first makes the second possible, it is equally true that the first, without the second, leaves something unfinished, failing to release all the power inherent in the data.

Taking for granted the technical capability, which guarantees the actual reachability of the data whatever its format and wherever it is stored, what we must look for, or better, expect, in a data virtualization solution worthy of the name is its ability to translate data into a single, easily understandable format, realizing, just as the Rosetta Stone made possible, the correspondence between a single superficial meaning and its syntactically different representations. If we omitted this component, if we thought of a data virtualization solution exclusively as something that confines itself to separating the physical aspects from the logical ones (or the intensional from the extensional, if we prefer), we would lose that fundamental role, which we could call the harmonizer of signifiers, that gives simplicity to the use of the data, a use that, before anything else, passes through a clear and unambiguous understanding of its meaning, which in turn benefits from a representation in a language able to combine expressiveness and simplicity, and that is also reasonably simple to learn, as if it were an Esperanto[5] for data.
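
One way to picture this role of harmonizer of signifiers (again a sketch with invented source and field names, not a description of any particular product) is a small glossary that binds a single described meaning to the many physical signifiers that represent it, so that consumers learn the meaning once and never need to learn the source-specific syntaxes.

```python
# A minimal sketch of a "harmonizer of signifiers": one business concept, one
# shared description, and the differing physical names each hypothetical
# source uses for it.
BUSINESS_GLOSSARY = {
    "customer_lifetime_value": {
        "description": "Total net revenue attributed to a customer to date.",
        "unit": "EUR",
        "physical_signifiers": {
            "crm_db": "cust.ltv_amount",
            "billing_api": "accounts[].lifetimeValue",
            "finance_csv": "CLV",
        },
    },
}

def describe(term: str) -> str:
    """Return the single superficial meaning behind its many physical forms."""
    entry = BUSINESS_GLOSSARY[term]
    forms = ", ".join(entry["physical_signifiers"].values())
    return f"{term} ({entry['unit']}): {entry['description']} Also known as: {forms}"

print(describe("customer_lifetime_value"))
```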


[1] The simplification here is obvious, since the Rosetta Stone allowed access to a language, that of hieroglyphs, which at the time was not known to anyone, while in our scenario we speak of formats known to many (for example, those who originally defined the data) but not to all (for example, analysts who are interested in its meaning rather than its format). Here we only want to highlight how a data virtualization solution can also fulfill the task of making an unknown syntax accessible through its meaning and through the link it has with a further, and this time known, syntax.

[2] John Searle – “Philosophy in a New Century: Selected Essays” – 2008

[3] With the term "deep meaning" I do not mean the representation of the semantics of the data, whatever formalism is adopted, but the meaning that each of us attributes to it according to our intentionality, thus assuming that "meaning is use", as Wittgenstein tells us in his "Philosophical Investigations".

[4] With the term "superficial meaning", on the contrary, I mean what is normally called metadata, i.e. a representation of the data in terms of its characteristics and its relations with other data. Even the superficial meaning, however, is not necessarily unique within a company and may, for example, differ slightly depending on the business unit that has to deal with it.

[5] Esperanto is a constructed auxiliary language created by Ludovic Lazarus Zamenhof, a Polish ophthalmologist. He created the language to make international communication easier, with the goal of designing it so that people could learn it far more easily than any national language.

Andrea Zinno