Geospatial data and the data virtualization world
Geospatial data is information about a physical entity that can be represented by a set of values in a coordinate reference system. Today it is commonly used to enrich otherwise non-spatial data sets with representations of the locations, sizes or shapes that can be spatially or geographically associated with them. Use cases range from locating a chain’s stores on a map to the geographical identification of transport lines or energy resources.
Data virtualization, as a key technology in the data integration field, can play an important role in the management and publishing of geospatial data. Geospatial data can flow through the data virtualization system from the sources where it is stored. It can be integrated with data from other sources in order to offer a richer, geographically-enabled version of the latter. It can even be created at the data virtualization layer from its component values and metadata, potentially coming from diverse and heterogeneous sources.
From there, the data virtualization layer can provide its client systems with the sets of integrated geospatial data that can suit their needs, allowing them to use this data in a variety of geographical computations or rich visualization mechanisms.
The challenges of geospatial data
Handling geospatial data poses a series of challenges, the most common of which is the added complexity that appears when this data needs to be not only formatted or reshaped, but also used in computations. Many functions for performing computations with geospatial data are already commonly available and well documented, but there is a feature inherent to geospatial data that can make these computations considerably more complex: Coordinate Reference Systems (CRS).
Coordinate Reference Systems (CRS) or Spatial Reference Systems (SRS) define map projections, and thus give a true geographical meaning to the coordinate values in a piece of geospatial data or geometry. They are needed to make real-world sense of the values set in geometry coordinates, but they add complexity because of the disparity of possible map projections. This disparity originates from the many available options for translating sets of 2D coordinates into points on Earth’s (or another planet’s) not-flat-but-also-not-spherical surface.
Several CRSs exist, each based on different assumptions, useful for different scenarios, and with different constraints and levels of precision. Computing with geospatial data can therefore become difficult when the data is expressed in different CRSs, or when the goal is to obtain figures that depend heavily on the reference system itself, such as orthodromic (great-circle) distances. Normalizing and converting to and from well-known coordinate systems such as WGS84 or Universal Transverse Mercator (UTM) adds complexity to the overall operation, but at the same time increases the real-world usefulness of a geospatial data handling package.
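To illustrate why the reference system matters, here is a minimal sketch of an orthodromic distance computed with the haversine formula. It assumes a spherical Earth with a mean radius of 6,371,008.8 m, which is a simplification of the WGS84 ellipsoid, and the function names are hypothetical rather than part of any particular product.

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_MEAN_RADIUS_M = 6_371_008.8


def orthodromic_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 so floating-point rounding cannot break asin().
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def naive_planar_distance(lon1, lat1, lon2, lat2):
    """Euclidean distance on raw degrees: a number, but not a real-world length."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# One degree of longitude at the equator and at 60 degrees north.
at_equator = orthodromic_distance_m(0, 0, 1, 0)
at_60_north = orthodromic_distance_m(0, 60, 1, 60)
```

The naive planar function returns 1.0 for both pairs of points, while the great-circle distance at 60 degrees north is roughly half the distance at the equator. This is the kind of discrepancy that appears when coordinates are operated on without regard to their CRS.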
Common geospatial functions
A geospatial-handling package can offer a large number of functions, which can be grouped into three main categories:
- Creation of geometry data (usually in Well-Known-Text (WKT) or Well-Known-Binary (WKB) format): points, lines, polygons.
- Extraction of geometry data components: coordinate values of geometries, number of dimensions, nested geometries (in complex ones), etc.
- Operation, the largest and most complex group, whose functions either return other geometries (union, intersection, buffer, ring, centroid, etc.) or return scalar data (area, length or orthodromic distance).
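The three categories above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the helper names are hypothetical, the WKT parsing handles just single-ring polygons, and the area and centroid are planar (shoelace formula), so they are only meaningful for coordinates in a projected CRS.

```python
def polygon_wkt(ring):
    """Creation: build a WKT POLYGON from (x, y) tuples, closing the ring if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


def exterior_ring(wkt):
    """Extraction: read the coordinates back from a single-ring WKT polygon."""
    body = wkt[wkt.index("((") + 2:wkt.index("))")]
    return [tuple(float(v) for v in pair.split()) for pair in body.split(",")]


def area_and_centroid(ring):
    """Operation: planar area and centroid of a closed, non-degenerate ring."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = twice_area / 2  # signed; zero for degenerate rings (not handled here)
    return abs(area), (cx / (6 * area), cy / (6 * area))


wkt = polygon_wkt([(0, 0), (4, 0), (4, 3), (0, 3)])
area, centroid = area_and_centroid(exterior_ring(wkt))
```

A production package would rely on a full geometry library instead of hand-written parsing, but the split into creation, extraction and operation stays the same.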
In addition, in order to operate properly on data that may be expressed in diverse Coordinate Reference Systems, functions that transform geometries between CRSs are needed. These can be called explicitly or applied transparently underneath, so that geometries are normalized to the reference system required for operating on them correctly.
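As a rough sketch of such a transformation, the code below converts between WGS84 longitude/latitude (EPSG:4326) and Web Mercator (EPSG:3857) using the standard spherical Mercator formulas. Web Mercator is chosen here only because its formulas are short; it is an assumption for the example, not a CRS mentioned above, and the function and table names are hypothetical.

```python
import math

# Semi-major axis of the WGS84 ellipsoid in metres, used as the sphere radius.
WGS84_SEMI_MAJOR_AXIS_M = 6_378_137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Forward transform; valid for latitudes within about +/-85.06 degrees."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon_deg)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse transform back to lon/lat in degrees."""
    lon = math.degrees(x / WGS84_SEMI_MAJOR_AXIS_M)
    lat = math.degrees(
        2 * math.atan(math.exp(y / WGS84_SEMI_MAJOR_AXIS_M)) - math.pi / 2)
    return lon, lat


# Converters from each supported CRS to lon/lat; lon/lat axis order is assumed.
TO_WGS84 = {
    "EPSG:4326": lambda x, y: (x, y),
    "EPSG:3857": web_mercator_to_wgs84,
}


def normalize_to_wgs84(points, crs):
    """Normalize a list of (x, y) points from the given CRS to WGS84 lon/lat."""
    convert = TO_WGS84[crs]
    return [convert(x, y) for x, y in points]
```

A function that operates on two geometries could call `normalize_to_wgs84` on each input first, which is one way to apply the conversion transparently before the actual computation.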
Interfacing with client applications: GeoJSON
Finally, geospatial data handled and/or computed in a data integration system such as a data virtualization platform can be offered to its client applications not only through the usual formats and protocols, such as JDBC, ODBC, REST or OData, but also in the standard GeoJSON format (RFC 7946), which enables greater compatibility with existing map visualization systems.
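As an illustration of what such output might look like, the sketch below turns tabular rows into a GeoJSON FeatureCollection using only the standard `json` module. The row layout (`lon`, `lat` and the other columns) and the store data are hypothetical. RFC 7946 specifies WGS84 coordinates in longitude, latitude order.

```python
import json


def to_feature(row):
    """Build a GeoJSON Point Feature; non-coordinate columns become properties."""
    properties = {k: v for k, v in row.items() if k not in ("lon", "lat")}
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # RFC 7946 order: [longitude, latitude], WGS84.
            "coordinates": [row["lon"], row["lat"]],
        },
        "properties": properties,
    }


def to_feature_collection(rows):
    """Wrap all rows in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": [to_feature(r) for r in rows]}


# Hypothetical rows, as they might come out of an integrated view of stores.
stores = [
    {"store_id": 1, "name": "Store A", "lon": -3.7038, "lat": 40.4168},
    {"store_id": 2, "name": "Store B", "lon": 2.1734, "lat": 41.3851},
]
geojson_text = json.dumps(to_feature_collection(stores), indent=2)
```

The resulting text can be handed to map visualization tools that accept GeoJSON.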
- Supporting Geospatial Data in Data Virtualization Environments - September 27, 2017