Many organizations are moving their IT systems to the cloud. In most cases, they perform these migrations to increase the scalability of both processing and storage, and generally to free themselves from the limitations of on-premises systems. However, most organizations have not yet leveraged the full capabilities of cloud platforms.
Even if we know that our systems will run on a cloud platform, we still design our data architectures as if these systems will run on an on-premises infrastructure. Yet cloud platforms enable us to simplify our data architectures in many ways. In this post, I’ll describe how file transfer can be simplified.
File Transfer in the On-Premises World
Consider all the file transfer solutions currently in operation. The number of files transmitted this way between organizations is staggering. Files are transmitted in batches to suppliers, regulatory bodies, agents, and so on, and this requires work from both the sending and receiving organizations. The sender must develop, maintain, manage, and schedule the programs that create and send the files. The receiver, in turn, is responsible for developing, maintaining, managing, and scheduling the programs that process and store the data. On top of these drawbacks, the data that receiving organizations end up with has high latency, and the copied files must also be protected.
From File Transfer to Data-On-Demand
When cloud platforms are used, file transfer solutions can be replaced by much simpler data-on-demand solutions. The data to be transmitted is stored in, for example, a SQL database on a cloud platform, and organizations are granted the proper rights to access that data live. This data-on-demand approach means that the data is stored once but used many times. It makes the data easier to use, because receivers no longer have to copy it to their own systems. Data latency drops dramatically, and there is no need to develop programs to create and transmit files.
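To make the contrast concrete, here is a minimal sketch of the data-on-demand idea. It uses Python's built-in sqlite3 as a stand-in for a shared cloud SQL database; the table and column names are invented for illustration. The point is that the provider stores the data once, and an authorized consumer queries it live instead of receiving, parsing, and loading a batch file.

```python
import sqlite3

# Stand-in for a shared SQL database hosted on a cloud platform.
# In practice this would be a connection to the provider's cloud database,
# not a local in-memory database.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE shipments (id INTEGER, supplier TEXT, quantity INTEGER)"
)
shared.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [(1, "Acme", 100), (2, "Globex", 250)],
)

# The consumer queries the provider's data live: no file is created,
# transmitted, transformed, or loaded on the receiving side.
rows = shared.execute(
    "SELECT supplier, quantity FROM shipments WHERE quantity > 150"
).fetchall()
print(rows)  # [('Globex', 250)]
```

With a real cloud database, only the connection details would change; the consumer's query stays an ordinary SQL statement.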
Data Marketplaces Using Data-On-Demand
An interesting example of this data-on-demand approach is Snowflake’s data marketplace offering. Commercial data providers, such as Bloomberg and FactSet, have made data available to customers by storing it in the Snowflake data marketplace and enabling their customers to access that data on demand. Technically, this means that their data is stored in tables of a Snowflake database running in the cloud. Normally, such providers would use file transfer to get their data to their customers, who would then need to transform that data and load it into their own systems before it could be integrated with their own data. With the Snowflake solution, all that work is no longer required: customers simply send SQL queries to the data marketplace in the same way they send queries to their own cloud-based or on-premises SQL databases. These providers have moved from file transfer, with all its drawbacks, to data-on-demand; they are truly leveraging the full capabilities of cloud platforms.
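The key benefit described above is that shared data can be integrated with a customer's own data in a single SQL query. The following hedged sketch illustrates that idea with sqlite3, whose ATTACH statement stands in for accessing a provider's shared database; the portfolio/prices tables are hypothetical and not part of any real marketplace schema.

```python
import sqlite3

# The customer's own database with its own data.
own = sqlite3.connect(":memory:")
own.execute("CREATE TABLE portfolio (ticker TEXT, shares INTEGER)")
own.executemany(
    "INSERT INTO portfolio VALUES (?, ?)", [("AAPL", 10), ("MSFT", 5)]
)

# Stand-in for the provider's shared data: stored once by the provider,
# never copied into or loaded by the customer's systems.
own.execute("ATTACH DATABASE ':memory:' AS marketplace")
own.execute("CREATE TABLE marketplace.prices (ticker TEXT, price REAL)")
own.executemany(
    "INSERT INTO marketplace.prices VALUES (?, ?)",
    [("AAPL", 190.0), ("MSFT", 410.0)],
)

# One ordinary SQL join integrates on-demand data with the customer's own.
values = own.execute(
    """
    SELECT p.ticker, p.shares * m.price AS value
    FROM portfolio p JOIN marketplace.prices m ON p.ticker = m.ticker
    ORDER BY p.ticker
    """
).fetchall()
print(values)  # [('AAPL', 1900.0), ('MSFT', 2050.0)]
```

In a real cloud setup, the shared tables would live on the provider's platform and the customer would query them with the access rights the provider has granted; the shape of the SQL stays the same.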
Dealing With Scalability
Some may be concerned that the workload generated by all these data-on-demand users can be highly unpredictable, with occasional activity peaks in the number of concurrent users, the number of concurrent queries, query complexity, and the timing of queries. If an organization wants to offer data-on-demand from on-premises databases, it must ensure there is sufficient processing capacity to handle that workload, including the unpredictable peaks. This means the organization may need to install an overcapacity of processing power. Unfortunately, such overcapacity sits idle most of the time, making it an uneconomical investment.
In contrast, by migrating data and systems to the cloud and selecting the right software, the solution can scale up and down automatically and can therefore handle unpredictable workloads rather elegantly.
Another concern some might have with this data-on-demand approach is that performance will suffer when users access data stored on another organization’s cloud platform. However, that does not have to be the case: there should be no difference between accessing internal data stored on an organization’s own cloud platform and accessing data on a cloud platform used by another organization.
Get Creative with Cloud Platforms
Many still regard cloud platforms as a replacement for their own machines, and they continue to design cloud systems as if they were still running on on-premises infrastructures. I’ve described just one example of how cloud platforms enable simpler data architectures and create new opportunities, but there are many, many more. Cloud technology offers much more than scalability and relief from operational burdens. If you are migrating to the cloud, get creative!