Some might think that data warehouses are outdated and unnecessary, because today you can store large volumes of data quickly and easily in a data lake or a data reservoir. I think this is a bit short-sighted. How can you leverage your data without a holistic and unified view of your information? How can you combine this data with data from other sources? How would you turn it into useful business analytics and insight? You don’t have to lose the resources you invested in your data warehouse. Instead, get the most out of your data by investing in flexible techniques such as data virtualization.
Long Live the Data Warehouse!
Those who shout that the data warehouse is dead are usually those who use Hadoop, NoSQL, or “NewSQL.” And I admit, Hadoop, for instance, definitely does some things better and faster than a data warehouse, such as processing large volumes of data and archiving and accessing it quickly. But this doesn’t mean that the data warehouse is irrelevant; consider such features as enrichment, archiving, and exploration. You don’t have to write off the value you have invested in your data warehouse (such as the business model, the logic, and the knowledge). Instead, complement it with data virtualization!
Hybrid = Flexible
Data warehousing is neutral in relation to the underlying sources, formats, and schedules in the data architecture. So too is data virtualization, and it offers the flexibility that the data warehouse lacks. Supplying one integrated information model is much easier with a data virtualization platform than combining or building that up in a data warehouse. With data virtualization, you increase your options and capabilities. Additionally, a hybrid data architecture allows you to combine Hadoop, NoSQL, the existing data warehouse, external files, and batch or real-time processing into a single logical data warehouse.
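To make the idea of a single logical data warehouse a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: an in-memory SQLite table stands in for the existing data warehouse, a CSV string stands in for an external file, and the names customers, ORDERS_CSV, and revenue_per_customer are hypothetical. It is not how any particular data virtualization platform works; it only shows the principle of joining sources at query time instead of copying them into one store first.

```python
import csv
import io
import sqlite3

# Assumption: an in-memory SQLite table stands in for the existing data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# Assumption: CSV text stands in for an external file or a data lake export.
ORDERS_CSV = "customer_id,amount\n1,120.50\n2,80.00\n1,30.25\n"


def read_warehouse_customers():
    """Read customer rows from the warehouse each time the view is queried."""
    query = "SELECT id, name FROM customers ORDER BY id"
    for customer_id, name in warehouse.execute(query):
        yield {"customer_id": customer_id, "name": name}


def read_external_orders():
    """Read order rows from the external source each time the view is queried."""
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        yield {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}


def revenue_per_customer():
    """Virtual view: joins both sources on demand; no data is loaded into the warehouse."""
    totals = {}
    for order in read_external_orders():
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [
        {"name": customer["name"], "revenue": totals.get(customer["customer_id"], 0.0)}
        for customer in read_warehouse_customers()
    ]


result = revenue_per_customer()
```

Adding another source in this sketch means adding another reader function and extending the view, while the warehouse itself stays untouched.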
SQL-on-Hadoop Engines
Combining the power of the data warehouse with new technology is becoming easier and easier in modern information architectures. It is also possible, and above all practical, to apply a relational model on Hadoop and NoSQL platforms. It is precisely the development of SQL-on-Hadoop engines (such as Cloudera Impala and others) that contributes to the accessibility and usability of these platforms. Read Rick van der Lans’ post about the promise of “SQL-on-Everything.”
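What "applying a relational model" to schemaless data can look like is sketched below in Python. This is an illustration under stated assumptions, not the way Impala or any other SQL-on-Hadoop engine is implemented: the JSON records, the COLUMNS tuple, and the project function are hypothetical, and an in-memory SQLite database is only a stand-in for a SQL engine. Real engines typically query the data where it lives rather than loading it as this sketch does.

```python
import json
import sqlite3

# Assumption: schemaless records as they might sit in a document store or on HDFS.
RAW_EVENTS = [
    '{"user": "ann", "event": "login", "ms": 120}',
    '{"user": "bob", "event": "login"}',
    '{"user": "ann", "event": "purchase", "ms": 340, "extra": {"sku": "X1"}}',
]

# The relational schema imposed when reading: these keys become columns.
COLUMNS = ("user", "event", "ms")


def project(document):
    """Map one JSON document to a fixed-width row; missing keys become NULL."""
    record = json.loads(document)
    return tuple(record.get(column) for column in COLUMNS)


# SQLite is only a stand-in for a SQL engine in this sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_name TEXT, event TEXT, ms INTEGER)")
db.executemany("INSERT INTO events VALUES (?, ?, ?)", map(project, RAW_EVENTS))

# Once the data has a relational shape, ordinary SQL applies.
rows = db.execute(
    "SELECT user_name, COUNT(*), SUM(ms) FROM events "
    "GROUP BY user_name ORDER BY user_name"
).fetchall()
```

Keys that are not part of the declared columns, such as the nested "extra" field, are simply ignored by the projection, and fields that a record lacks show up as NULL.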
Where to Start?
A data warehouse is never complete, and neither is a logical data warehouse. Opt for an organic approach, so you can keep adding to it step by step. Don’t start too big, and don’t try to do everything at once. Set priorities, define increments, and deliver direct value to end users by delivering in short bursts. Choose a growth model in which the architecture of the logical data warehouse is the framework for your new developments, and one that makes adding a big data source an organic process.
This blog was penned by Jonathan Wisgerhof, Senior Architect, Kadenza