Big Data Is Not Always Big Information
Organizations don’t need big data; they need big information. Let me explain. Years ago, my professors taught me the critical difference between data and information. Data are raw, unorganized facts that need to be processed. When they are processed, organized, structured, and presented, they become information. Compared to a manufacturing process, data is like raw materials and information is the final product. Just as raw materials are needed to create final products, data is needed to create information, and it’s information that organizations are interested in.
Examples of data are all the records containing sensor data, customer data, invoice data, factory data, social media data, and so on. When data is processed and structured, it may become information, but only if it’s useful to its users. And it’s only useful when it helps them improve a business process, when it shows the percentage risk that a patient will develop diabetes within a few years, when it improves customer satisfaction, when it helps avoid machine failures, or when it reveals that many customers are complaining about products on social media. In other words, information may allow users to act.
In the world of big data, it’s all about more data. The potential advantages of storing more data are clear: more information may be derived from it, which in turn may lead to more analytical opportunities. However, more data is no guarantee that more information can be derived. Statisticians know this. Sample size matters, but the quality and representativeness of the data matter much more.
The risk with many of the big data systems being enthusiastically developed today is that they will not deliver more information and will not lead to more analytical capabilities. They may, but, as indicated, there is no guarantee. It’s not about raw materials; it’s about final products. Therefore, we recommend that when big data systems are developed, the focus should be on big information. Organizations should know what they want to do with the raw data.