Simplifying Big Data Projects with Data Virtualization
According to Gartner, 60% of big data projects fail, and according to Capgemini, 70% of big data projects are not profitable. There can be only one conclusion: big data projects are hard! There is not one specific reason for this situation, but many. For example, some big data storage and processing technologies are complex to use, the technology is new to most IT specialists, and sometimes big data products are used for the wrong use case.
Despite these disappointing results, organizations still initiate big data projects in order to take advantage of the potential business benefits promised by the big data proponents.
Data virtualization can help to simplify big data projects. It won’t solve all the problems, but if deployed for the right use cases, it will definitely increase the likelihood that big data projects succeed. Here are some examples of proper use cases.
Massive amounts of big data are stored in plain files, which makes the data hard for non-tech-savvy business users to access. Data virtualization servers can hide this complexity and make big data available to a larger business audience and to many popular BI tools. They encapsulate the big data in simple virtual tables that can easily be accessed.
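To make the idea of a virtual table over plain files concrete, here is a minimal Python sketch. It is not how any particular data virtualization server works; the `VirtualTable` class, its `select` method, and the sample files are all made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


class VirtualTable:
    """Hypothetical virtual table presenting a set of CSV files as one table."""

    def __init__(self, paths):
        self.paths = list(paths)

    def rows(self):
        # Read the files lazily, one row at a time; nothing is copied up front.
        for path in self.paths:
            with open(path, newline="") as handle:
                yield from csv.DictReader(handle)

    def select(self, columns, where=lambda row: True):
        # Consumers state which columns and rows they want, not how to parse files.
        return [{c: row[c] for c in columns} for row in self.rows() if where(row)]


def build_demo_table():
    # Two made-up files standing in for raw big data files.
    folder = Path(tempfile.mkdtemp())
    (folder / "part1.csv").write_text("store,amount\nParis,10\nLyon,5\n")
    (folder / "part2.csv").write_text("store,amount\nParis,7\n")
    return VirtualTable(sorted(folder.glob("*.csv")))


sales = build_demo_table()
paris = sales.select(["amount"], where=lambda r: r["store"] == "Paris")
print(paris)
```

The business user only sees `sales.select(...)`; the file names, the number of files, and the parsing logic stay hidden behind the table.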
In organizations operating internationally, big data may be produced remotely all over the globe. Data may be produced, for example, in factories, manufacturing plants, and stores. The amount of data each remote site produces may be too much to copy to a central location for reporting and analytical purposes. In other words, big data can be too big to move. Data virtualization allows virtual tables to be defined that hide the remoteness of the data. To business users it will look as if all the data is centrally stored. The data virtualization server pushes the processing to the remote sites, instead of transferring all that big data to a central point for processing.
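The principle of pushing the processing to the data can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `RemoteSite` class and the `global_average` function are hypothetical, and the "remote" sites are plain in-memory objects rather than real servers.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteSite:
    """Hypothetical remote site that keeps its own data locally."""

    name: str
    readings: list = field(default_factory=list)

    def local_sum_and_count(self):
        # Runs at the site: only two numbers leave it, not the raw rows.
        return sum(self.readings), len(self.readings)


def global_average(sites):
    # The central server only combines the small partial results.
    partials = [site.local_sum_and_count() for site in sites]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


sites = [
    RemoteSite("factory_a", [10.0, 20.0, 30.0]),
    RemoteSite("store_b", [40.0]),
]
average = global_average(sites)
print(average)
```

However many readings each site holds, the amount of data that travels to the central point stays the same: one sum and one count per site.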
In numerous projects, big data is stored using high-end, transactional NoSQL products. Most of these products are designed and optimized for processing massive amounts of transactions. Unfortunately, their focus on transaction processing comes at the expense of their analytical and reporting capabilities. With data virtualization, the data stored in NoSQL products can easily be cached and moved to a fast analytical platform. This way, with minimal effort, transactional data can quickly be made available for reporting and analytics.
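As a rough illustration of the caching idea, the following Python sketch copies a handful of documents into a SQL engine and runs the report there. Everything in it is an assumption made for the example: the `orders` documents are invented, the `refresh_cache` function is hypothetical, and an in-memory SQLite database merely plays the role of the analytical platform.

```python
import sqlite3

# Made-up documents standing in for records held in a transactional NoSQL store.
orders = [
    {"id": 1, "customer": "acme", "total": 120.0},
    {"id": 2, "customer": "acme", "total": 80.0},
    {"id": 3, "customer": "zenith", "total": 50.0},
]


def refresh_cache(documents):
    # SQLite in memory plays the role of the analytical platform here.
    cache = sqlite3.connect(":memory:")
    cache.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    cache.executemany(
        "INSERT INTO orders VALUES (:id, :customer, :total)", documents
    )
    cache.commit()
    return cache


cache = refresh_cache(orders)
# The report runs against the cache, so the transactional store is not queried.
revenue = cache.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(revenue)
```

In a real deployment the refresh would be scheduled or triggered by the data virtualization server; here it is a single function call to keep the sketch short.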
Not all big data is stored in systems that allow it to be documented and described. Therefore, technical and business metadata is often missing. A data virtualization server allows metadata, in the form of definitions, descriptions, and tags, to be defined for all kinds of data sources.
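A small Python sketch shows what such metadata could look like once it is kept outside the data source itself. The `ColumnMetadata` class, the `catalog` dictionary, the `find_by_tag` function, and the source and column names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical metadata record: a definition plus free-form tags."""

    definition: str
    tags: set = field(default_factory=set)


# Keys are (source, column) pairs; the sources themselves store no descriptions.
catalog = {
    ("sensor_log", "temp_c"): ColumnMetadata(
        "Temperature in degrees Celsius", {"sensor", "measurement"}
    ),
    ("sensor_log", "site"): ColumnMetadata(
        "Code of the plant that produced the reading", {"sensor", "location"}
    ),
    ("web_clicks", "country"): ColumnMetadata(
        "Country derived from the visitor's IP address", {"location"}
    ),
}


def find_by_tag(tag):
    # Lets users discover columns across sources by business meaning.
    return sorted(key for key, meta in catalog.items() if tag in meta.tags)


location_columns = find_by_tag("location")
print(location_columns)
```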
The last use case described here relates to the fast-evolving world of big data technology. To exploit every new technology, data consumers must be decoupled from the data stores and producers. For example, the more tightly an application is tied to a specific data storage technology, the harder it will be to port that application to a promising new technology. With data virtualization, applications and reports can be developed independently of the data storage technology used, which simplifies migration to a new and faster technology. Data virtualization makes organizations less dependent on the current hot technologies, which may quickly become out of date in this fast-moving world.
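The decoupling argument can be illustrated with a short Python sketch in which a report depends only on an interface, not on a storage technology. All names here (`CustomerSource`, `DictBackend`, `IndexedBackend`, `report`) are invented for the example, and both backends are simple in-memory stand-ins.

```python
from typing import Protocol


class CustomerSource(Protocol):
    """Hypothetical interface that reports are written against."""

    def customers_in(self, country: str) -> list: ...


class DictBackend:
    """Stand-in for the current storage technology (a list of records)."""

    def __init__(self, records):
        self.records = records

    def customers_in(self, country):
        return sorted(r["name"] for r in self.records if r["country"] == country)


class IndexedBackend:
    """Stand-in for a newer technology with a different internal layout."""

    def __init__(self, by_country):
        self.by_country = by_country

    def customers_in(self, country):
        return sorted(self.by_country.get(country, []))


def report(source: CustomerSource, country):
    # The report knows the interface only, never the storage layout behind it.
    return ", ".join(source.customers_in(country))


old = DictBackend(
    [
        {"name": "Bo", "country": "NL"},
        {"name": "Ann", "country": "NL"},
        {"name": "Cy", "country": "FR"},
    ]
)
new = IndexedBackend({"NL": ["Ann", "Bo"], "FR": ["Cy"]})
print(report(old, "NL"))
print(report(new, "NL"))
```

Swapping `old` for `new` requires no change to `report`, which is the kind of independence a virtualization layer aims to give applications and reports.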
In conclusion, organizations want to develop big data systems. For most of them it’s part of their digital transformation, and it’s essential for becoming more data-driven. But practice has shown that developing big data systems is difficult, and there is a high risk of failure. Data virtualization servers can’t solve all the problems, but there are definitely key areas where they can help to simplify big data projects and exploit an organization’s big data investment. Data virtualization can make big data easy to use.
Sihem Merah, Sales Engineer at Denodo, and I presented a webinar entitled “Réalisez enfin les promesses du Big Data grâce à la Data Virtualization”, in which we went into more depth on how data virtualization can make big data easier to use. I hope you find it helpful.