The Intersection of Big Data and Data Virtualization

Real-world big data use cases have moved beyond “possibilities” to multiple use patterns, including analytics and low-cost archival storage of large datasets. In all cases, big data is thought of in an enterprise sense, even when the analytics performed by data scientists are highly focused on a particular entity or problem such as customer behavior, sensor data or trade patterns. Today, data virtualization plays a key role in exposing big data as abstracted data services, using RESTful interfaces and real-time query capabilities to democratize access to this intelligence. In doing so it has enabled faster enterprise adoption of big data alongside traditional databases, operational systems and BI systems.

As more data sources feed into big data stores and processing (social media, sentiment, click stream, weblogs, transactional data, data.gov web sites, web data, machine data, etc.), enabling a data services interface with security and governance through data virtualization will broaden the usage of big data for analytical and operational purposes.
After all, typical business drivers for big data relate to everyday business processes and decision makers who are trying to gain more customer insight, identify new market opportunities, stop revenue leakage or fraud, enable mass personalization of products, improve process efficiencies, reduce costs, etc. based on processing large amounts of available data. Until now this has often not been done because of the high cost of the storage and processing power needed. Big data technologies have momentum because they can now do this at very low cost relative to databases and data warehouses.
So, when companies think about big data initiatives at a technical level, they mainly have two objectives in mind:
  • Run analytics on data of large volume, velocity, or variety using distributed storage infrastructure and distributed scan-based processing instead of queries against the typical consolidated data warehouse. Sometimes big data systems are used simply to store large volumes of data such as log files, call archives, etc. cheaply without incurring database costs.
  • Build a layer of abstraction on top of the (big) data infrastructure, which usually includes Hadoop or big file systems, in order to offer developers a much easier and/or more flexible platform for creating new enterprise applications based on big data.
Currently, much of the focus is on the former: how to capture, store and retain big data and improve processing. In particular there is focus on Hadoop, some NoSQL stores and big data offerings in the cloud. While each model or technology offers advantages, each also has drawbacks, such as gaps in enterprise-class features and security, and the limitations of the MapReduce paradigm in supporting real-time, query-based interactions. Vendors and technologies will compete to alleviate these limitations and improve storage and processing of big data over time. But that is not enough.
When you refer back to the business drivers of big data, it is clear that the everyday business processes and the decision makers involved in them need access to the results of big data analytics in a simple, integrated fashion. This is the second objective: to leverage big data enterprise-wide so that it does not become another data silo. Data virtualization is therefore a critical part of the big data solution. It facilitates and improves the use of big data in the enterprise by:
1. Abstracting semi-structured and unstructured big data into relational-like views
2. Integrating big data with other enterprise sources
3. Adding real-time query capabilities to big data
4. Providing full support for RESTful web services and linked data
5. Adding security and other governance capabilities to the big data infrastructure
6. Helping to solve the siloed data/applications problem through a unified data layer
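To make the first two items concrete, here is a minimal Python sketch that flattens semi-structured click-stream records into a relational-like shape and joins them with a conventional customer table through a single view. Everything in it is a hypothetical illustration: the sample records, the table and view names, and the helper functions are assumptions, not part of any data virtualization product. It also copies the data into an in-memory SQLite database for simplicity, whereas a real data virtualization layer would typically query the sources in place.

```python
import json
import sqlite3

# Hypothetical semi-structured click-stream records, one JSON document per line,
# standing in for data held in a big data store.
RAW_EVENTS = [
    '{"user": {"id": 1}, "event": "view", "page": "/pricing"}',
    '{"user": {"id": 2}, "event": "purchase", "amount": 40.0}',
    '{"user": {"id": 1}, "event": "purchase", "amount": 25.5}',
]


def to_rows(lines):
    """Flatten nested JSON into fixed-width tuples; missing fields become None."""
    for line in lines:
        rec = json.loads(line)
        yield (rec["user"]["id"], rec["event"], rec.get("page"), rec.get("amount"))


def build_virtual_layer():
    """Expose the events next to an 'enterprise' customer table behind one view."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for an existing enterprise source (for example a CRM database).
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    # Relational-like shape for the semi-structured events.
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, event TEXT, page TEXT, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", to_rows(RAW_EVENTS))
    # The view is what consumers query; they never see the raw JSON.
    conn.execute("""
        CREATE VIEW customer_revenue AS
        SELECT c.name AS name, SUM(e.amount) AS revenue
        FROM customers c JOIN events e ON e.user_id = c.id
        WHERE e.event = 'purchase'
        GROUP BY c.name
    """)
    return conn


def revenue_by_customer(conn):
    """Query the unified view as an application or BI tool might."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
```

A consumer would call `revenue_by_customer(build_virtual_layer())` and work only with the `customer_revenue` view, which is the point of the abstraction: the application does not need to know that part of the answer came from nested JSON.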
There is no doubt that the big data trend provides a huge opportunity to gain new and valuable business insights from large volumes of data that were previously unavailable or uneconomical to capture. One part of the big data technology platform focuses on finding the most cost-efficient and scalable ways to store and process big data. The other part raises the level of abstraction of big data and makes it easy to discover and query as linked data services for use in application development across the enterprise. Data virtualization, which provides this abstraction, real-time query and data services capability, is thus an essential part of every big data platform.
Go to source: Data Center Post
Suresh Chandrasekaran
