We live in a world of data: several zettabytes of information flow around us constantly, growing every second. This flow has sharpened our appetite for information, whether or not it is directly connected to our business, because it tells us about the competition, the market, consumer sentiment, trends, geographies and their influence, and the global ecosystem, and because we now have an immense range of platforms, analytics, and reporting tools to work with it.
Together, these give every enterprise, big or small, a capability matrix for realizing data- and information-driven benefits. The foundational problem we see is that becoming data-driven requires a transformation within the enterprise, and that transformation must be driven by the business.
But how can the business drive a transformation if it cannot see the data first? Business teams need access to the data to see how it can be used, with analytics, to deliver that transformation and bring the business immense benefits.
In the world of big data platforms, we have tools such as Apache NiFi and Apache MiNiFi for acquiring data from any source and delivering it into HDFS, MapR-FS, or other file-system platforms. There are also tools from the ETL stable, including Informatica, Microsoft SSIS, and IBM InfoSphere, which have created niche Hadoop and NoSQL ingestion connectors. But all of these strategies lead to one destination: first we acquire the data, and only then is it available for use. That process does not work in a world that moves every second, where every second not monetized is money lost.
How to kick start your business transformation
In 2009, Mark Beyer, Distinguished Analyst and Research VP at Gartner, Inc., stated that we need to move from the static model of a data warehouse to a dynamic, expressive model, which he called the logical data warehouse.
The logical data warehouse can be implemented with a data virtualization platform that connects to each source of data wherever it resides, accesses the data, and provides a metadata model of integration in the virtualization layer. The biggest advantage of this model is that your business teams stay informed and can execute proactively. Take the example of shopping online for birthday gifts: it is very important that all gifts are delivered before the birthday in question, and as a “premium” customer you expect the service to be delivered and confirmed.
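To make the idea concrete, here is a minimal sketch of the logical data warehouse pattern. It simulates two independent source systems as attached in-memory SQLite databases and exposes a single integrated view over them, the way a virtualization layer would, without copying data into a central store first. The table and column names are illustrative, not from any real platform.

```python
import sqlite3

# One connection acts as the "virtualization layer"; each attached
# in-memory database stands in for a separate source system.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orders_src")     # e.g. the order system
conn.execute("ATTACH DATABASE ':memory:' AS inventory_src")  # e.g. the warehouse system

conn.execute("CREATE TABLE orders_src.orders (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory_src.stock (sku TEXT, on_hand INTEGER)")
conn.executemany("INSERT INTO orders_src.orders VALUES (?, ?, ?)",
                 [(1, "GIFT-A", 2), (1, "GIFT-B", 1)])
conn.executemany("INSERT INTO inventory_src.stock VALUES (?, ?)",
                 [("GIFT-A", 10), ("GIFT-B", 0)])

# The "logical data warehouse": a temp view that integrates both sources
# in place of a physically consolidated warehouse table.
conn.execute("""
CREATE TEMP VIEW order_fulfillment AS
SELECT o.order_id, o.sku, o.qty, s.on_hand,
       CASE WHEN s.on_hand >= o.qty THEN 'ok' ELSE 'backorder' END AS status
FROM orders_src.orders o
JOIN inventory_src.stock s ON s.sku = o.sku
""")

for row in conn.execute("SELECT sku, status FROM order_fulfillment ORDER BY sku"):
    print(row)
```

A real virtualization platform adds source connectors, caching, and governance on top of this pattern, but the core idea is the same: queries run against an integration model, not against copies of the data.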
Let us assume all goes well and the gifts are delivered; everybody is happy. Now consider a different scenario: within an hour of placing the order, two of the five gifts are reported to be on backorder. In the best case, we as the customer receive an alert and are presented with available alternatives; we make a choice, and the order proceeds. In the worst case, we cancel the entire order and ask for a refund, which is processed, but the net loss of an entire order is something no business wants to see. How can this situation be resolved with a different platform approach?
Well, if the entire data system is a plug-and-play model, the data virtualization platform, as the central data integration layer, can help. We can build API interfaces between the data layers and the application layers, exchange data as it arrives into a metadata layer, and feed it into the analytics model. We can then predict issues with the order and proactively engage the customer to resolve them. The possible outcomes now range from the best case of replacement items being accepted down to the worst case of a partial order, rather than a full cancellation.
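The proactive engagement step above can be sketched as a small event handler. Everything here is hypothetical: `handle_order_event`, `suggest_alternatives`, and the catalog structure are illustrative stand-ins for whatever the application layer would call through the integration APIs, and the "prediction" is simplified to a stock check.

```python
def suggest_alternatives(sku, catalog):
    """Return in-stock items from the same category as a backordered SKU."""
    category = catalog[sku]["category"]
    return [s for s, item in catalog.items()
            if s != sku and item["category"] == category and item["on_hand"] > 0]

def handle_order_event(order, catalog):
    """Flag backordered lines and propose replacements before the customer asks."""
    alerts = []
    for line in order["lines"]:
        item = catalog[line["sku"]]
        if item["on_hand"] < line["qty"]:   # simplified fulfillment-issue "prediction"
            alerts.append({
                "sku": line["sku"],
                "status": "backorder",
                "alternatives": suggest_alternatives(line["sku"], catalog),
            })
    return alerts

catalog = {
    "GIFT-A": {"category": "toys", "on_hand": 10},
    "GIFT-B": {"category": "toys", "on_hand": 0},
    "GIFT-C": {"category": "toys", "on_hand": 5},
}
order = {"order_id": 1, "lines": [{"sku": "GIFT-A", "qty": 2},
                                  {"sku": "GIFT-B", "qty": 1}]}
print(handle_order_event(order, catalog))
```

The point of the sketch is the flow, not the logic: because the integration layer sees order and inventory data as one model, the alert and the alternatives can be generated the moment the backorder appears, instead of after a nightly load.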
What you have now accomplished is a truly real-time customer service experience, and this type of data integration is now feasible with data virtualization. Imagine a retail e-commerce company that can deliver the value of an Amazon with the user-friendliness of your local farmers market: this is the business benefit that can be realized on this journey.
In my most recent engagement with a customer who wanted to leverage data across all of their ecosystems, we suggested data virtualization, and within six weeks the business teams were ecstatic about the proof of concept and its outcomes. The dream of a data-driven world, with instant reaction and end-state customer gratification, is now a reality.
Are we ready to get going, explore the data world, and swim the stream of intelligence? Start your own journey and share your six-week results with the world.
- The Key to Becoming Data-Driven in Six Weeks - February 27, 2018
Feeding into an analytical layer??? How real-time would it be, and how performant for prediction? I am not sure how data virtualization fits here; can you please explain this point in a little more detail? In my view, the first place the information goes would be the ODS layer, on top of which you can enable streams to process the data as it arrives for further analytics using the right tool. How DV fits in would be interesting to know.
In the current state of data processing, your perspective fits perfectly well: acquire data into an ODS and process it further. That is the traditional data management process for analytical systems. In the new realm of thinking, we are transforming from those traditional methods to new ones; the change is the ability to remove the ODS layer and replace it with a layer that provides access to all data at the source, an operational data lake (or data swamp). Platforms like Denodo, Hadoop, and NoSQL stores are useful for accomplishing this goal. For any large and complex environment, the data virtualization model of integrating all the sources into the data lake is relatively easier, as it does not involve too many changes at the same time. Providing this data layer for all exploration and usage removes the “wait” from the end user’s perspective (for analytical user groups, the wait is even longer), and you can access data at whatever level of granularity you need. When processing data from this layer, we can add streams, micro-batches, and other delivery techniques with APIs. The ease of data availability first is the benefit we are talking about in this blog.
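The micro-batch delivery technique mentioned above can be sketched in a few lines. This is a generic pattern, not a feature of any named platform: a continuous event stream is grouped into small batches for downstream delivery, with the last partial batch flushed at the end.

```python
from itertools import islice

def micro_batches(events, batch_size):
    """Group a continuous event stream into small batches for downstream delivery."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:       # stream exhausted
            return
        yield batch

# Stand-in for a live feed arriving from the data layer's APIs.
stream = ({"order_id": i} for i in range(7))
for batch in micro_batches(stream, 3):
    print(len(batch))
```

Batches of three are emitted until the stream runs dry, so seven events arrive as batches of 3, 3, and 1; a true streaming delivery is the same idea with a batch size of one, plus a time-based flush.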
Hi Krish, thanks for the article. When using data virtualization, how much work is needed to interconnect or conform common data objects? Do you present the analysts with a data model? If so, what kind of model is it? How about an EIM to keep this information tracked and organized? It seems to me that if an effort uses DV to connect to sources with rules and predicates but no organization, you end up with a bunch of one-off tables that never get reused and a lot of confusion.