Big data analytics has big promises to keep. But that is not the responsibility of any particular technology or any particular company. Open source computing, and its wide acceptance around the world, gives us hope that the right partnership models and the right ecosystem of technology companies can unleash the power of big data.
If we walk back in history, the mainframe computer was the solution for handling large amounts of data, the big data of its day. But as soon as phenomena like bar code scanning became widely accepted de facto standards around the world, the technology industry had to come up with a solution like the data warehouse, allowing extremely large amounts of data to reside within one container, as a single source of truth.
Over the last two decades, enterprises around the world adopted the enterprise data warehouse (EDW). The EDW became a core component of business intelligence for most small to large enterprises across the world. Data became king. Whether for targeted marketing or medical drug research, complex large-scale data analysis provided valuable insights that were otherwise impossible to obtain. Oftentimes, large enterprises created more than one EDW (often called data marts) for specific business needs.
So why was there a need for big data clusters like Hadoop? Because everything good usually comes with side effects. EDW solutions involve extremely expensive custom-built servers and storage technologies, making them almost unaffordable once data growth within an enterprise skyrockets. We are in the middle of a data Big Bang. Almost no one knows when or how it started, but the data universe is growing exponentially. Processing and storing such exorbitantly large sets of data in one or more EDWs becomes too expensive to be meaningful or to deliver an acceptable return on investment. That, in turn, gave birth to the idea of large-scale distributed clustered computing on inexpensive commodity servers. An open source architecture like Hadoop not only came into existence but became synonymous with distributed computing. This is a case of the brand becoming the technology, much as we say "Googling" instead of "searching."
Even with its immense potential, Hadoop as an open source architecture had inherent functional drawbacks and required enhancements to be truly enterprise-ready. That is where Hortonworks took the lead in making Hadoop enterprise-grade, with added features like robust governance, simplified manageability, and easy integration with existing enterprise infrastructure.
Even though the Hortonworks Hadoop platform and other big data platforms are making tremendous progress in data processing and analytics, the enterprise data warehouse and many other enterprise systems are not going away anytime soon. To maximize the return on their investments, organizations must find a way to integrate big data with other silos of information, which often generates tremendous insight and helps organizations drive digital innovation.
The technology leading the way to such integrations, and to the real-time insights they generate, is data virtualization. Big data virtualization is a term coined to describe an abstraction mechanism that offers business users real-time insights by combining information from big data sources, such as Hadoop clusters, with other sources such as traditional BI repositories, web data, and cloud data. It is showing its promise in use cases such as offloading cold data from the data warehouse to Hadoop, data scientist sandboxes, and, more importantly, the IoA (Internet of Anything) space. I strongly believe that the right architecture and deployment of big data virtualization will help big data technology keep many of its promises.
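To make the abstraction idea concrete, here is a minimal sketch of what a virtual view does: it joins records from two independent sources at query time instead of copying them into one central store. The source names and sample records below are purely hypothetical stand-ins (an in-memory list for cold order history offloaded to Hadoop, and a dictionary for customer master data in an EDW); a real data virtualization platform would push queries down to the live systems.

```python
# Hypothetical stand-in for cold order history offloaded to a Hadoop cluster
hadoop_orders = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 2, "order_total": 75.5},
]

# Hypothetical stand-in for customer master data in a traditional EDW
edw_customers = {
    1: {"name": "Acme Corp", "segment": "enterprise"},
    2: {"name": "Beta LLC", "segment": "smb"},
}

def virtual_customer_orders():
    """A 'virtual view': federate both sources into one result set
    at query time, without materializing a combined copy anywhere."""
    for order in hadoop_orders:
        customer = edw_customers.get(order["customer_id"], {})
        yield {
            "name": customer.get("name"),
            "segment": customer.get("segment"),
            "order_total": order["order_total"],
        }

for row in virtual_customer_orders():
    print(row)
```

The point of the sketch is the shape of the access pattern: consumers query one logical view, while the data itself stays where it lives, whether that is a Hadoop cluster, an EDW, or a cloud source.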
The partnership between Hortonworks and Denodo is a big step toward that goal, validated by our network of joint customers and by HDP certifications, which review, certify, and validate technologies against a comprehensive suite of integration test cases benchmarked for scale under various workloads. Denodo has been tested and validated by Hortonworks, receiving three certification badges: HDP Ready, YARN Ready, and Security Ready.
To learn more about the Denodo Platform and Hortonworks Hadoop, tune into the webinar Powering the Future of Data by Piet Loubser, VP of Product and Solutions Marketing, Hortonworks, where he discusses the current EDW architecture and how to enrich it with new technologies such as data virtualization.