Big Data Requires the Context of Small Data

Big data has entered many companies’ data centers and is already transforming whole business models. The degree of adoption ranges from initial test and educational projects to fully fledged big data landscapes, deployed on-premises or leveraging the flexibility and power of the cloud. Hybrid models are also emerging: keeping corporate data in the local data center while moving non-critical data into the cloud. With the advent of IoT, large amounts of non-mission-critical data are gathered and stored natively in the cloud.

There is a paradox in the data itself: data changes shape and definition so quickly that, for most of it, permanent storage offers only limited value. Data virtualization helps retain the best of both worlds: traditional relational databases and the new world of NoSQL and Hadoop, with native support for complex, document-oriented data structures such as JSON. It provides unified access and frees data analysts to discover and search data and metadata with self-service capabilities, and to browse data sets directly in a secure and governed manner.
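To make the idea of exposing document data relationally more concrete, here is a minimal sketch of what a virtualization layer does conceptually when it presents a nested JSON document as flat rows. The document, its fields, and the `flatten` helper are all hypothetical, invented purely for illustration:

```python
import json

# Hypothetical IoT reading stored as a JSON document in a NoSQL store.
doc = json.loads("""
{
  "device_id": "sensor-42",
  "site": {"plant": "Munich", "line": 3},
  "readings": [
    {"ts": "2024-01-01T00:00:00Z", "temp_c": 21.5},
    {"ts": "2024-01-01T01:00:00Z", "temp_c": 22.1}
  ]
}
""")

def flatten(document):
    """Expose the nested document as flat, relational-style rows."""
    rows = []
    for r in document["readings"]:
        rows.append({
            "device_id": document["device_id"],
            "plant": document["site"]["plant"],
            "line": document["site"]["line"],
            "ts": r["ts"],
            "temp_c": r["temp_c"],
        })
    return rows

rows = flatten(doc)  # two rows, one per reading
```

A virtualization platform performs this kind of mapping at query time, so analysts can run ordinary SQL against the resulting view while the documents stay in their native store.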

Whatever use case you select, you cannot deliver business value if you implement your big data project in a silo. Tying your big data to your data warehouse, CRM, and ERP applications is paramount to democratizing enterprise data and empowering business users with holistic answers. For cost and performance reasons, big data mostly resides in NoSQL stores or Hadoop. Relational data access is key for traditional business intelligence, but it is insufficient for contemporary data science and slower or more expensive to deliver than alternative solutions. There is no economically feasible way to store all this big data in a traditional data warehouse. One approach is to move all the “small data” into a single centralized big data environment, but this requires resource-intensive synchronization. Alternatively, data virtualization bridges the gap.
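The bridging idea above can be sketched as a federated join: combine records from two sources at query time without copying either into the other’s store. The two sources, their fields, and the `virtual_join` helper below are hypothetical stand-ins for a CRM table and a big data event store:

```python
# Hypothetical relational CRM table (the "small data").
crm_customers = [
    {"customer_id": 1, "name": "Acme GmbH", "segment": "Enterprise"},
    {"customer_id": 2, "name": "Beta AG", "segment": "SMB"},
]

# Hypothetical clickstream events that stay in the NoSQL/Hadoop store.
clickstream_events = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/docs"},
    {"customer_id": 2, "page": "/pricing"},
]

def virtual_join(customers, events):
    """Join both sources at query time; neither side is replicated."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**by_id[e["customer_id"]], "page": e["page"]}
        for e in events
        if e["customer_id"] in by_id
    ]

result = virtual_join(crm_customers, clickstream_events)
```

A real platform adds query optimization, pushdown to the sources, and security on top, but the principle is the same: the join happens in a virtual layer, not by synchronizing data between systems.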

A lot of investment has been made in enterprise data warehouses, and they are likely to remain in place for many more years: historical data, regulatory compliance, and auditability are core capabilities. The key is combining several approaches into a smart strategy, such as logical data warehouses, data warehouse extensions, or (governed) data lakes.

You can learn more about such patterns and reference architectures from blog posts such as logical data warehouse common patterns and logical architectures for big data analytics. Some of our customers use data virtualization to transform their whole IT landscape and gain substantial benefits: cutting point-to-point connections and data replication by 50% and lowering the effort for maintenance and new infrastructure investment by up to 80%. This is true agility!

If you are interested in learning about the power of the Denodo Platform and Cloudera Enterprise, watch our webinar on-demand. Come and see our live demo showing how data virtualization joins forces with Hadoop.

Please note that the webinar will be presented in German by experts from Cloudera and Denodo.

Christian Kurze

Christian holds a PhD in Information Systems in the field of Data Warehouse Automation. He has worked on business intelligence metadata and data integration projects. His practical experience stems from projects in various industries as well as from the product management of an Active Metadata Platform.
