During a recent house move I discovered an old notebook with metrics from my days as a Data Warehouse Project Manager, when I used to estimate data delivery projects. For the delivery of a single data mart with supporting reports and dashboards, with 6-8 dimensions and a handful of measures or facts, the cost was approximately $200K AUD per data initiative, and it needed a delivery team of 8-10 resources over roughly 12 weeks.
In this example we used more traditional methods of data delivery: an ETL approach for all data movement (in this case DataStage) and an agile, iterative methodology for development and deployment. The data moved from the source system to a staging area (making it easy to reconcile the data being captured). From there it moved to an ODS, or Operational Data Store (to service the business from an operational reporting perspective). The data was then modelled and moved to the EDW, or Enterprise Data Warehouse (for historical storage and time-series analysis), then into subject-specific data models (quite often star-schema designs), and finally into multi-dimensional cubes (used for more in-depth analysis and reporting).
Each movement and modelling of data for each layer of the architecture required a data pipeline to be built and maintained. Every movement of data introduced a potential point of failure. Every movement of data reduced the ability to deliver the data in real time. And every movement of data added costs for replicating the data and increased storage requirements.
While some of these more traditional methods of data delivery are still viable in some cases, I wonder why we are not doing things differently. There is data everywhere and we continue to replicate it, even now to the cloud. It really is time to be smarter about how we manage data and how we expose it to data consumers. We need a more modern approach to engineering data, a smarter way to integrate it, and a smarter way to service the business from a data perspective, so that the IT department stops being a bottleneck.
The good news is that there is an alternative approach: include a logical layer in your data architecture so that you can deliver faster and hide the messiness and complexity of the underlying data ecosystem. Why replicate data when you can access it virtually instead?
Data Virtualization provides a logical data fabric layer across your enterprise data assets, making governance and control easier. It does not move the data from one place to another; it creates selected, real-time views of the source data, on an as-needed basis, while leaving the source data exactly where it is, minimizing data movement and maximizing query performance.
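To make the "views, not copies" idea concrete, here is a minimal sketch using SQLite rather than Denodo, with hypothetical table names. A view stores only a query definition, so the join runs against the source tables at request time and no rows are replicated:

```python
import sqlite3

# Minimal sketch of the idea behind data virtualization, using SQLite.
# A view is a stored query: the source rows stay where they are, and the
# view computes its result at query time instead of copying data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two "source systems" (hypothetical tables for illustration)
    CREATE TABLE crm_customers (id INTEGER, name TEXT);
    CREATE TABLE billing_invoices (customer_id INTEGER, amount REAL);

    INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO billing_invoices VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- The "virtual" layer: no rows are stored, only the definition.
    CREATE VIEW customer_spend AS
    SELECT c.name, SUM(i.amount) AS total_spend
    FROM crm_customers c
    JOIN billing_invoices i ON i.customer_id = c.id
    GROUP BY c.name;
""")

# Consumers query the view; the join runs against the sources in real time.
print(conn.execute("SELECT * FROM customer_spend ORDER BY name").fetchall())
```

If a source row changes, the next query of the view reflects it immediately, which is the real-time quality described above; a physical data mart would only reflect it after the next ETL load.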
Data Virtualization allows data requirements to be modelled logically, reducing the effort required to design multiple physical data models and to build the data ingestion processes that populate those physical structures. Denodo can also work alongside and complement traditional Data Warehouses and more modern Data Lake solutions.
Data Virtualization also gives data professionals visibility and flexibility in discovering and re-purposing data assets across the enterprise. It provides agility and enables more re-use of data, which in turn builds more trust in data. It is also key to being adaptive and resilient, particularly in pandemic times, when we need to be more insights-driven.
With Data Virtualization, organizations can take a more rapid approach to data delivery. I recently heard a Forrester figure quoted: a 65% improvement in delivery times with Denodo compared to ETL.
The moral of this story is that I really wish I had had Data Virtualization available as a technology layer within the data landscape when I was a Data Warehouse Project Manager. We could have delivered more, with less effort and at less cost. Based on Forrester's figure for ETL savings alone, in my example of a $200K cost per data initiative, using Denodo would have saved at least $40K in ETL development per initiative, along with several weeks of delivery time. So it really does make a lot of sense to logicalize first and physicalize last.
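As a sanity check on that arithmetic, here is a back-of-envelope sketch. The ETL share of the budget is my assumption, since the post does not break the $200K down; an ETL development share of roughly 30% makes the 65% Forrester figure line up with a saving of roughly $40K:

```python
# Back-of-envelope check of the savings figure.
# ASSUMPTION: ETL development is ~30% of the budget (the post does not
# break the $200K down); this share is chosen for illustration only.
initiative_cost = 200_000        # AUD per data initiative (from the post)
etl_share = 0.30                 # assumed fraction spent on ETL development
forrester_improvement = 0.65     # claimed improvement vs ETL delivery

etl_cost = initiative_cost * etl_share
saving = etl_cost * forrester_improvement
print(f"Estimated ETL saving per initiative: ${saving:,.0f}")
```

With those assumed inputs the saving comes out just under $40K per initiative; a different ETL share would scale the figure proportionally.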