Improving BI Delivery Using Data Virtualization
Global property investors, asset managers and loan companies collect data about their commercial real-estate portfolios as part of their day-to-day operations. This data includes details such as properties, tenants, income, expenditure, rent escalation terms, costs and revenue. It is provided by various third parties, including property managers and their property management systems, appraisers, lettings agents and other data providers. Without an efficient way to bring this data together into a single consistent view, critical business decisions may be made without vital, accurate information.
For Voyanta’s customers, the data providers themselves upload and validate data directly in the Voyanta system, rather than emailing files back and forth. In all cases, the data is consistently formatted according to clearly defined templates and passed through an automated workflow that validates it against system- and customer-specified business rules. This process ensures that all data is correct and consistent before it is loaded into the data repository, which is a separate MySQL data warehouse for each customer. The data is then available for visualization or direct export.
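The validation step can be pictured as a set of rule functions applied to each templated row before it is loaded. The following is a minimal sketch of that idea, not Voyanta’s implementation: the field names (`property_id`, `annual_rent`, `lease_start`, `lease_end`) and the three rules are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical rules: each returns an error message, or None if the row passes.
def rule_required_fields(row):
    missing = [f for f in ("property_id", "annual_rent") if row.get(f) in (None, "")]
    return f"missing fields: {', '.join(missing)}" if missing else None

def rule_non_negative_rent(row):
    rent = row.get("annual_rent")
    if isinstance(rent, (int, float)) and rent < 0:
        return "annual_rent must not be negative"
    return None

def rule_lease_dates_ordered(row):
    start, end = row.get("lease_start"), row.get("lease_end")
    if start and end and end < start:
        return "lease_end precedes lease_start"
    return None

RULES = [rule_required_fields, rule_non_negative_rent, rule_lease_dates_ordered]

def validate(rows, rules=RULES):
    """Split rows into (accepted, rejected); rejected rows carry their errors."""
    accepted, rejected = [], []
    for row in rows:
        errors = [msg for rule in rules if (msg := rule(row)) is not None]
        if errors:
            rejected.append((row, errors))
        else:
            accepted.append(row)
    return accepted, rejected

# Illustrative input: one clean row and one row that breaks every rule.
sample_rows = [
    {"property_id": "P1", "annual_rent": 1000.0,
     "lease_start": date(2020, 1, 1), "lease_end": date(2025, 1, 1)},
    {"property_id": "", "annual_rent": -5,
     "lease_start": date(2025, 1, 1), "lease_end": date(2020, 1, 1)},
]
accepted, rejected = validate(sample_rows)
```

Only the rows in `accepted` would go on to the warehouse load; each entry in `rejected` pairs the row with its error messages so the provider can correct and resubmit it.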
Customers can analyze the status and performance of their properties using Voyanta’s suite of online reports and dashboards, or create their own Excel spreadsheets with a direct data feed from the Voyanta database using Voyanta’s powerful Excel plugin. Data can also be exported to other systems in a range of formats or accessed via APIs.
Our customers demand that their data be available for reporting and checking very shortly after it has been uploaded to the system. Providing that timely analysis is a challenge because our data volume typically doubles every six months or so. We need to keep up with the data growth and stay responsive to meet our business demands. For that, we need flexible and powerful technology to support our complex BI needs. Our reporting requirements are specific and challenging, because the reports must allow our customers to perform complex analyses with precision.
Voyanta’s reports and dashboards were developed in a leading BI system that comprises a multi-dimensional data model and visualization technology. The source data must go through complex transformations before we can deliver it into numerous complex and tightly specified reports and dashboards. Across all of them we provide fine-grained security, limiting user access to only the information each user is allowed to see, down to the level of an individual object and information type.
The time taken by this complex processing must be minimized to ensure that information is available to users soon after data load. To execute such transformations in an acceptable time, we have invested heavily in technology to optimize data refresh and reload, in auto-scaling infrastructure, and in queues.
Even with this investment, our BI architecture lacked agility. Keeping transformation execution times down was a constant battle. Because the original solution bundled the data model and visualization together, the data models could not easily be reused, and we had no option to use alternative visualization technology or to export the data. We could neither pull in data from outside the model nor repurpose the data model for other uses such as APIs. All of this meant we had to build and manage similar transformations in additional technologies.
The Solution: Data Transformation Layer
To solve the problem, we needed a new architecture. We built a two-tier architecture that separates transformation from visualization. This allows us to plug and play different data sources and visualization technologies.
For transformation, one approach would be to use an ETL solution, but that would inevitably introduce a delay between data arriving and data being available, because ETL requires batch builds and a transformed copy of the data. This delay affects not only live customers but also the development cycle, where developers have to wait for the model to rebuild on each iteration as they make changes.
To avoid the delays of ETL, we chose data virtualization, which I call “ETL on demand.” Data virtualization creates virtual databases that perform the data transformations on the fly. Since the data is pulled directly from the source databases in real time, there is no delay for reload or refresh, so the data is potentially available instantly. With data virtualization, you create the transformations and typically get your answers immediately. Development is quicker, and developers do not lose their flow.
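The difference between a batch-built copy and an on-demand transformation can be illustrated with an ordinary SQL view. The sketch below uses Python's built-in `sqlite3` module as a stand-in; it is not Denodo and says nothing about how Denodo works internally. The `lease` table, its columns, and the tenant names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lease (property_id TEXT, tenant TEXT, annual_rent REAL);
    INSERT INTO lease VALUES ('P1', 'Acme', 120000), ('P1', 'Globex', 80000);

    -- ETL style: a transformed copy, built once per batch run.
    CREATE TABLE rent_by_property_copy AS
        SELECT property_id, SUM(annual_rent) AS total_rent
        FROM lease GROUP BY property_id;

    -- Virtualization style: the same transformation stored as a view
    -- and evaluated against the source each time it is queried.
    CREATE VIEW rent_by_property_view AS
        SELECT property_id, SUM(annual_rent) AS total_rent
        FROM lease GROUP BY property_id;
""")

# New data arrives in the source after the batch copy was built.
conn.execute("INSERT INTO lease VALUES ('P1', 'Initech', 50000)")

def total_rent(relation):
    # `relation` is one of the two fixed names above, never user input.
    return conn.execute(
        f"SELECT total_rent FROM {relation} WHERE property_id = 'P1'"
    ).fetchone()[0]

stale = total_rent("rent_by_property_copy")  # still reflects the old batch
live = total_rent("rent_by_property_view")   # includes the new lease
print(stale, live)
```

The copy keeps reporting the total from the last batch run until it is rebuilt, while the view picks up the new lease on the next query. That is the trade-off in miniature: the view pays the transformation cost at query time in exchange for freshness.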
We selected Denodo for data virtualization after a proof of concept (POC). It is a flexible product that can source data from databases or RESTful APIs. It has enabled us to augment SQL transforms with Java-based stored procedures, and to present the data through a variety of interfaces, including JDBC and RESTful APIs. The system performs well, using flexible caching and query optimization approaches. With the Denodo Platform, caching allows results to be stored and accessed quickly.
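As a generic illustration of why result caching helps, here is a small time-to-live cache wrapped around a query function. It is a sketch of the general technique only and does not represent Denodo's cache configuration or API; `CachedQuery`, `slow_source`, and the 60-second lifetime are hypothetical.

```python
import time

class CachedQuery:
    """Cache the results of a query function for `ttl_seconds` (hypothetical helper)."""

    def __init__(self, run_query, ttl_seconds, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # query text -> (expiry time, result)

    def fetch(self, query):
        now = self._clock()
        hit = self._store.get(query)
        if hit is not None and hit[0] > now:
            return hit[1]                 # entry still fresh: serve from cache
        result = self._run_query(query)   # missing or expired: go to the source
        self._store[query] = (now + self._ttl, result)
        return result

# Usage with a fake clock so the expiry can be shown without sleeping.
calls = []

def slow_source(query):
    calls.append(query)  # record every trip to the underlying source
    return f"rows for {query}"

fake_now = [0.0]
cache = CachedQuery(slow_source, ttl_seconds=60, clock=lambda: fake_now[0])

cache.fetch("q1")   # first request goes to the source
cache.fetch("q1")   # second request is served from the cache
fake_now[0] = 61.0
cache.fetch("q1")   # entry has expired, so the source is queried again
```

Three requests result in two trips to the source. The lifetime setting is the lever that trades freshness against load on the underlying databases.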
Denodo enabled us to deliver solutions with a faster time-to-value while enabling code reuse, consistency, and compliance. The development cycle is significantly improved since there are no waiting periods for reloads. Its modular approach provides better predictability of development and flexibility in visualization technology.