Shifting Towards a Logical Data Warehouse, in 4 Steps, Without Having to Close Shop
The logical data warehouse is the data architecture of the future: fast, flexible, and ideal for supporting self-service BI. In the past year, I have published quite a lot about the architectural advantages of the logical data warehouse (read my previous blog posts here and here).
In an age where the role of information continues to become more business-critical and where data analysis plays a differentiating role in primary business processes, the demand for an architecture that can supply information faster with greater flexibility has drastically increased. Not really a surprise when you consider the fact that a lack of correct information can bring business processes to a complete halt. Information has thus become the fuel of the 21st century and demands a new engine!
But what about the investments that we have already made in traditional data warehouses, data marts, analytics, reports, and dashboards? Has all that money been wasted? Must we build everything from scratch? No, definitely not! I previously recommended opting for a hybrid approach with data virtualization. Here, I would like to explain how one can migrate to such a hybrid architecture in a controlled fashion, without having to shut down the information supply. A well-organized, step-by-step, ‘agile’ migration retains as much of the existing investment as possible and minimizes risk. The architecture of the logical data warehouse is perfect for this!
One of the ways in which the logical data warehouse can be implemented is by means of data virtualization. Data virtualization platforms typically offer functionality ranging from decoupling of the source systems to the publication of information for BI tools and all manner of information services. The basis for our migration approach is ‘decoupling’ by means of virtual data marts, but before we go into detail, here is an overview of the logical data warehouse architecture based on data virtualization:
The connect layer decouples the data sources: it takes care of access to the required data in the connected source systems. The combine layer builds reusable, integrated elements from that data, and, finally, the publish layer serves as the uniform information model (containing all business logic) used by applications and services. The uniform information in the publish layer can be made available in various ways: as a virtual database, views, web services, QVDs, etc.
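To make the three layers concrete, here is a minimal sketch in which an in-memory SQLite database stands in for the virtualization platform; the layers become nothing more than tables and views layered on top of each other. All table, view, and column names are invented for the example; a real platform like Denodo uses its own modelling language for this.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Connect layer: raw access to a source system's data, unchanged.
con.execute("CREATE TABLE src_orders (order_id INTEGER, cust TEXT, amount_cents INTEGER)")
con.execute("INSERT INTO src_orders VALUES (1, 'acme', 1250), (2, 'acme', 750)")

# Combine layer: reusable, integrated elements (here: amounts converted to euros).
con.execute("""CREATE VIEW cmb_orders AS
               SELECT order_id, cust, amount_cents / 100.0 AS amount_eur
               FROM src_orders""")

# Publish layer: the uniform information model consumed by BI tools and services.
con.execute("""CREATE VIEW pub_revenue_per_customer AS
               SELECT cust AS customer, SUM(amount_eur) AS revenue
               FROM cmb_orders
               GROUP BY cust""")

rows = con.execute("SELECT customer, revenue FROM pub_revenue_per_customer").fetchall()
# rows == [('acme', 20.0)]
```

Consumers only ever query the publish layer; nothing about the source table's structure leaks through.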
During the migration to a logical data warehouse, the information system must naturally remain open. The supply of information should not be jeopardized under any circumstance; at the same time, we want to make it possible for new data and information to flow outside of the traditional, customary architecture of the classic data warehouse. This is an additional reason to implement the migration in small, controlled steps: the existing information flow is never compromised. Migrating to a hybrid architecture thus allows the system to remain open and keeps clients satisfied.
Step 1: Virtual Data Marts
The first step towards migration is virtualizing existing information sources, typically beginning with data marts. By incorporating tables in the publish layer that refer directly to the physical data mart tables, the sources can be easily decoupled from their data locations. These data marts thus become “virtual data marts.”
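The idea of step 1 can be sketched as follows, again with SQLite standing in for the virtualization platform: the "physical" data mart lives in its own database file, and the virtual data mart is simply a view that refers to it. The file and table names are illustrative.

```python
import os
import sqlite3
import tempfile

# The physical data mart: its own database file with one fact table.
mart_path = os.path.join(tempfile.mkdtemp(), "sales_mart.db")
phys = sqlite3.connect(mart_path)
phys.execute("CREATE TABLE fact_sales (region TEXT, units INTEGER)")
phys.execute("INSERT INTO fact_sales VALUES ('EU', 10), ('US', 4)")
phys.commit()
phys.close()

# The virtualization layer: attach the physical mart and expose it as a view.
# (A TEMP view is used because SQLite only lets temporary views reference
# tables in an attached database.)
virt = sqlite3.connect(":memory:")
virt.execute(f"ATTACH DATABASE '{mart_path}' AS mart")
virt.execute("CREATE TEMP VIEW v_fact_sales AS SELECT region, units FROM mart.fact_sales")

# Consumers query the virtual data mart; the physical location stays hidden.
total = virt.execute("SELECT SUM(units) FROM v_fact_sales").fetchone()[0]
# total == 14
```

If the mart later moves to another database, only the `ATTACH` changes; every consumer of `v_fact_sales` is unaffected, which is exactly the decoupling this step is after.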
Step 2: Migrating Information Services
We now adapt our existing information services, such as reports and dashboards, so that they retrieve their data from the virtual data marts. In most cases, this merely involves switching a connection from the physical database to the virtual database. With that, our information services are connected to the first version of our logical data warehouse! By decoupling the logical and physical storage of data, we have realized one of the main characteristics of the logical data warehouse.
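The point that only the connection changes, never the report itself, can be sketched like this (SQLite stands in for both databases; all names are invented):

```python
import sqlite3

def quarterly_report(connection):
    # The report only knows the published name, not where the data lives.
    return connection.execute("SELECT SUM(units) FROM fact_sales").fetchone()[0]

# Old situation: the report reads the physical data mart table directly.
physical = sqlite3.connect(":memory:")
physical.execute("CREATE TABLE fact_sales (units INTEGER)")
physical.executemany("INSERT INTO fact_sales VALUES (?)", [(10,), (4,)])

# New situation: a virtual database exposes the same name as a view
# over the decoupled source.
virtual = sqlite3.connect(":memory:")
virtual.execute("CREATE TABLE source_sales (units INTEGER)")
virtual.executemany("INSERT INTO source_sales VALUES (?)", [(10,), (4,)])
virtual.execute("CREATE VIEW fact_sales AS SELECT units FROM source_sales")

# Migrating the report is just swapping the connection; the query is untouched.
assert quarterly_report(physical) == quarterly_report(virtual)
```

Because the virtual database publishes the same names the report already expects, the migration is invisible to the report's users.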
The logical data warehouse primarily revolves around offering the “information product” to the client in the smartest, most efficient way possible, without the client being confronted with the complexity of the underlying architecture. Compare it to a restaurant: depending on what the client wants to eat, the meal is served quickly and as desired, without the client having to peek into the kitchen. Vegetarian? Gluten allergy? No milk products? No problem, we serve everything from the same kitchen! The need takes precedence: the form, the flavor, and the presentation are decisive, not the process!
Step 3: Adding New Sources
After these first two migration steps, we have already completed a large part of the overarching migration process: the information services have been decoupled from the source systems. We now have “input freedom” and “output freedom”: we can add data sources without following the fixed route of the classic data warehouse (ETL, replication, etc.) and can publish the information in a number of ways. When we want to add new data, the new data source is made accessible in the connect layer, and its data is then made available to users via the publish layer. The result is a logical data warehouse which contains the virtual data marts from step 1 and, “parallel” to them, the new information, without the two being integrated. We are now able to connect new consumers, such as REST APIs, SOAP web services, and BI tools, and can assign rights to specific information in the logical data warehouse through Denodo.
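Step 3 can be sketched as follows: a new source is made accessible in the connect layer and published in parallel with the existing virtual data mart, without the two being integrated yet. The small JSON helper below merely stands in for a REST information service over the publish layer; SQLite again plays the platform, and all names are illustrative rather than real Denodo APIs.

```python
import json
import sqlite3

ldw = sqlite3.connect(":memory:")

# Existing virtual data mart from step 1 (simplified to a local table here).
ldw.execute("CREATE TABLE v_fact_sales (region TEXT, units INTEGER)")
ldw.execute("INSERT INTO v_fact_sales VALUES ('EU', 10)")

# New source made accessible in the connect layer...
ldw.execute("CREATE TABLE src_weather (region TEXT, avg_temp REAL)")
ldw.execute("INSERT INTO src_weather VALUES ('EU', 11.5)")
# ...and exposed through the publish layer, parallel to the sales mart.
ldw.execute("CREATE VIEW pub_weather AS SELECT region, avg_temp FROM src_weather")

def publish_as_json(view_name):
    # Stand-in for a REST/SOAP information service over the publish layer.
    cols = [c[1] for c in ldw.execute(f"PRAGMA table_info({view_name})")]
    rows = ldw.execute(f"SELECT * FROM {view_name}").fetchall()
    return json.dumps([dict(zip(cols, r)) for r in rows])

payload = publish_as_json("pub_weather")
# payload == '[{"region": "EU", "avg_temp": 11.5}]'
```

Note that the sales mart and the weather data coexist side by side; integrating them is deliberately deferred to step 4.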
Step 4: Integration
The foundation of the logical data warehouse is now in place. It is very likely that we will feel the need to integrate the new information with the data from the classic data warehouse. The next step will then be to add the parts from the data warehouse and the new information to the combine layer. Here, we establish how the data will be linked and what the resulting data set will look like. The new entities from the combine layer can now be added to the existing data marts in the publish layer. The result is a virtual data mart that unlocks data from both the classic data warehouse and the new data sources.
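Continuing the same SQLite sketch, step 4 looks like this: the combine layer establishes how the classic data warehouse's data and the new source are linked (here, by region), and the publish layer exposes the result as one virtual data mart. All names are invented for the example.

```python
import sqlite3

ldw = sqlite3.connect(":memory:")

# Classic data warehouse side (reached through the connect layer).
ldw.execute("CREATE TABLE dw_fact_sales (region TEXT, units INTEGER)")
ldw.execute("INSERT INTO dw_fact_sales VALUES ('EU', 10), ('US', 4)")

# New source added in step 3, until now published in parallel.
ldw.execute("CREATE TABLE src_weather (region TEXT, avg_temp REAL)")
ldw.execute("INSERT INTO src_weather VALUES ('EU', 11.5), ('US', 18.0)")

# Combine layer: establish how the two data sets are linked.
ldw.execute("""CREATE VIEW cmb_sales_weather AS
               SELECT s.region, s.units, w.avg_temp
               FROM dw_fact_sales s
               JOIN src_weather w ON s.region = w.region""")

# Publish layer: the integrated virtual data mart seen by consumers.
ldw.execute("CREATE VIEW v_sales_weather AS SELECT * FROM cmb_sales_weather")

rows = ldw.execute(
    "SELECT region, units, avg_temp FROM v_sales_weather ORDER BY region"
).fetchall()
# rows == [('EU', 10, 11.5), ('US', 4, 18.0)]
```

The classic data warehouse is not touched at any point; the integration lives entirely in the combine layer.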
Easy to Oversee
Beautiful, isn’t it? The transformation from a physical to a logical (or hybrid) data warehouse really is that simple using a data virtualization platform like Denodo. First, we converted the data marts into virtual data marts by virtualizing them one by one. We then pointed the reports, dashboards, and other BI products at these virtual data marts by altering their connections. Finally, we extended the resulting logical data warehouse by adding sources directly and then integrating them with data from the classic data warehouse. All of this without altering the existing data warehouse. Easy to oversee, don’t you think?
We can therefore migrate to a logical data warehouse in controlled, well-organized steps without having to close our information systems. Depending on the cost-benefit analysis, we can convert parts of the existing data warehouse architecture to a virtual strategy, though this is certainly not required in all cases. For each new data source, we assess whether a physical or a virtual strategy is best: the best of both worlds! We determine the right place in the logical data warehouse architecture in four steps, depending on the intended goal for the specific data:
1) What are the information needs?
2) Where is the required data stored, and what are the possibilities (and limitations)?
3) What are the needs and requirements for the supply of information?
4) Are the information and underlying data consistent? (more on this subject at a later stage…)
‘One size’ Does Not Always Fit All
The great thing about the logical data warehouse is that the architecture supports both a physical and a virtual strategy. It is not a one-size-fits-all architecture that processes and presents every information need in the same way. Essentially, I have explained that converting your classic data warehouse to a logical data warehouse is not really that big a step, despite a concern to the contrary that I encounter quite frequently at companies considering data virtualization. I hope that I have succeeded in alleviating this concern and that I will soon be able to welcome you to the virtual world of the logical data warehouse!
This blog was penned by Jonathan Wisgerhof, Senior Architect, Kadenza