Logical Data Warehouse: Six Common Patterns
Logical Data Warehouse is a major topic these days, so when Denodo hosted an event focused on this, I had to attend. The event consisted of various presentations, including a general introduction to a logical data warehouse and demos. However, my favorite part was hearing about the different use cases for this technology, so below, I will summarize the common patterns for a logical data warehouse.
1 – Virtual Data Marts
These represent an easy approach for business users to consume data without the need to think about concepts like star schema and foreign keys, and see the data sets from the perspective of their department. A Virtual Data Mart will integrate multiple sources and create a business friendly data model available to end users or other consuming applications, like reporting tools.
2 – Data Warehouse + Master Data Management
Another common pattern for a logical data warehouse is blending data from your data warehouse and MDM (master data management). This may occur because you have separate teams using the different systems exclusively, and you want to keep it this way. In this case, a logical data warehouse offers a virtual data layer that collects data from each environment – data warehouse and MDM – and exposes that combined view of the information to enrich the raw data.
3 – Data Warehouse + Cloud Computing
Similar concept as above but a coming from a different angle: given the increase in the adoption of cloud applications, a new scenario for a logical data warehouse is to blend information from the data warehouse with data from different cloud environments, like Salesforce.com. We can also encounter cases where the data warehouse itself has been moved to the cloud (using Spark, Amazon Redshift, etc.). Aspects like latency and the variety of sources involved makes this scenario own its own section.
4 – Data Warehouse + Data Warehouse (Data Warehouse Integration)
Very often large corporations have more than one data warehouse. Different teams may manage their own warehouse, etc. These transactions often involve independent, complex and incompatible systems that are difficult to consolidate. A logical data warehouse can facilitate this process by blending the data from both environments.
The same applies for migration projects. During a transition from a traditional data warehouse to a new cloud based platform (Redshift or Spark), you will need to maintain two data warehouses alive for a certain period of time. In this scenario, you can use a logical data warehouse to access two or more data warehouses from a single virtual data layer and ensure continuity in your business applications.
5 – Data Warehouse Offloading
The recent appearance of Hadoop in the data landscape has created new scenarios not initially anticipated. For example, many companies are using Hadoop as a cheap way to store high volumes of data. Recent data may stay in a traditional data warehouse (to ensure maximum performance) whereas a Hadoop cluster is used for historical data (when performance is not a priority). In this case, a logical data warehouse allows you to blend data from the two different systems, so you can run queries transparently without disturbing your existing business processes.
6 – Data Warehouse Extension
A similar concept to the above is the data warehouse extension with the difference being the type of data that is stored. Your traditional data warehouse (Vertica, Netezza, etc.) stores the most common used information, and the external, cheaper environment, such as Hadoop, stores the rest of the information. So, instead of splitting the data by date, here the traditional data warehouse keeps the simplified version of your data model, and Hadoop holds the rest of the attributes that you rarely use.
The last two common patterns for a logical data warehouse create a sort of virtual partition as the information is divided (by date, attributes or data model) between the two systems: traditional data warehouse and Hadoop systems. The intelligence provided by the logical data warehouse helps you implement the logic that keep your queries running and knows which data is in which repository.
Hearing about these common patterns, has really clarified for me the uses of this technology today and how these solutions are being implemented. If you are interested in learning more about it, watch the full session here: Common Patterns for a Logical Data Warehouse.
From your experience, are there any other common patterns for a logical data warehouse that I did not mention here?