Data Fabrics Need to Coexist with Data Warehouses and Other Database-Centric Technologies
Since the dawn of IT, businesses have needed one integrated, consistent view of the data coming in from multiple applications, and for a long time, data warehouses have been the preferred way to provide it. More recently, data lakes and data hubs have been introduced to solve the same problem, and the latest alternative is data fabric. A thought-provoking question is whether data fabric will make the other solutions, such as the data warehouse and the data lake, obsolete. Will it replace them?
Most transactional databases were designed to support one application: one for human resources, one for customer relationship management, one for sales, and one for transport. Decision makers then needed to integrate data from multiple applications to gain a complete picture of what they were analyzing. Operational users also needed integrated data access to gain, for example, a 360-degree view of a customer, order, or patient.
The way data warehouses, data lakes, and data hubs tackle this challenge is by copying data from multiple systems to one centralized database and then making that database accessible to data consumers.
The Benefits of Data Fabric
Data fabric works differently, as it is not a database-centric architecture. With data fabric, applications are wrapped in a service layer, which presents the needed integrated, consistent view of the data. It hides all the different technologies, languages, and APIs used by the applications and gives data consumers easy access to the data. As Gartner puts it, data fabric offers frictionless access to the data.
An important functional difference between data fabric on the one hand and database-centric solutions on the other is that the former, besides offering services to query data, also offers services for inserting, updating, and deleting data. In addition, data fabric can provide real-time access to data, because its services access the applications directly rather than a database containing copied data.
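To make the idea of a service layer concrete, here is a minimal Python sketch. Everything in it is hypothetical: `CrmApp` and `OrderApp` stand in for two applications with their own APIs, and `CustomerService` is an illustrative service that presents one integrated customer view and passes an update straight through to the owning application. It is not how any particular data fabric product works, only an illustration of the principle.

```python
class CrmApp:
    """Hypothetical stand-in for a CRM application with its own API."""

    def __init__(self):
        self._customers = {1: {"name": "Ada Lovelace", "email": "ada@example.com"}}

    def fetch_customer(self, customer_id):
        return dict(self._customers[customer_id])

    def change_email(self, customer_id, email):
        self._customers[customer_id]["email"] = email


class OrderApp:
    """Hypothetical stand-in for an order-entry application."""

    def __init__(self):
        self._orders = [
            {"order_no": "A-100", "cust": 1, "total": 250.0},
            {"order_no": "A-101", "cust": 1, "total": 75.5},
        ]

    def orders_for(self, customer_id):
        return [dict(o) for o in self._orders if o["cust"] == customer_id]


class CustomerService:
    """Illustrative service: hides both application APIs behind one view."""

    def __init__(self, crm, orders):
        self._crm = crm
        self._orders = orders

    def get_customer_360(self, customer_id):
        # Read directly from the applications, so the view is always current.
        customer = self._crm.fetch_customer(customer_id)
        orders = self._orders.orders_for(customer_id)
        return {
            "customer_id": customer_id,
            "name": customer["name"],
            "email": customer["email"],
            "orders": [
                {"order_no": o["order_no"], "total": o["total"]} for o in orders
            ],
            "lifetime_value": sum(o["total"] for o in orders),
        }

    def update_email(self, customer_id, email):
        # Writes are passed through to the application that owns the data.
        self._crm.change_email(customer_id, email)


service = CustomerService(CrmApp(), OrderApp())
service.update_email(1, "ada@newmail.example")
view = service.get_customer_360(1)
print(view["email"], view["lifetime_value"])
```

Because the service reads from the applications at request time, the changed email address is visible in the very next query, with no copy step in between.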
Data lakes are no substitute for data warehouses, and data hubs are no substitute for data lakes. They all have their own purposes, although some may overlap. But will data fabric replace all of those database-centric solutions? The answer is simple: No.
Data Fabric Requirements
To succeed, data fabric must support a wide range of data consumers, including websites, portals, mobile apps, dashboards, self-service BI, and data science sources. To do this, its services must deal with several challenges, such as:
- Logic must be developed to deal with inconsistent, incorrect, and missing data. This logic is very similar to the logic normally found in extract, transform, and load (ETL) programs.
- If applications do not keep track of historical data, it must be stored within the service layer.
- Applications that can barely handle their existing workload will not be able to process the extra workload generated by data fabric.
- Data must be anonymized and/or pseudonymized to comply with data privacy regulations.
- All consumers of a data fabric can be divided into object-oriented and set-oriented data consumers. The first group manipulates or queries only one business object, record, or document, such as one order, one customer, or one patient. The second group processes sets of objects, such as all the orders of a customer or the net sales aggregated per month and per region. Supporting both groups, especially the second, is not trivial.
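Two of the challenges above, handling incorrect or missing data and pseudonymizing identifiers, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the cleansing rules, and the `SECRET_KEY` value are hypothetical, and keyed hashing with HMAC-SHA256 is just one possible pseudonymization technique.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def clean_customer(raw: dict) -> dict:
    """ETL-style fixes: trim text, default missing values, reject bad numbers."""
    name = (raw.get("name") or "").strip() or "UNKNOWN"
    country = (raw.get("country") or "").strip().upper() or None
    try:
        age = int(raw.get("age"))
        if not 0 <= age <= 130:
            age = None  # implausible value is treated as missing
    except (TypeError, ValueError):
        age = None  # missing or non-numeric value
    return {"name": name, "country": country, "age": age}


def to_consumer_record(raw: dict) -> dict:
    """Clean the record, then swap direct identifiers for a pseudonym."""
    record = clean_customer(raw)
    record["customer_key"] = pseudonymize(str(raw["customer_id"]))
    record.pop("name")  # the direct identifier is not exposed to consumers
    return record


record = to_consumer_record(
    {"customer_id": 42, "name": "  Ada ", "country": "nl", "age": "abc"}
)
print(record["country"], record["age"])
```

The same input always yields the same `customer_key`, so consumers can still join and count records per customer without seeing who the customer is.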
To implement services that tackle the above challenges, it is very likely that data needs to be stored within the service layer. For example, if an application does not keep track of history, and some consumers of a data fabric need that history for analytical purposes, the history needs to be stored, and that is where a database comes in. Also, to support set-oriented data consumers efficiently, data must be stored in a database that can execute their queries quickly.
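As a rough illustration of why a database ends up inside the service layer, the Python sketch below keeps every captured version of an order in an in-memory SQLite table. The table name `order_history`, its columns, and the sample orders are all assumptions made for this example. One function serves an object-oriented consumer (the history of a single order), the other a set-oriented consumer (net sales per month and per region).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_history (
        order_no   TEXT,
        region     TEXT,
        order_date TEXT,  -- ISO date of the order
        net_amount REAL,
        status     TEXT,
        valid_from TEXT   -- when this version was captured
    )
    """
)


def record_version(order: dict, captured_at: str) -> None:
    """Store one version of an order that the application would overwrite."""
    conn.execute(
        "INSERT INTO order_history VALUES (?, ?, ?, ?, ?, ?)",
        (order["order_no"], order["region"], order["order_date"],
         order["net_amount"], order["status"], captured_at),
    )


def order_versions(order_no: str) -> list:
    """Object-oriented access: the full history of a single order."""
    return conn.execute(
        "SELECT status, valid_from FROM order_history "
        "WHERE order_no = ? ORDER BY valid_from",
        (order_no,),
    ).fetchall()


def net_sales_per_month_and_region() -> list:
    """Set-oriented access: aggregate the latest version of every order."""
    return conn.execute(
        """
        SELECT substr(h.order_date, 1, 7) AS sales_month,
               h.region,
               SUM(h.net_amount)
        FROM order_history AS h
        WHERE h.valid_from = (
            SELECT MAX(i.valid_from) FROM order_history AS i
            WHERE i.order_no = h.order_no
        )
        GROUP BY sales_month, h.region
        ORDER BY sales_month, h.region
        """
    ).fetchall()


base = {"order_no": "A-100", "region": "EMEA", "order_date": "2021-01-15",
        "net_amount": 100.0, "status": "open"}
record_version(base, "2021-01-15T09:00")
record_version({**base, "status": "shipped"}, "2021-01-17T14:30")
record_version({"order_no": "A-101", "region": "EMEA", "order_date": "2021-01-20",
                "net_amount": 50.0, "status": "open"}, "2021-01-20T10:00")
record_version({"order_no": "A-102", "region": "APAC", "order_date": "2021-02-03",
                "net_amount": 80.0, "status": "open"}, "2021-02-03T08:15")

print(order_versions("A-100"))
print(net_sales_per_month_and_region())
```

The aggregate query runs against the stored copy rather than against the operational application, which is also one way to avoid adding analytical workload to a system that can barely handle its existing one.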
A data fabric is never just a layer of services consisting of programming code. In real life, a data fabric encompasses a data warehouse, a data lake, or perhaps both, or a data fabric might access a data warehouse as if it were one of the applications. In other words, data fabric coexists with those database-centric solutions. In building data fabric services, developers cannot help but enable this coexistence.
- Data Fabrics Need to Coexist with Data Warehouses and Other Database-Centric Technologies - March 29, 2021