With so much valuable data potentially available, it can be frustrating for organizations to discover that they can’t easily work with it because it’s stuck in disconnected silos. Limited data access is a problem when organizations need timely, complete views of all relevant data about customers, supply chains, business performance, public health, and more, to make informed decisions. We need only look at the current COVID-19 pandemic to understand the importance of being able to view and share data across silos.
One of the hottest topics in data architecture right now is data fabric. The notion of a data fabric is becoming important as organizations put more data in multiple cloud-based storage platforms, which can add to existing on-premises data silo problems. TDWI research finds that disconnected data silos are one of the biggest obstacles organizations face as they try to enable faster data insights. In the industry, “data fabric” is variously called an architecture, a framework – and of course, a fabric. With data fabric, adherents aim to provide a more universal and holistic approach to integrating diverse components of physically distributed data environments. Ideally, a data fabric will use services to integrate the necessary components so that data flows more easily and users and applications do not have to use specialized code to access each data silo.
Data virtualization is important to making a data fabric work. The data virtualization layer can provide a single view of logically integrated, multisourced data while supporting federated queries to the sources. Data virtualization layers offer transparent access so that users do not need to know how to access the data or where the data is stored – an important tenet of data fabric. Organizations can also use a data virtualization layer to optimize query processing at the sources and take advantage of local processing power. This is an important consideration given the horizontal scaling power of massively parallel processing (MPP) available on leading cloud platforms.
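To make the idea concrete, here is a minimal sketch of a federated query, assuming two in-memory SQLite databases stand in for separate silos. The `VirtualLayer` class, the table names, and the sample rows are all hypothetical and chosen for illustration; real data virtualization products work differently and do far more. The point is only that the filter and the aggregation each run at their own source, and the caller never sees where the data lives.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one data silo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical silos: a CRM system and an order system.
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)",
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "EU"), (2, "Globex", "US"), (3, "Initech", "EU")],
)
orders = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0), (3, 20.0)],
)


class VirtualLayer:
    """Hypothetical logical view over both silos; no data is copied centrally."""

    def __init__(self, crm_conn, orders_conn):
        self.crm = crm_conn
        self.orders = orders_conn

    def revenue_by_customer(self, region):
        # The region filter runs at the CRM source.
        customers = self.crm.execute(
            "SELECT id, name FROM customers WHERE region = ?", (region,)
        ).fetchall()
        if not customers:
            return []
        ids = [cid for cid, _ in customers]
        placeholders = ",".join("?" for _ in ids)
        # The aggregation runs at the order source; only totals come back.
        totals = dict(
            self.orders.execute(
                "SELECT customer_id, SUM(amount) FROM orders "
                f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
                ids,
            ).fetchall()
        )
        # The layer combines the two small result sets into one view.
        return sorted((name, totals.get(cid, 0.0)) for cid, name in customers)


layer = VirtualLayer(crm, orders)
result = layer.revenue_by_customer("EU")
print(result)
```

The caller asks one question of one object. Which silo answers which part of the question is a detail the layer hides.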
Avoiding Data Movement with Data Fabric and Data Virtualization
Data fabric should enable organizations to avoid having to move volumes of data to a central physical location to gain a complete view. TDWI research finds that as organizations put the cloud more at the center of their data architecture, data movement and migration are a major concern. Data movement steps, including extract, transform, and load (ETL) programs, can be slow and costly, delaying organizations’ efforts to gain business value from data. Fewer than half (45%) of organizations TDWI surveyed for its Q1 2020 Best Practices Report indicate that they are satisfied with the time it takes to load data into cloud platforms, with only five percent “very” satisfied. Organizations can use data virtualization to significantly reduce data movement and ETL across hybrid, multicloud data platforms; instead, data virtualization offers views of data in place.
Using Metadata Knowledge Effectively
In addition, data virtualization makes good use of metadata catalogs, providing users with comprehensive views of the data where it lives as well as its context. Metadata catalogs and higher-level semantic data integration are important to a data fabric, which needs to use knowledge about diverse data effectively to reduce the time it takes to find, view, and access it. In dynamic situations where your organization needs to perform analytics right away on data coming from multiple sources, metadata and semantic data knowledge integrated with a data virtualization layer can help avoid the delays of having to extract and move data to a central repository just to discover whether its quality is good enough and appropriate for analytics. Data fabrics can use data virtualization to support analytics workloads, including those in response to dynamic situations in which data is rapidly changing.
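As a rough illustration of how catalog metadata can answer "is this data worth using?" before anything is moved, the sketch below models a tiny metadata catalog. Everything in it is an assumption made for the example: the `CatalogEntry` fields, the `completeness` score, and the dataset names are hypothetical, not taken from any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one dataset; the data itself stays at its source."""
    name: str
    source: str
    description: str
    tags: set = field(default_factory=set)
    completeness: float = 1.0  # hypothetical quality score between 0 and 1


class MetadataCatalog:
    """Hypothetical in-memory catalog keyed by dataset name."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def find(self, tag, min_completeness=0.0):
        """Return names of datasets that carry the tag and meet the quality bar."""
        return sorted(
            e.name
            for e in self._entries.values()
            if tag in e.tags and e.completeness >= min_completeness
        )

    def describe(self, name):
        e = self._entries[name]
        return f"{e.name} @ {e.source}: {e.description}"


catalog = MetadataCatalog()
catalog.register(CatalogEntry(
    "crm.customers", "on-prem CRM", "Customer master records",
    {"customer", "pii"}, 0.98))
catalog.register(CatalogEntry(
    "web.clickstream", "cloud object store", "Raw site events",
    {"customer", "events"}, 0.71))
catalog.register(CatalogEntry(
    "erp.shipments", "cloud warehouse", "Shipment status by order",
    {"supply-chain"}, 0.95))

# Screen candidates on metadata alone, without extracting any rows.
candidates = catalog.find("customer", min_completeness=0.9)
print(candidates)
print(catalog.describe(candidates[0]))
```

Here the low-completeness clickstream data is screened out using metadata alone, which is the kind of shortcut the paragraph above describes.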
Metadata and higher-level semantic data knowledge are key to governing data, and thus also key to data fabric governance services. You can’t govern data properly if you don’t know where it is and can’t trace its lineage: where it came from, how it got there, and how it is being used. A data virtualization layer, combined with metadata resources, can help supply this insight. Organizations can also use the layer to control access to data sources for security and adherence to regulations about data privacy. Data fabrics must enable organizations to maintain security and governance, which is growing more difficult as disconnected data silos spread across multiple cloud platforms.
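One way to picture the governance role of the layer is as a single checkpoint that every read passes through. The sketch below is a simplified assumption, not a description of any product: the `GovernedLayer` class, the role-to-dataset policy format, and the sample sources are hypothetical. It covers only access control and a usage audit trail; full lineage tracking would need more than this.

```python
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when a role is not permitted to read a dataset."""


class GovernedLayer:
    """Hypothetical access checkpoint placed in front of several sources."""

    def __init__(self, sources, policies):
        self.sources = sources    # dataset name -> callable that returns rows
        self.policies = policies  # role -> set of dataset names it may read
        self.audit_log = []       # record of who asked for what, and when

    def read(self, user, role, dataset):
        allowed = dataset in self.policies.get(role, set())
        # Log every attempt, allowed or not, so usage can be reviewed later.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset,
            "allowed": allowed,
        })
        if not allowed:
            raise AccessDenied(f"role {role!r} may not read {dataset!r}")
        return self.sources[dataset]()


layer = GovernedLayer(
    sources={
        "crm.customers": lambda: [("Acme", "EU")],
        "hr.salaries": lambda: [("alice", 100000)],
    },
    policies={"analyst": {"crm.customers"}, "hr_admin": {"hr.salaries"}},
)

rows = layer.read("dana", "analyst", "crm.customers")
try:
    layer.read("dana", "analyst", "hr.salaries")
    denied = False
except AccessDenied:
    denied = True

print(rows, denied, len(layer.audit_log))
```

Because every request goes through one place, the policy is enforced consistently and the log shows how each dataset is actually being used, regardless of which platform holds it.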
Putting Together the Logical Data Fabric
Data virtualization, by providing a layered, logical view of data, can be combined with security, network, governance, and other common services in the data fabric to create a logical data fabric for analytics and AI, data visualizations such as dashboards, and other data needs. This combination can enable organizations to gain value from data across hybrid, multicloud environments that would otherwise be stuck in silos, while avoiding the delays and costs of physically moving all the data across networks to a central location. A logical data fabric can help organizations knit together disparate data sources in their broad, hybrid universe of data platforms, just as they can use data virtualization to create logical data warehouses that expand users’ reach beyond the limitations of traditional enterprise data warehouses.
Data silos will always exist – especially as organizations increase their use of multiple cloud providers’ data platforms while continuing to maintain on-premises data systems. Unfortunately, this makes data integration even more challenging. It’s critical for users and analytics-driven applications to gain integrated views of all relevant data, not just limited subsets. Organizations should evaluate how data virtualization can accelerate progress toward a logical data fabric that can meet current, future, and perhaps most of all, unanticipated data demands.
For more information on logical data fabric, register for Fast Data Strategy Virtual Summit 2020, where I’ll cover this topic in greater detail, and where you will also be able to hear from a variety of other thought leaders and industry experts.
- Getting Above the Silos: The Rise of the Logical Data Fabric - April 2, 2020