Using Data Modeling Tools for Top-Down Data Virtualization Design


Data Modeling enables organizations to define and design their data stores, adapting them to their business requirements and allowing their data structures and policies to be included in the necessary internal communication, design and documentation mechanisms with little or no ambiguity.

Specific data modeling tools can be used to create the different layers of logical and physical models that represent the diverse data stores in an organization. These same tools can help admins bootstrap the development of data schemas, maintain them, keep changes under control and synchronize metadata to and from the data stores. From a higher-level point of view, data modeling constitutes a first step on the way to integrated, company-wide data governance strategies.

Modeling Data Virtualization

When confronted with data virtualization scenarios, data modeling tools face a series of complexity challenges that limit their applicability. These challenges derive from the fact that virtual databases are not merely stores of data at rest (whether relational tables or other types of structures). Instead, they define a series of transformations that bring data from its original state in the source data stores to its final shape and format in the interfaces, or data contracts, that will be offered to the clients of the data virtualization platform.
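The difference between data at rest and a virtual view can be illustrated with a minimal Python sketch. Everything here is hypothetical (the two in-memory "sources", their field names and the `customer_totals` function are invented for illustration and do not correspond to any DV platform's API). The point is only that the view stores nothing: its rows are derived from the sources every time it is queried, and its output shape is the data contract.

```python
# Hypothetical source data stores: rows at rest in two different systems.
crm_customers = [
    {"cust_id": 1, "full_name": "Ada Lovelace"},
    {"cust_id": 2, "full_name": "Alan Turing"},
]
billing_invoices = [
    {"customer": 1, "amount_cents": 12000},
    {"customer": 1, "amount_cents": 3000},
    {"customer": 2, "amount_cents": 4550},
]


def customer_totals() -> list[dict]:
    """A 'virtual view': nothing is stored, rows are derived on each call.

    The returned shape (customer_id, name, total_amount) plays the role
    of the data contract offered to clients.
    """
    # Combine: aggregate invoice amounts per customer key.
    totals: dict[int, int] = {}
    for inv in billing_invoices:
        totals[inv["customer"]] = totals.get(inv["customer"], 0) + inv["amount_cents"]
    # Reshape: rename source fields and convert cents to currency units.
    return [
        {
            "customer_id": c["cust_id"],
            "name": c["full_name"],
            "total_amount": totals.get(c["cust_id"], 0) / 100,
        }
        for c in crm_customers
    ]
```

A data modeling tool can describe the three output fields well enough, but it has no vocabulary for the aggregation and renaming steps inside the function, which is the limitation discussed here.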

Typical data modeling tools are not a good fit for designing virtual databases, because the data retrieval and combination features that are key components of data virtualization systems are generally unknown to them. That part of the design is therefore left to the more specialized administration tools provided by the data virtualization platforms themselves.

However, the aim of a data virtualization (DV) platform is to offer its combined and reshaped data through a series of data interfaces and contracts, and these are what any client software will actually see of the DV system. We should therefore be able to use general data modeling techniques to define those interfaces and contracts as data models, so that they can be properly communicated, documented, and included in any higher-level data-related processes at the company (such as Data Governance).

And if data modeling tools can define the data contracts to be offered by data virtualization platforms, those data contracts could also be synchronized to the DV platform in the form of virtual interface views, a step that would effectively allow top-down design for data virtualization.

Top-Down Data Virtualization Design in Practice

So how can we model those data contracts in the data modeling tools so that they can later be synchronized into interfaces on the DV side? In the simplest approach, the following mappings could be adopted:

Entities/Tables: Logical entities and their attributes (or physical tables and their columns) modeled in the data modeling tools would be synchronized into the DV platform as interface views. Even though no data would actually be stored in the defined structures, because they are virtual, their definitions could still be used to communicate these data structures to other systems that might depend on them. Designing the data combinations and operations that fill those interface views with data from the real data sources would be a later task on the DV side, but the top-down approach would already have served its aim of allowing the data contracts to be defined beforehand.
Associations/Relationships: Associations (or relationships) modeled in the data modeling tools would be synchronized into the DV platform as logical associations and/or, depending on the specifics of each relationship, referential constraints between interface views. These structures on the DV side would allow any client software to easily determine the relationships among the different parts of the data contract by scanning the association metadata offered by the virtual database.
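
The two mappings above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Attribute`, `Entity` and `Relationship` classes, the `synchronize` and `related_views` functions, and the dictionary-based catalog are all hypothetical names invented here, not the export format of any modeling tool or the metadata API of any DV platform. A real synchronization would go through whatever import mechanism the platform provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple[Attribute, ...]


@dataclass(frozen=True)
class Relationship:
    name: str
    parent: str        # entity on the "one" side
    parent_key: str
    child: str         # entity on the "many" side
    child_key: str
    enforced: bool = False  # True: also flag as a referential constraint


def synchronize(entities, relationships):
    """Translate a data model into a catalog of interface views and associations."""
    views = {
        e.name: {
            "kind": "interface_view",
            "fields": {a.name: a.data_type for a in e.attributes},
            "implementation": None,  # assigned later, on the DV side
        }
        for e in entities
    }
    associations = []
    for r in relationships:
        # Reject relationships whose endpoints are not part of the contract.
        for view, key in ((r.parent, r.parent_key), (r.child, r.child_key)):
            if view not in views or key not in views[view]["fields"]:
                raise ValueError(f"{r.name}: unknown endpoint {view}.{key}")
        associations.append({
            "name": r.name,
            "from": (r.child, r.child_key),
            "to": (r.parent, r.parent_key),
            "referential_constraint": r.enforced,
        })
    return {"views": views, "associations": associations}


def related_views(catalog, view_name):
    """What a client might do: scan association metadata for a view's neighbours."""
    found = set()
    for a in catalog["associations"]:
        ends = {a["from"][0], a["to"][0]}
        if view_name in ends:
            found |= ends - {view_name}
    return sorted(found)


# Hypothetical two-entity model, as it might be exported from a modeling tool.
model_entities = [
    Entity("customer", (Attribute("id", "int"), Attribute("name", "text"))),
    Entity("order", (Attribute("id", "int"), Attribute("customer_id", "int"))),
]
model_relationships = [
    Relationship("customer_orders", "customer", "id", "order", "customer_id", enforced=True),
]
catalog = synchronize(model_entities, model_relationships)
```

Each interface view starts with `implementation` set to `None`, which mirrors the top-down idea: the contract exists first, and the combinations that feed it are designed afterwards. Calling `related_views(catalog, "customer")` shows how a client could discover that `order` is linked to `customer` purely from the association metadata.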

In summary, the same data modeling tools that have long been used to design relational databases and other common data stores can also be used to design the interfaces and contracts that data virtualization systems offer to other parts of the corporate ecosystem, with the added benefit of easily bootstrapping the virtual database development process.

Author

Denodo Labs

DenodoLabs is Denodo’s advanced team for software innovation, continuously listening to the state of the art in Data Virtualization and producing the DenodoConnect component ecosystem, which enables Denodo customers to extract maximum profit from their Denodo installations in key areas such as Big Data integration, SaaS connectivity, testing, management and more.

2 Comments


Jay Devlin

January 29, 2018 at 1:30 pm

This has been an aspirational goal for quite some time, but it has been hard to realise due to a wealth of data modelling standards. In addition, those standards did not support the detailed explanation of query paths and optimisations which (as you stated) is a key value of DV.
So I would like to understand if Denodo has investigated the opportunity to promote a common modelling approach that would support necessary DV extensions.

Denodo Labs

February 5, 2018 at 5:08 pm

As you mention, there are many obstacles to defining a common modeling approach for Data Virtualization systems, not only because of the large number of operators and transformations that can apply in this kind of data environment, but also because of the large differences among the techniques made available by different software vendors.

There is no clear, industry-wide technical definition of a common corpus of Data Virtualization operations or algebra, so data transformations at this level are mostly custom-tailored by vendors to the needs and capabilities of each of their platforms. Those platforms in turn show subtle (or not so subtle) differences in alignment, and consequently may also offer differing toolsets.
