Data Virtualization Platform Extensibility

Extensibility as a Key Aspect of a Data Virtualization Platform

All over the world, data is stored in a myriad of different systems and formats, and accessed from a range of applications that gets wider every day. This growth not only means an increase in the number of possibilities and opportunities for data integration, but also that the number of protocols and methods required for accessing all that data keeps growing at a speed the industry struggles to keep pace with.

The basic tool for this kind of scenario is standards: as long as new data systems adhere to existing standards, existing integrators in the market, such as data virtualization platforms, should be able to connect to and work with these new systems efficiently.

Unfortunately, standards do not always apply. New paradigms in data storage and management can create mismatches with existing standards, and in some areas of data management standards are scarce or missing altogether. In those cases, the most effective way to link a data virtualization system with other parts of its enterprise ecosystem is often to create customized integrations or extensions.

Customized integrations or extensions might range in complexity from configuring an existing generic connector to developing a complete connector in a compatible programming language and later deploying it into the data virtualization platform. Some vendors might even offer sets of pre-made extensions as separate, smaller software packages available to their customers.

But the key question for extensibility in a data virtualization platform is what can be extended: which aspects of the platform’s operation can be customized to meet the user’s needs when no implemented standard applies. These are called the platform’s extension points or interfaces.

Let’s have a look at what could be considered the basic set of extension points for any data virtualization system:

  • Custom Data Sources: in order to access data living in systems with non-standard interfaces, a data virtualization platform should allow its users to programmatically create custom connectors. Such a connector implements an interface or a series of interfaces specified by the DV platform software (the extension point) and maps the operations and data structures of those interfaces to the APIs or other integration mechanisms offered by the data source. This should allow the DV platform to use these new data sources just as if a supported connector were offered for them out of the box.
  • Custom Data Processing: once data has entered the data virtualization platform, a series of transformations is usually needed in order to combine and reshape the data to build the expected output. But out-of-the-box transformation operations such as joins, selections, formatting functions, etc. might not be enough, and more complex processes might be required. Think of stored procedures in many DBMSs, which offer programmatic handling of data executed directly on the data store itself. Custom data processing is therefore an important extension point, allowing data virtualization platforms to meet their data handling requirements in many scenarios.
  • Custom Data Access Policies: security is a need in any data management system, and this includes data virtualization platforms. However, sometimes the rules that specify who has access to which parts of the stored data cannot or should not be specified directly in the DV software itself. Specialized entitlement-management systems that contain these security rules may exist in the ecosystem, which creates a need for a customizable extension point in the data virtualization platform in order to integrate with these external security mechanisms.
  • Custom Data Output: once the data virtualization system has obtained and integrated the data, and reshaped it to the expected form for output, this data has to be transferred or published through one of the available output interfaces. And again, specific DV scenarios might require custom or non-standard data output flows or formats that the user might need to create if they are not offered out of the box.
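To make the four extension points above more concrete, here is a minimal Python sketch of how they could be expressed as interfaces and chained together. Every name in it (CustomDataSource, CustomProcessor, CustomAccessPolicy, CustomOutput, run_pipeline, and the toy implementations) is hypothetical and chosen for illustration only; it does not reflect the API of any particular data virtualization product, and the in-memory entitlement table merely stands in for an external entitlement-management system.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List, Set

Row = Dict[str, Any]


# --- Hypothetical extension points (not a real product API) ---
class CustomDataSource(ABC):
    """Exposes a non-standard system as a stream of rows."""
    @abstractmethod
    def columns(self) -> List[str]: ...
    @abstractmethod
    def read(self) -> Iterator[Row]: ...


class CustomProcessor(ABC):
    """Applies a transformation beyond the built-in operations."""
    @abstractmethod
    def process(self, rows: Iterable[Row]) -> Iterator[Row]: ...


class CustomAccessPolicy(ABC):
    """Delegates the access decision to rules held outside the platform."""
    @abstractmethod
    def authorize(self, user: str, rows: Iterable[Row]) -> Iterator[Row]: ...


class CustomOutput(ABC):
    """Publishes the final rows in a non-standard format or channel."""
    @abstractmethod
    def publish(self, rows: Iterable[Row]) -> str: ...


# --- Toy implementations, assumed purely for the example ---
class PipeDelimitedSource(CustomDataSource):
    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def columns(self) -> List[str]:
        return ["id", "name", "region"]

    def read(self) -> Iterator[Row]:
        for line in self._lines:
            yield dict(zip(self.columns(), line.split("|")))


class UppercaseNames(CustomProcessor):
    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, "name": row["name"].upper()}


class RegionPolicy(CustomAccessPolicy):
    def __init__(self, entitlements: Dict[str, Set[str]]) -> None:
        # Stand-in for a lookup against an external entitlement system.
        self._entitlements = entitlements

    def authorize(self, user: str, rows: Iterable[Row]) -> Iterator[Row]:
        allowed = self._entitlements.get(user, set())
        return (row for row in rows if row["region"] in allowed)


class CsvOutput(CustomOutput):
    def __init__(self, columns: List[str]) -> None:
        self._columns = columns

    def publish(self, rows: Iterable[Row]) -> str:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=self._columns,
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()


def run_pipeline(source: CustomDataSource, processor: CustomProcessor,
                 policy: CustomAccessPolicy, output: CustomOutput,
                 user: str) -> str:
    """Chains the extension points: input, processing, policy, output."""
    rows = processor.process(source.read())
    return output.publish(policy.authorize(user, rows))


source = PipeDelimitedSource(["1|alice|EU", "2|bob|US"])
policy = RegionPolicy({"analyst": {"EU"}})
result = run_pipeline(source, UppercaseNames(), policy,
                      CsvOutput(source.columns()), user="analyst")
print(result)
```

In this sketch the platform core (run_pipeline) only depends on the four abstract interfaces, so any of the toy implementations could be swapped for another one without touching the rest of the flow, which is the property an extension point is meant to provide.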

A data virtualization platform offering at least these four extension points is well placed to be deployed successfully in the widest range of integration scenarios and enterprise ecosystems. From data input, to data output, to the way data is handled internally, everything should fit just right.

Denodo Labs
