Bring any data to any data consumer, simply and easily: that’s the goal of data virtualization. Yet contrary to what may first come to mind, data consumers are more than simply BI, analytics, or data science applications. Just about every application consumes data. A mobile application that tracks order status, a web page that enables customers to consult their bank transaction history, even a CRM that integrates fulfillment data from an ERP: these are all data consumers. They typically use APIs to obtain data. Today’s applications mostly use REST APIs, but legacy systems might employ SOAP, or a vendor-specific protocol like SAP BAPI.
These application data consumers have their own sets of data integration challenges.
If you’re currently building a data-driven application, you may find yourself asking a few general questions, such as these:
- Can I develop back-end APIs fast enough to keep up with my front-end application development? Or to keep up with my evolving business needs?
- How many specialized developers do I need on my team?
Or even some very specific ones:
- How can I minimize costs when integrating data from a SaaS cloud application with a complex, pay-as-you-go API?
- Can I offload historical data to a cheaper storage solution without impacting my customer-facing web site?
- How do I continue to access data from my CRM while I’m migrating my ERP?
- My development team would like to try out a new API technology such as GraphQL; do I really want to also invest in back-end development, before I’ve seen the potential benefits?
Data virtualization can provide helpful answers to these and similar questions. Not only does it enable you to easily consume and transform data from existing APIs, but it also helps you to quickly build and deliver new ones – all without requiring you to write a single line of code, so you can concentrate your development efforts on building core business value or a stunning new user interface.
Data Access: The First Development Hurdle
The crucial first step in any data integration project is connecting to the data sources, then retrieving the data and transforming it into a usable format. If you’re building an API, you may find yourself writing this data access logic directly into your back-end code. This means finding the right data access library for your development environment, figuring out the subtleties of connecting to and querying the data source, and mastering the target syntax. This can be well outside the skill set or comfort zone of many developers. Even if they are hotshots at SQL query optimization, any connection code they write remains part of the application code and can’t be easily modified later on.
This problem gets more complex with each new data source. Worst of all, the code that combines data from the different sources is frequently difficult to optimize, and it runs the risk of becoming an inextricable part of the API code base.
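To make the problem concrete, here is a minimal sketch of hand-written data access logic sitting inside API code. It is only an illustration: the in-memory SQLite database and the CSV text are hypothetical stand-ins for two real sources, and the table, column, and function names are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins: a relational database and a file export.
def load_customers(conn: sqlite3.Connection) -> dict[int, str]:
    # Source-specific query logic lives in the application code.
    rows = conn.execute("SELECT id, name FROM customers").fetchall()
    return {customer_id: name for customer_id, name in rows}

def load_orders(csv_text: str) -> list[dict[str, str]]:
    # A second source means a second library and a second parsing style.
    return list(csv.DictReader(io.StringIO(csv_text)))

def orders_with_customer_names(conn: sqlite3.Connection, csv_text: str) -> list[dict[str, str]]:
    # The "join" is done by hand, so it cannot be optimized by either source.
    customers = load_customers(conn)
    combined = []
    for order in load_orders(csv_text):
        customer_id = int(order["customer_id"])
        combined.append({
            "order_id": order["order_id"],
            "customer": customers.get(customer_id, "unknown"),
        })
    return combined

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
ORDERS_CSV = "order_id,customer_id\nA-100,1\nA-101,2\nA-102,3\n"

result = orders_with_customer_names(conn, ORDERS_CSV)
```

Every function here is tied to one source’s connection details and syntax, which is the coupling described above.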
Building a New API with Data Virtualization Is as Easy as Connect, Combine, Consume
With Denodo data virtualization, the work of connecting heterogeneous data sources and combining their results is handled by the platform. To set up a connection to a new data source, simply select from a long list of out-of-the-box connectors. The Denodo Platform connects to all major relational databases (Oracle, SQL Server, DB2, PostgreSQL, SAP HANA…), cloud data warehouses (Snowflake, Redshift…), OLAP databases (Essbase, MS SSAS, SAP BW/BW4…), Hadoop systems, and NoSQL databases such as MongoDB and Neo4j. It also connects to cloud applications such as Salesforce, SAP, and other ERP systems, to any existing SOAP or REST web services, and to plain old Excel and text files. At runtime, the Denodo Platform not only finds the data it needs, but also automatically optimizes the requests sent to each data source according to that source’s individual capacities and capabilities.
Once connected, the next step is to combine data from these different sources. Data virtualization makes this step easy, too, since the data building blocks are all compatible, whichever source they came from initially: Data is mapped to “base views,” which can be combined in a low- or no-code fashion in a visual modeling tool. This is where you can select customer data from a data warehouse and join it with the latest info from a CRM, for example. Or you can select a customer’s recent bank transaction data from Oracle Exadata and combine it with historical transactions that have been offloaded to Hadoop. While combining data, you can use the Denodo Platform’s cache capabilities to strategically store the data in a different source, which can be useful if you want to reduce the load on an operational system like an ERP or limit the pay-as-you-go calls to an external API.
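The idea of combining recent and offloaded data while caching an expensive source can be illustrated with a small conceptual sketch. This is plain Python, not the Denodo Platform’s cache or its API; `CachedSource`, `paid_api`, `archive`, and the time-to-live value are hypothetical names and choices made for the example.

```python
import time
from typing import Callable

Row = dict  # one transaction record

class CachedSource:
    """Wraps a fetch function and reuses its results for ttl_seconds."""

    def __init__(self, fetch: Callable[[str], list[Row]], ttl_seconds: float):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, list[Row]]] = {}
        self.upstream_calls = 0  # counts real calls to the underlying source

    def get(self, customer_id: str) -> list[Row]:
        now = time.monotonic()
        hit = self._store.get(customer_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # served from cache, no upstream call
        self.upstream_calls += 1
        rows = self._fetch(customer_id)
        self._store[customer_id] = (now, rows)
        return rows

def combined_transactions(customer_id: str, recent: CachedSource,
                          history: Callable[[str], list[Row]]) -> list[Row]:
    # Union of recent and historical rows, newest first.
    rows = recent.get(customer_id) + history(customer_id)
    return sorted(rows, key=lambda r: r["date"], reverse=True)

# Hypothetical sources: a pay-as-you-go API and a cheap archive.
def paid_api(customer_id: str) -> list[Row]:
    return [{"customer_id": customer_id, "date": "2024-06-02", "amount": 40.0}]

def archive(customer_id: str) -> list[Row]:
    return [{"customer_id": customer_id, "date": "2019-01-15", "amount": 12.5}]

recent = CachedSource(paid_api, ttl_seconds=60)
first = combined_transactions("C1", recent, archive)
second = combined_transactions("C1", recent, archive)
```

The second request is answered without another call to the pay-as-you-go source, which is the effect the cache is meant to achieve.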
The last step is to prepare the data for your data consumers. When building an API, this means giving your data a business- and application-friendly structure: in other words, the structure that maps to business concepts. Let’s say you’re building a mobile application that lets customers track recent orders. You may want to provide an API that takes a customer ID as input and returns a list of the customer’s orders, each order with a nested list of items. You can model this hierarchical structure as a virtual view in the visual modeling tool – once again, with no coding required. Then, in a few clicks, the view can be published as a web service. We’ll take a look at that next.
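As an illustration of the target structure, the sketch below turns flat order rows into a list of orders, each with a nested list of items, for one customer ID. It only shows the shape of the result; the field names (`customer_id`, `order_id`, `sku`, `qty`) and the sample rows are assumptions, and in the platform this modeling is done visually rather than in code.

```python
# Hypothetical flat rows, as they might come out of a join of orders and items.
FLAT_ROWS = [
    {"customer_id": "C1", "order_id": "O1", "order_date": "2024-05-01", "sku": "PEN", "qty": 2},
    {"customer_id": "C1", "order_id": "O1", "order_date": "2024-05-01", "sku": "INK", "qty": 1},
    {"customer_id": "C2", "order_id": "O2", "order_date": "2024-05-03", "sku": "PAD", "qty": 5},
    {"customer_id": "C1", "order_id": "O3", "order_date": "2024-05-07", "sku": "PAD", "qty": 1},
]

def orders_for_customer(rows: list[dict], customer_id: str) -> list[dict]:
    """Return the customer's orders, each with a nested list of items."""
    orders: dict[str, dict] = {}
    for row in rows:
        if row["customer_id"] != customer_id:
            continue
        # Create the order entry on first sight, then append its items.
        order = orders.setdefault(row["order_id"], {
            "order_id": row["order_id"],
            "order_date": row["order_date"],
            "items": [],
        })
        order["items"].append({"sku": row["sku"], "qty": row["qty"]})
    return list(orders.values())

nested = orders_for_customer(FLAT_ROWS, "C1")
```

Here `nested` holds two orders for customer `C1`: the first with two items and the second with one.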
Do You Speak REST, OData, GraphQL, or GeoJSON?
REST APIs have been the de facto standard for at least a decade, and it’s easy to understand why, since they are lightweight, flexible, and easily deployed over HTTP. Part of the appeal of REST is that it isn’t a fixed protocol; developers can choose the REST data delivery “flavor” best adapted to their needs. However, this flexibility still leaves choices to be made. Do you build a straightforward REST API? Or do you choose the time-tested, data-oriented OData protocol? Or do you take a chance on a new technology like GraphQL, with its promise of lightweight, targeted queries? What about geospatial data? Is it worth the development effort necessary to provide a GeoJSON API?
With data virtualization, you often don’t have to make these difficult choices. The Denodo Platform can simultaneously deliver data via all of these protocols, leaving front-end developers free to select what works best for their applications. As always, data virtualization separates the consumption method from the conceptual model, so the same business view can be available in several different ways, without any additional development.
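The separation between the conceptual model and the consumption method can be sketched in a few lines. In this illustration, one hypothetical view (`STORE_VIEW`, with made-up store rows) is rendered both as plain JSON and as a GeoJSON `FeatureCollection`; the rendering functions are assumptions for the example and say nothing about how the Denodo Platform implements its own endpoints.

```python
import json

# One hypothetical business view: the same rows feed every output format.
STORE_VIEW = [
    {"store_id": "S1", "name": "Downtown", "lon": -122.42, "lat": 37.77},
    {"store_id": "S2", "name": "Harbor", "lon": -122.39, "lat": 37.79},
]

def as_json(rows: list[dict]) -> str:
    """Plain REST-style JSON: the rows as they are."""
    return json.dumps(rows)

def as_geojson(rows: list[dict]) -> str:
    """GeoJSON FeatureCollection: coordinates move into a Point geometry."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {k: v for k, v in row.items() if k not in ("lon", "lat")},
        }
        for row in rows
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})

plain = json.loads(as_json(STORE_VIEW))
geo = json.loads(as_geojson(STORE_VIEW))
```

Neither function changes the view itself, so adding another format would not require touching the model.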
When Your Data is a Moving Target
You know by now that APIs are the best way to make your data available in different formats to myriad data consumers, and hopefully I’ve convinced you that data virtualization can make it significantly easier to put new APIs in place. But what if your API is a victim of its own success? Every application that relies on it becomes a dependency to consider in your next data migration project. Luckily, again, with data virtualization in place, data migration can be much less costly and time-consuming. That’s because it is easy to repoint a virtual view from an old data source to a new one with minimal impact on data consumers: just disconnect the old data connection and replace it with the new one.
In fact, some development teams use this inherent flexibility of data virtualization from the very beginning, building out their APIs on a simple data source – a PostgreSQL database or even a CSV file – so they can first concentrate on front-end development. The final back-end data source can be put into place at a later date. This approach is practical when development teams work at different speeds.
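The source swap described above can be sketched conceptually as a view whose consumers never see which source sits behind it. This is an illustration in plain Python, not Denodo functionality: `CsvSource`, `SqliteSource`, and `CustomerView` are hypothetical names, and the CSV text and in-memory SQLite database stand in for a starter source and a final back end.

```python
import csv
import io
import sqlite3
from typing import Optional, Protocol

class Source(Protocol):
    def rows(self) -> list[dict]: ...

class CsvSource:
    """Starter source: a CSV file (held here as text)."""
    def __init__(self, text: str):
        self._text = text

    def rows(self) -> list[dict]:
        reader = csv.DictReader(io.StringIO(self._text))
        return [{"id": int(r["id"]), "name": r["name"]} for r in reader]

class SqliteSource:
    """Later source: a relational database."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def rows(self) -> list[dict]:
        cursor = self._conn.execute("SELECT id, name FROM customers ORDER BY id")
        return [{"id": cid, "name": name} for cid, name in cursor]

class CustomerView:
    """What consumers call; the source behind it can be replaced."""
    def __init__(self, source: Source):
        self.source = source

    def get(self, customer_id: int) -> Optional[dict]:
        for row in self.source.rows():
            if row["id"] == customer_id:
                return row
        return None

view = CustomerView(CsvSource("id,name\n1,Ada\n2,Grace\n"))
before = view.get(2)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
view.source = SqliteSource(conn)  # the migration: swap the connection only
after = view.get(2)
```

The consumer-facing call `view.get(2)` is identical before and after the swap, which is why front-end work can start against the simple source.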
You may not want to think about future data migration projects when you’re putting your first data access API into place – you may not want to think beyond your application MVP launch date – but the beauty of data virtualization is that you don’t have to.