It’s funny: only a few weeks ago, I was at Informatica writing about Master Data Management (MDM) and how businesses gain a complete view of their customers by using data virtualization to combine the single view of the customer, which sits within the MDM system, with the related transactional and interaction information. Now that I’m at Denodo, I’m writing about the flip side: customers using data virtualization to combine customer, product, and other master data with the related transactions from business applications and the interactions from social networks and big data systems. So, which one came first, Master Data Management or data virtualization?
Actually, it doesn’t matter, because MDM and data virtualization are highly synergistic. Data within any organization can be broadly classified into two categories: reference data and transactional data. (Now, with the advent of social media, you can add a third one: interaction data!) Reference, or master, data is data about customers, products, and so on that is used to refer, again and again, to a business entity. This data is usually captured in multiple systems, like CRM and ERP, and hence is duplicated. Transactional data, on the other hand, is tied to a piece of reference data: the number of products purchased by a customer, for example, or the price paid. This data also sits in different systems, but it’s not duplicated. While an MDM system reconciles the duplicates and creates a single view of the customer, it does not house the transaction data (or the interaction data, for that matter), so on its own it cannot provide a complete view of the customer with the associated transactions and interactions. That’s where data virtualization comes into play.
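To make the duplication point concrete, here is a minimal Python sketch of how the same customer captured in both CRM and ERP might be collapsed into a single view. Everything in it is an assumption for illustration: the `CustomerRecord` fields, the `match_key` rule (normalized name plus postal code), and the sample data are hypothetical, and real MDM products use far richer matching and survivorship rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    source: str       # system the record came from, e.g. "CRM" or "ERP"
    name: str
    postal_code: str


def match_key(rec: CustomerRecord) -> tuple:
    """Hypothetical matching rule: case- and whitespace-insensitive name plus postal code."""
    normalized_name = " ".join(rec.name.lower().split())
    normalized_postal = rec.postal_code.replace(" ", "").upper()
    return (normalized_name, normalized_postal)


def reconcile(records: list) -> dict:
    """Group duplicate records under one key and remember which systems held them."""
    golden: dict = {}
    for rec in records:
        golden.setdefault(match_key(rec), []).append(rec.source)
    return golden


# Illustrative data: the same customer entered slightly differently in two systems.
records = [
    CustomerRecord("CRM", "Acme  Corp", "94301"),
    CustomerRecord("ERP", "acme corp", "94301"),
    CustomerRecord("CRM", "Globex", "10001"),
]
golden = reconcile(records)
```

Three source records collapse into two customers here. Note that nothing in this single view carries purchases or prices; those stay in the transactional systems.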
Data virtualization combines the master data with its transactions and interactions. Let me give you a commercial banking example. If you are a business and you want to borrow money, you approach a lender. Besides the core information they want to know about you, like where you are located and what you sell, there’s also some less important information they want to know, like whether you can pay them back! Seriously, banks have sophisticated systems to calculate the risk of lending money to you. And they will keep reassessing that risk, even after they loan you the money, based on your business performance. After all, they don’t want to loan to another Lehman Bros! To enable these risk assessments, banks need to combine the risk information with client information (that is, information about you). Client information is master data and comes from the MDM system, while the risk information comes from risk systems. Data virtualization is the technology that can be used to combine these two related sets of information and present them in real time to risk analysts.
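The banking example can be sketched in a few lines of Python. This is only an illustration of the idea of combining sources at query time without copying the data, not how Denodo or any other product implements it. The two dictionaries stand in for the MDM system and the risk system, and every name (`fetch_client`, `fetch_risk`, `client_risk_view`, the fields and values) is hypothetical.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical stand-ins for the two source systems.
MDM_CLIENTS = {
    "C001": {"name": "Acme Corp", "location": "Palo Alto"},
    "C002": {"name": "Globex", "location": "New York"},
}
RISK_SCORES = {
    "C001": {"risk_rating": "BB", "exposure": 250000},
}


def fetch_client(client_id: str) -> Optional[dict]:
    """Look up master data for a client (stands in for a call to the MDM system)."""
    return MDM_CLIENTS.get(client_id)


def fetch_risk(client_id: str) -> Optional[dict]:
    """Look up risk data for a client (stands in for a call to the risk system)."""
    return RISK_SCORES.get(client_id)


def client_risk_view(client_ids: Iterable[str]) -> Iterator[dict]:
    """Combine master and risk data on demand; nothing is stored in a new database."""
    for cid in client_ids:
        client = fetch_client(cid)
        if client is None:
            continue  # no master record, so the client is left out of the view
        risk = fetch_risk(cid) or {}
        yield {
            "client_id": cid,
            **client,
            "risk_rating": risk.get("risk_rating"),
            "exposure": risk.get("exposure"),
        }


rows = list(client_risk_view(["C001", "C002", "C999"]))
```

Each row is assembled when the analyst asks for it, so a change in either source shows up on the next request. A client with no risk record still appears, with empty risk fields, while an ID unknown to the master data is skipped.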
Further, to “Data” Virtualization and Master “Data” Management, add another “Data”: Data Governance. I used to say that data governance is a prerequisite for Master Data Management success. Now I say the same for data virtualization. Both technologies enforce governance as a discipline within organizations for creating, maintaining, and accessing data, and both help avoid the mess that called for these solutions in the first place.
So, does it matter which one came first?
I am not the expert but…I think it makes sense to separate what you are calling reference data into Master Data and Reference Data. Master data are the whos, whats, and wheres of our data that give context to the transaction. Reference data are lookups and code tables that are broadly relevant across orgs, such as City, Postal Code, Country (location ref data), Gender, Marital Status, Colour…the list goes on. Reference data is often found in Master Data. How you manage each type is different.
Your description of “reference data” is correct, Tim, but it highlights how the MDM industry assigns different definitions to the term. There’s the definition that you mention, but for the folks in financial services, “reference data” is their “master data.” So, I’ve learnt to use “reference data” to mean “lookup data,” as you describe, in industries other than financial services. Thanks for your comment.
Very good one!