Popping Bubbles: communicating between silos using a CDM

Does this discussion sound familiar?
“…”
“But an employee can also be a customer!”
“True, so we would need something like ‘Person’ to capture the commonalities.”
“OK, so email should be part of person, but it has to be optional for customer and required for employee. How do we do that?”
“…”
And before you know it, a phone number is no longer a phone number, but ‘Customer/Person/ContactInformation/PhoneNumber[type=mobile]’.

What’s the use of a CDM?

Why would anybody want this? Well, let’s start with the definition of CDM: Common Data Model (or sometimes Canonical, but that’s a difficult word, so I’ll leave that until later).
In other words, it models the data that is common in your company, industry or process. Each term that is used in your company should mean the same thing to everyone that uses the term. Something like a dictionary for your company.
Now that would be useful indeed! It would also make my work more or less obsolete, by the way. The number of meetings I’ve been in where two people think they are talking about the same thing, simply because they use the same word… or the other way around: people shouting in each other’s face thinking they disagree because they use another word for the same thing.

Is a kiss is a kiss is a kiss?

How can this be? Mostly these discussions happen between people who’ve been working for the company for 10 or 20 years. You’d expect them to know what the definition is of a Thingamabob! Frank has designed them for 13 years now and Bob has repaired them for over 20.
Aha!
They do different things with them! Their use, and therefore their view, of a Thingamabob (TMB) is different. Frank draws them on paper and describes what makes this one better than the previous eight! Bob on the other hand knows where each one is located, knows the spare parts and tools to repair each version.
But wait, Frank also knows where they are located. TMB1 is used in step two of the process and TMB13 is used in step 46. Yes, and Bob knows that TMB1 is the yellow box over here and TMB13 the blue one over there. What ‘location’ is for Frank is not ‘location’ for Bob. Good luck explaining this to 33 years of working experience.
Clearly, Frank and Bob will not come to an agreement about this without help. Somehow they ended up in a meeting together trying to capture all attributes of a TMB and now they’re stuck. Why are they stuck? Because they have been working within the confines of their own silo and everybody in their silo agreed: this is all there is to know about a TMB. And now somebody decided that “sharing information makes everything better” and would “improve the lives of everyone involved”. On paper. In reality this means that Frank and Bob have to change their beliefs about Thingamabobs. But people in general, and Frank and Bob especially, don’t want to change their beliefs. They don’t like to look outside their carefully built silo to find that someone else has a completely different view of the world or even of a part of it. It means they themselves might have the wrong view. The horror!
What people don’t realize is that other people always have a different view and that nobody is wrong. (Except you, of course, if you disagree with me.)

What you see depends on where you stand – D.L. Morrese

How to create a CDM

OK, let’s assume that we would want to create a CDM. How to go about it? Well, there are three possible starting points and surprisingly enough, each of them will end up with a CDM, though the letter C will mean something different in each case.

Approach 1: Dominant system

The first starting point is the data model of our dominant system. Most things are done in this system, most people in our company work in this system, so why not? Well, for so many reasons, I don’t even know where to start. I’ll just begin with the most obvious one: the system’s data model uses incomprehensible definitions. Okay, ADRS and STRT may work, but INVLN1TXP is not immediately recognizable as invoice_line_1_tax_percentage. Another reason might be that we decide to finally upgrade our system to version 13.2. Along with the upgrade comes a completely new data model. Damn. There goes the basis of the CDM. A final reason is the possible replacement of the system. Our company will never do this of course, but maybe the vendor of this system will stop supporting this outdated system, who knows. Still, even using this unstable basis, you end up with a CDM: Crappy Data Model.

All angles intermediate

In an attempt to be complete and look at the picture from all angles: in certain situations this is actually the best approach, but only if most systems use a similar data structure and translations to and from a CDM become more of a hindrance than a help.

Approach 2: Standards

There must be a better starting point. Since we are the tech guys, we first search the Internet instead of talking to people and so we find: Industry Standards. There’s your OAGIS, CIMM, UBL and what have you. More than enough to choose from, so we pick one or two and off we go, scratching what we don’t use, adding what we think is useful and adapting it to fit the immediate needs of the projects. This also leads us to a CDM: Canonical Data Model. There’s that difficult word again, so I will define it here. “In most fields, a canonical form specifies a unique representation for every object…”. What does that even mean? Speak English will you? OK, it means that for every system, if they wish to talk to another system, they have to translate their data format to the canonical data format. One country, one language, one ring to rule them all, etc. I agree, this is a nice technical solution that works.
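To make the "translate everything to the canonical format" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the system names, the local field names and the canonical names "address" and "street" are made up, and a real canonical model (OAGIS, UBL, ...) is of course far richer than a renaming table. The point is only that each system knows one mapping, to and from the canonical form, and never needs to know the other system's language.

```python
from typing import Dict

Record = Dict[str, str]

# Hypothetical mappings: local field name -> canonical field name.
# One table per system; no system-to-system tables exist.
TO_CANONICAL: Dict[str, Dict[str, str]] = {
    "system_a": {"ADRS": "address", "STRT": "street"},
    "system_b": {"addr": "address", "street_name": "street"},
}


def to_canonical(system: str, record: Record) -> Record:
    """Rename a system's local fields to their canonical names."""
    mapping = TO_CANONICAL[system]
    return {mapping[field]: value for field, value in record.items()}


def from_canonical(system: str, record: Record) -> Record:
    """Rename canonical fields back to a system's local names."""
    reverse = {canon: local for local, canon in TO_CANONICAL[system].items()}
    return {reverse[field]: value for field, value in record.items()}


def translate(source: str, target: str, record: Record) -> Record:
    """Send a record from source to target via the canonical form."""
    return from_canonical(target, to_canonical(source, record))


if __name__ == "__main__":
    message = {"ADRS": "Main Street 12", "STRT": "Main Street"}
    print(translate("system_a", "system_b", message))
```

Adding a third system in this sketch means adding one entry to the mapping table, not one translation per existing system.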

Calculation intermediate

This section can be skipped because of its calculations, but it has some nice drawings. I spent a lot of time on these drawings, so if you skip this section I will be a bit disappointed. But don’t mind my feelings, carry on.
First, I’ll introduce a system: System A. System A doesn’t have anyone to talk to, so he’s a bit lonely.

I’ll give him some friends: B and C. Now, we have to remember, each system speaks its own language, but the systems still want to communicate with each other. This means each line between two systems does not only represent a line of communication, but also a translation between two languages.

So system A talks to two other systems, B talks to two other systems and C as well. 3 systems, each talking to 2 other systems: 3*2=6 translations. But there are only 3 lines, not 6! Well spotted. When system A talks to system B, system B also talks to system A, so we counted double and have to divide the total by 2.

Warning: difficult calculations ahead!

So our total comes to 3*2/2=3 translations. Extrapolating this to more systems, let’s say n, the calculation becomes n*(n-1)/2. This gets out of control pretty quickly.

3 systems: 3*2/2 = 3
5 systems: 5*4/2 = 10
10 systems: 10*9/2 = 45
20 systems: 20*19/2 = 190
50 systems: 50*49/2 = 1225

And 50 systems is not that uncommon. Even 200 systems isn’t uncommon.

Now let’s perform these calculations with a CDM in the middle. When using a CDM, each system now only has to translate its data model to the CDM. In other words, only one translation per system. With 3 systems, CDM seems overkill – 3 vs. 3 – but with 50 systems, the odds look different – 1225 vs. just 50.
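For those who would rather let a computer do the difficult calculations, here is a small Python sketch of the comparison. The function names are my own invention; the formulas are the ones used above: n*(n-1)/2 translations when every system talks directly to every other system, and n translations when every system only talks to the CDM.

```python
def point_to_point_translations(n: int) -> int:
    """Translations needed when each of n systems talks directly to every other one."""
    return n * (n - 1) // 2


def cdm_translations(n: int) -> int:
    """Translations needed when each of n systems only translates to and from the CDM."""
    return n


if __name__ == "__main__":
    # Print both counts side by side for a few landscape sizes.
    for n in (3, 5, 10, 20, 50, 200):
        print(f"{n:>3} systems: {point_to_point_translations(n):>5} point-to-point "
              f"vs. {cdm_translations(n):>3} with a CDM")
```

For 200 systems, the same formula gives 200*199/2 = 19900 point-to-point translations against 200 with a CDM.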

Approach 3: Business

“Wow, that looks great. And you have yet another starting point that’s even better?”
Well, I didn’t say that exactly, but we did introduce our business users in the previous post. Maybe they have a nice data model that we can use?
“Yes, I remember now. They were the ones with the processes and the processes should be agnosticistic or something?”
System-agnostic, yes. This means that any process should be unknowing or unaware of the systems it uses and the name each system gives to a TMB. Processes should use the data in business terms, always talking about a Thingamabob, and never call it a TMB or a ThingyMeBob or whatever, but use the name that is Common in your business. The result, surprise, surprise, is the Common Data Model, sometimes called the Business Data Model. But since I didn’t choose BDM as a name for my post, we’ll forget I ever mentioned it. <waves hands like a wizard making you forget>

To be continued…

We now have the theory well covered, but theory alone is not the focus of my blog. The fun, after all, starts when applying theory. So in my next post I’ll show CDM considerations in practice.
