Popping Bubbles: CDM in practice
This is the second half of my two-part series on the CDM. In part one we covered the theory; in this post we’ll get our hands (and our brains) dirty with the practicalities. It’s going to be ugly.
From theory to practice
At the end of the previous post we concluded that, in theory, deciding to use a CDM is a no-brainer, right? Right? No. Reality is always ready to derail theory. Always. How?
Reason 1: Ivory tower
You see, creating a CDM is usually an academic affair, conducted somewhere high up in an ivory tower. It starts out practical enough, but data architects soon fly off into hypothetical heaven just so they can make everything fit into one elegant model, and they hardly ever land back in reality. Especially when recursion and reuse can be applied! They love that stuff (guilty as charged, by the way), but nobody else will ever understand it. The result: a model that isn’t practical and that nobody else feels involved in.
Reason 2: Silo view
Furthermore, the people in the silos simply don’t feel the need. I’ve worked my entire career between the silos. My job, after all, is, and always has been, to glue two or more systems together: asking how each of them models an address and then somehow making ‘addressLine1’ fit into ‘street’, ‘housenumber’, and ‘housenumberExtension’. It is natural for me to ask without judging (only at work, by the way, not at home…). Not so for the people in the silos. For them, using addressLine1 is obvious, because of reasons! It has always been that way, it worked, so why would anybody want to change it?
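To make that glue work concrete, here is a minimal Python sketch of the addressLine1 problem. It assumes, hypothetically, that an address line always looks like “<street> <number><extension>”, as in “Main Street 12a”. The regular expression and the function name `split_address_line` are illustrative only, and real-world address lines are much messier than this.

```python
import re
from typing import TypedDict


class SplitAddress(TypedDict):
    street: str
    housenumber: str
    housenumberExtension: str


# Hypothetical format: "<street> <number><optional extension>",
# e.g. "Main Street 12a" or "Kerkstraat 5 bis".
_ADDRESS_LINE = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+)\s*(?P<ext>\S.*)?$")


def split_address_line(address_line1: str) -> SplitAddress:
    """Map one free-text addressLine1 onto street, housenumber and housenumberExtension."""
    match = _ADDRESS_LINE.match(address_line1.strip())
    if match is None:
        # No house number found: a human has to look at this one.
        raise ValueError(f"cannot split address line: {address_line1!r}")
    return {
        "street": match.group("street"),
        "housenumber": match.group("number"),
        "housenumberExtension": (match.group("ext") or "").strip(),
    }


print(split_address_line("Main Street 12a"))
# {'street': 'Main Street', 'housenumber': '12', 'housenumberExtension': 'a'}
```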
I can still hear myself explaining information hiding: “No, no, the data model we use between systems should be independent of the data models of the systems themselves. It will be implementation agnostic, which makes it easy to replace the existing silos.” Ugh, how inconsiderate I was, trying to explain to Bob and Frank (you may remember them from the previous post) that all I wanted was to make it easy for the company to replace their silos (no, their babies) and thereby make them, Bob and Frank, obsolete. Of course they weren’t keen to help.
Reason 3: Nesting
And then there is the nesting. Don’t get me started… Too late, here I go. As you may recall, I started the previous post with nesting: ‘Customer/Person/ContactInformation/PhoneNumber[type=mobile]’ instead of ‘Phone Number’. Mapping a simple web form to the entire CDM is hell. Eight input fields are mapped to five different structures, with nesting up to twelve levels deep. Yuk!
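A rough sketch of why this hurts: the hypothetical helper below creates every intermediate level of a slash-separated CDM path so that a flat form field can land in the right spot. It assumes paths like the one above and a simplistic `Name[key=value]` qualifier syntax. It also assumes that a qualified leaf stores its content under a made-up `value` key. The field names and paths in `FORM_TO_CDM` are invented for illustration.

```python
import re
from typing import Any

# One path step, e.g. "PhoneNumber" or "PhoneNumber[type=mobile]".
_STEP = re.compile(r"^(?P<name>\w+)(?:\[(?P<key>\w+)=(?P<value>\w+)\])?$")


def set_path(doc: dict[str, Any], path: str, value: Any) -> None:
    """Create every intermediate level of a CDM path and set the leaf value."""
    steps = path.split("/")
    node: dict[str, Any] = doc
    for i, step in enumerate(steps):
        m = _STEP.match(step)
        if m is None:
            raise ValueError(f"unsupported path step: {step!r}")
        name, key, qual = m.group("name"), m.group("key"), m.group("value")
        is_leaf = i == len(steps) - 1
        if key is None:
            if is_leaf:
                node[name] = value
            else:
                node = node.setdefault(name, {})
        else:
            # A qualified step selects (or creates) the list item whose key matches.
            items = node.setdefault(name, [])
            item = next((it for it in items if it.get(key) == qual), None)
            if item is None:
                item = {key: qual}
                items.append(item)
            if is_leaf:
                item["value"] = value  # hypothetical convention for qualified leaves
            else:
                node = item


# Hypothetical mapping from flat web-form fields to CDM paths.
FORM_TO_CDM = {
    "phone": "Customer/Person/ContactInformation/PhoneNumber[type=mobile]",
    "firstName": "Customer/Person/Name/GivenName",
    "lastName": "Customer/Person/Name/FamilyName",
}


def form_to_cdm(form: dict[str, str]) -> dict[str, Any]:
    """Turn a flat web form into a nested CDM document."""
    doc: dict[str, Any] = {}
    for field, value in form.items():
        set_path(doc, FORM_TO_CDM[field], value)
    return doc
```

Every form field needs an entry like this, and every entry drags its full chain of parent structures along.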
Reason 4: Short term project “architecture”
Lastly, there’s the sadly low level of reuse, or more readable: re-use. This is an important concept that I haven’t yet explained. In my EAI post I used the term adapter. An adapter adapts a system to fit in our application landscape. Each adapter should be designed and built so that all adapters are more or less compatible: the adapters can talk to each other. A system can talk to its adapter, all adapters can talk to each other and therefore all systems can talk to each other. Nice.
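A minimal sketch of that adapter idea, assuming a plain dictionary as the common (canonical) message format. The CRM and billing systems, their field names `tel` and `phone_no`, the canonical name `phoneNumber`, and the `send` helper are all hypothetical. What matters is the shape: each adapter knows only its own system and the common model, so any two adapters can be chained.

```python
from typing import Any, Protocol

Message = dict[str, Any]


class Adapter(Protocol):
    """Adapts one system to the landscape by translating to and from the common model."""

    def to_canonical(self, native: Message) -> Message: ...

    def from_canonical(self, canonical: Message) -> Message: ...


class CrmAdapter:
    """Hypothetical CRM that stores the phone number as 'tel'."""

    def to_canonical(self, native: Message) -> Message:
        return {"phoneNumber": native["tel"]}

    def from_canonical(self, canonical: Message) -> Message:
        return {"tel": canonical["phoneNumber"]}


class BillingAdapter:
    """Hypothetical billing system that calls the same field 'phone_no'."""

    def to_canonical(self, native: Message) -> Message:
        return {"phoneNumber": native["phone_no"]}

    def from_canonical(self, canonical: Message) -> Message:
        return {"phone_no": canonical["phoneNumber"]}


def send(source: Adapter, target: Adapter, native: Message) -> Message:
    """Source system -> its adapter -> common model -> target adapter -> target system."""
    return target.from_canonical(source.to_canonical(native))


print(send(CrmAdapter(), BillingAdapter(), {"tel": "0612345678"}))
# {'phone_no': '0612345678'}
```

Adding a third system means writing one new adapter, not a new mapping for every system it talks to.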
Now, ideally, one would take the time to think ahead, look into the future, and try to predict the end result. That way we might deduce that quite a few adapters will be used in a similar, but not exactly the same, way by multiple systems (or even processes). Knowing this, we can merge three such adapters into one nice generic solution. Ah, generic <sigh of relief>. We (in this case, we developers) have been taught this concept since our very first IT lecture: when in doubt, make it generic. Always put in some extra effort when there is the slightest chance something can be re-used. Designers (who are nothing more than grown-up developers, really) are a bit more skeptical and ask around whether there is re-use potential, and then nudge everyone into making it generic anyway. Architects (who are nothing more than cocky designers; I realize I’m making enemies everywhere, but I hope you realize I’m not really serious) think they know what should be generic and what is one-time-use-only, and then reorganize everything to make it all generic anyway.
This bunch hopes to act as a counterweight to the project managers (who are nothing more than short-term thinkers with only one rule: I have to finish this project on time, within budget and within scope; of these three, scope usually loses). (Still not serious… just hitting the ball very close to the truth.) (Again.)
Since making something generic causes delay (thinking, discussing, challenging, re-architecting, re-designing, re-building, re-testing, oops, still not generic) and the end result goes beyond the original scope of the project, there is always some friction between project managers and the rest.
Anyway, back to reason 4. Depending on who wins this generic-or-not tug-of-war, we either get
- a lot of point-to-point connections between systems, where each connection has its own sending adapter and its own receiving adapter, used by no other connection, OR
- a number of connections that all share the same adapter on one side and each use a dedicated adapter on the other.
Too many words dude, use a picture. OK. Here you go.
In other words, when the scope of a project is a connection rather than an adapter, you end up with nothing but point-to-point connections. Unfortunately, this is the level of integration found in most environments. And there’s just no use putting a CDM in the middle of a point-to-point interface: there’s no re-use, and the only two systems involved would rather use their own data models.
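Some back-of-the-envelope arithmetic shows why re-use matters. Assume the worst case, where every one of n systems sends messages directly to every other system. Point-to-point integration then needs n(n-1) one-way mappings, whereas a CDM needs 2n: one mapping into the common model and one out of it per system. The function names below are just for illustration.

```python
def point_to_point_mappings(n: int) -> int:
    """One-way mappings when every system sends directly to every other system."""
    return n * (n - 1)


def canonical_mappings(n: int) -> int:
    """One mapping into and one mapping out of the common model per system."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, point_to_point_mappings(n), canonical_mappings(n))
# 2 2 4
# 3 6 6
# 5 20 10
# 10 90 20
```

For two systems the CDM actually doubles the work. The two approaches break even at three systems, and only beyond that does the common model start to pay off. Real landscapes are rarely fully meshed, so actual numbers sit somewhere in between.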
We want it, but we don’t want it
So in theory we want the CDM, but in practice nobody wants it. Let’s recap what’s good about it and what we hate about it and try to come to an agreement about the use and usefulness of a CDM. A CDM hides information, reduces the number of mappings, enforces one meaning for one term and one term for one meaning (one common language) and facilitates intercompany communication when based on standards. Ah, sweet and unspoiled paradise, the choice is easy: CDM of course!
And then there’s reality, trampling through our paradise in its muddy boots. Not all applications connect to all other applications. Not all applications need the same data from other applications. “No, for this project we only need this quick-and-dirty one-to-one mapping, no need for a CDM, it’s much faster this way, and who cares about maintenance?” Project thinking ruins the utopia-sized dot on the horizon. And the nesting, don’t forget about the nesting! Yes, yes, I’ll mention the nesting. Again.
So two questions need answering: how do we make a CDM more manageable, and what has all this got to do with popping bubbles?
Making it more manageable
To most people involved, the CDM is a big, unfathomable Crazy Demonic Monster. It’s overkill and overhead that just doesn’t fit in My Project. That may be true, but Your Project is always part of Our Company, and Our Company has Architects. Luckily. Where Projects are mostly concerned with the-time-it-takes-to-finish-the-project, Architects are concerned with The Future: two completely different timeframes that are difficult to reconcile.
You can change this by making maintainability and governance part of the project scope. Data architects and schema custodians should be appointed to help enforce the CDM and make it easier to implement. And finally, the nesting. The right way to get rid of, or cope with, deep nesting depends greatly on where in the landscape we are working. When applying a CDM to a front-end-talking-to-back-end type of interface, my recommendation is to use only the CDM naming, not the nesting. This decision alone will keep your front-end developers very happy and more inclined to use CDM names. And let’s face it, the front end doesn’t use your generic APIs or services; it mostly uses its own database anyway.
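As a sketch of “naming without nesting”, the hypothetical function below flattens a nested CDM document into a dictionary keyed only by the CDM leaf names. One design choice is worth noting. Dropping the nesting can make two different fields share a name (think of a person’s name and a company’s name), so the sketch raises an error instead of silently overwriting. Lists are treated as plain values and are not flattened.

```python
from typing import Any


def flatten_to_cdm_names(doc: dict[str, Any]) -> dict[str, Any]:
    """Keep the CDM leaf names but drop the nesting above them.

    Raises ValueError if two leaves share a name, because without the
    nesting the front end could no longer tell them apart.
    """
    flat: dict[str, Any] = {}

    def walk(node: dict[str, Any], path: str) -> None:
        for name, value in node.items():
            here = f"{path}/{name}" if path else name
            if isinstance(value, dict):
                walk(value, here)  # descend; only leaves end up in `flat`
            else:
                if name in flat:
                    raise ValueError(f"name clash on {name!r} at {here}")
                flat[name] = value

    walk(doc, "")
    return flat


cdm = {"Customer": {"Person": {"Name": {"GivenName": "Frank", "FamilyName": "Jansen"}}}}
print(flatten_to_cdm_names(cdm))
# {'GivenName': 'Frank', 'FamilyName': 'Jansen'}
```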
When building your company’s generic APIs and services, use the nesting, but not the entire CDM structure. Select the fields that are actually useful in the specific operations of each service. To be extra clear on this point: OAGIS, for example, is a huge model. Your company will probably use only 30% of it, at most, in its business language, and a single service, in turn, will need only a tiny part of that. This super-tiny part we’ll call a CMM: Common Message Model. It’s derived from the CDM, but only as big as it needs to be within the scope of the service or API that provides the information (you may recognize this as a Bounded Context, from Domain-Driven Design). This turns the CDM into a Convenient Delightful Matter.
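Here is a minimal sketch of deriving a CMM instance from a full CDM document. It keeps only the paths a particular service needs, preserves their nesting, and drops everything else. The path list `CONTACT_CMM` and the `project` function are hypothetical. For brevity, the sketch ignores qualified steps such as `PhoneNumber[type=mobile]`.

```python
from typing import Any

# Hypothetical CMM for a "get customer contact details" service.
CONTACT_CMM = [
    "Customer/Person/Name/FamilyName",
    "Customer/Person/ContactInformation/Email",
]


def project(cdm_doc: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Keep only the listed CDM paths, with their nesting, and drop the rest."""
    cmm: dict[str, Any] = {}
    for path in paths:
        steps = path.split("/")
        src: Any = cdm_doc
        for step in steps:
            if not isinstance(src, dict) or step not in src:
                break  # path absent from this document: leave it out
            src = src[step]
        else:
            # The whole path exists: rebuild its nesting in the CMM.
            dst = cmm
            for step in steps[:-1]:
                dst = dst.setdefault(step, {})
            dst[steps[-1]] = src
    return cmm


doc = {"Customer": {"Person": {
    "Name": {"GivenName": "Bob", "FamilyName": "Smit"},
    "ContactInformation": {"Email": "bob@example.com"},
    "BirthDate": "1970-01-01",
}}}
print(project(doc, CONTACT_CMM))
```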
Back to the Bubble game: does a CDM help in getting rid of our silos? Well, at the very least the silos will now be united against their common enemy: the CDM. But I don’t think that’s all a CDM achieves. It’s very hard to be against the concept of a CDM, one common language for the whole company. It’s much like the artificial language Esperanto: the concept and the idea behind it are great! The big problem is implementing it, making it stick, and making everybody aware that the extra effort they need to put into using it is actually worth it in the end. I’ve been involved in a number of projects, each at a different stage of setting up, adapting to, and using a CDM, and each falling prey to one or more of the pitfalls mentioned in this post. But in the end, each of these companies persevered and put in the effort to implement some version of a CDM. Why? Because it usually marks the start of actual business involvement in IT, and that is the real start of, if not knocking down the silos, then at least more collaboration between them.
So even though some of us are stuck doing EAI and point-to-point connections, it’s still worthwhile to use some sort of CDM and to keep reaching out to your business people, so they can help you move from a Canonical Data Model to a real Common Data Model.
My next post will deal with SOA. SOA introduces a service landscape to completely hide all of our systems and silos once and for all and should, in theory, be the end of the bubbles (we’ll see about that). Since SOA is a type of architecture, it may turn out to be a two- or three-parter. In fact, I may even need to take a sidestep or write a prelude to explain more about principles and patterns.