<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:iweb="http://www.apple.com/iweb" version="2.0">
  <channel>
    <title>Dalton’s Blog</title>
    <link>http://www.dcervo.com/blog/Home/Home.html</link>
    <description>This blog is about multiple topics around Master Data Management.  Topics discussed are: Master Data Management, MDM, Data Quality, DQ, Customer Data Integration, CDI, Data Governance, DG, Customer Data Governance, Metadata</description>
    <generator>iWeb 3.0.4</generator>
    <item>
      <title>The System of Record in MDM</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2011/6/6_The_System_of_Record_in_MDM.html</link>
      <guid isPermaLink="false">b8af3551-9f88-4965-beb9-cbe5a2a9b733</guid>
      <pubDate>Mon, 6 Jun 2011 21:34:54 -0600</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2011/6/6_The_System_of_Record_in_MDM_files/benefits-organize.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object001_2.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:182px; height:136px;&quot;/&gt;&lt;/a&gt;Read the article at &lt;a href=&quot;http://hubdesignsmagazine.com/2011/05/30/the-system-of-record-in-mdm-by-dalton-cervo/&quot;&gt;Hub Designs Magazine&lt;/a&gt;.</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2011/6/6_The_System_of_Record_in_MDM_files/benefits-organize.jpg" length="26628" type="image/jpeg"/>
    </item>
    <item>
      <title>Just Give me the Data!</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2010/4/7_Just_Give_me_the_Data%21.html</link>
      <guid isPermaLink="false">637125df-e6d4-41df-bb09-5af6daab3778</guid>
      <pubDate>Wed, 7 Apr 2010 22:56:48 -0600</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2010/4/7_Just_Give_me_the_Data%21_files/JustGiveMeTheData.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object002_1.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:183px; height:137px;&quot;/&gt;&lt;/a&gt;I was called into an ongoing project. The cross-functional team had been formed to solve the issue of duplicate contacts. They had been meeting weekly for a couple of months and really not getting anywhere.&lt;br/&gt;&lt;br/&gt;I was invited because I was leading a very productive data de-duplication effort for organization information. Besides, as the Data Quality lead, the contact de-duplication project would eventually fall under my oversight anyway.&lt;br/&gt;&lt;br/&gt;I wasn't very surprised, but I was intrigued that the team kept discussing the consequences of having duplicate contact information. I mean, that is an important aspect for sure: identify business impact. But once you agree the duplicates are a problem, it is time to move on and decide how to eliminate them. Anyway, I came in after weeks of meetings. They were having these continued philosophical discussions about actual and even hypothetical problems caused by this particular issue. In my mind, they were overanalyzing the issue.&lt;br/&gt;&lt;br/&gt;Eventually, I had to ask some questions because they were really going in circles. So I asked what I believe was a very fundamental question: what is your definition of a duplicate contact? Got those blank looks! I continued: how are we supposed to eliminate duplicates if we don't even know what a duplicate is? I think I got their attention. 
Of course, some of them started offering suggestions, such as: contacts that have the same phone number, the same email, the same first name/last name, belong to the same organization, or combinations of those.&lt;br/&gt;&lt;br/&gt;I probably got them even more upset with me, because I had even more questions, such as: how many of our contacts have phone numbers, email addresses, first name/last name, etc.? Obviously they had no clue. I kept rubbing salt in the wound, not to be a pain, but just to brainstorm some ideas and get my point across. Eventually we agreed we couldn't determine what a duplicate is, or how effectively we could eliminate dupes, without first looking at the data. Those endless philosophical discussions wouldn't get us anywhere.&lt;br/&gt;&lt;br/&gt;Anyhow, in data projects, there is a balance between how many business requirements you need and how much data analysis is necessary to support your requirements and achieve your objectives. You can't assume you'll have a clean set of requirements ready to be executed when you have no clue what your data looks like. I believe that's one of the challenges faced when business is trying to work with IT. IT is expecting the business to tell them exactly what they need. But the business has a hard time expressing that because they don't have all the details, and sometimes don't have the necessary skills to do the right analysis. Plus, IT has a tendency to limit how much bulk data access is given to the business, complicating the data analysis.&lt;br/&gt;&lt;br/&gt;“Just give me the data!” is a necessary attitude. Identify your business issues, but don't overanalyze them before looking at the data. Chances are the data will dictate your solution anyway by supporting or even establishing business rules.</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2010/4/7_Just_Give_me_the_Data%21_files/JustGiveMeTheData.jpg" length="122135" type="image/jpeg"/>
    </item>
    <item>
      <title>Making Data Governance as simple as possible, but not simpler - Part 2</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/12/6_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_2.html</link>
      <guid isPermaLink="false">1ec56832-2afb-4862-b40a-d51dc62a007f</guid>
      <pubDate>Sun, 6 Dec 2009 14:07:47 -0700</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/12/6_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_2_files/DataGovernanceDiagram.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object000_1.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:182px; height:131px;&quot;/&gt;&lt;/a&gt;1. A preface to part 2&lt;br/&gt;&lt;br/&gt;I'd like to thank everyone for the many comments and positive feedback I received on &lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/11/30_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_1.html&quot;&gt;Part 1&lt;/a&gt;, either through direct comments on my blog or through LinkedIn group discussions.&lt;br/&gt;&lt;br/&gt;Many experts in the field have contributed comments, such as Jim Harris, Dylan Jones, Charles Blyth, Dan Power, and Phil Simon. I have also received very positive feedback from users engaged in DG efforts. Thanks everyone! &lt;br/&gt;&lt;br/&gt;But I'd like to thank Tom Pantano particularly. He provided some very good fundamental discussion material. He pointed out that my proposed definition of DG was too framework specific, which can potentially restrict the far-reaching nature of DG. Interestingly enough, I thought the same thing about some of the other definitions I looked at. What I tried to do was to abstract what I thought was specific into essentially 5 high-level categories: Data Management, Business Process Management, Compliance Management, People Management, and Technology Engagement.&lt;br/&gt;&lt;br/&gt;I thought those 5 categories were generic enough not to minimize the broadly encompassing nature of DG, and that generalizing any further would put objectivity at risk. That makes me feel my Einstein quote premise has been very appropriate indeed. 
How simple/generic can or should we be?&lt;br/&gt;&lt;br/&gt;The good news is the actual definition won't impact my postings too much. The Component Model I presented in part 1 is what I'll be referring to mostly anyway. We can keep both definitions up for grabs until we mature our discussion. Thanks again Tom for your contribution. It was great!&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;2. A note about my Data Governance Component Model&lt;br/&gt;&lt;br/&gt;Notice I use the word “management” for four of the five components, and I chose the word “engage” when referring to technology. I do have a reason.&lt;br/&gt;&lt;br/&gt;I'll get more into Business vs. IT for DG. But essentially, I see DG mostly as a business-driven activity, and technology mostly as an IT-maintained one. Hopefully technology is always selected based on business needs, of course. But the point is, when DG is established, chances are there are existing technologies that need to be leveraged, and some new ones that will be required. That's the reason for the word engagement.&lt;br/&gt;&lt;br/&gt;Furthermore, I represent the relationship between technology and everything else with spring/bi-directional arrows. That is to suggest DG components may have to adapt to existing technologies and new technologies may have to be added to complement what is missing. Depending on the rigidity of your organization, one thing may have to give more than the other, but hopefully it will be “stable” enough to sustain the DG “house.”&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;3. What's MDM got to do with it?&lt;br/&gt;&lt;br/&gt;DG is associated with Master Data Management (MDM) a lot. One may wonder if one can exist without the other.&lt;br/&gt;&lt;br/&gt;What's common between them is “data.” Data exists without MDM, therefore DG can certainly exist without it too. On the other hand, MDM is also about “management” of data, which is one of the components of data governance. 
If you're doing MDM, chances are you are executing one or more of the DG functions whether your DG is formalized or not. Data quality, for example, is practically a must in an MDM program.&lt;br/&gt;&lt;br/&gt;As I said, you don't need MDM to have DG. However, MDM can be a great enabler to DG. One of the aspects of MDM is creating a unified view of master data that is consistent and understood across the organization. DG can vastly benefit from that.&lt;br/&gt;&lt;br/&gt;If you want to learn more about MDM, please check some of my other &lt;a href=&quot;http://www.dcervo.com/blog/Home/Home.html&quot;&gt;postings&lt;/a&gt; and &lt;a href=&quot;http://www.dcervo.com/blog/Case_Studies/Case_Studies.html&quot;&gt;case studies&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;4. Levels of Governance and Stewardship&lt;br/&gt;&lt;br/&gt;A single DG team may not be sufficient depending on the size of your organization. You may need multiple levels of governance, starting with an Enterprise DG that oversees more specialized DG teams. My experience is with an Enterprise DG overseeing multiple subject area governance teams, such as: Customer DG, Product DG, Install Base DG, etc.&lt;br/&gt;&lt;br/&gt;The organization of your governance structure may also be influenced by the adopted model of data stewardship. After all, a lot of the DG functions are actually carried out by the data stewards. I really recommend you read the &lt;a href=&quot;http://www.information-management.com/white_papers/-10016127-1.html&quot;&gt;Five Models for Data Stewardship&lt;/a&gt; by Jill Dyché for more on this.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;5. What's next?&lt;br/&gt;&lt;br/&gt;I recently attended a webinar on Data Governance from Initiate Systems with &lt;a href=&quot;http://twitter.com/wmmarty&quot;&gt;Marty Moseley&lt;/a&gt; and &lt;a href=&quot;http://twitter.com/jilldyche&quot;&gt;Jill Dyché&lt;/a&gt;. 
Jill emphatically stated that “there is, still, no template for Data Governance” and “best practices, not templates, are emerging that can provide guidance.” (thanks to &lt;a href=&quot;http://twitter.com/ocdqblog&quot;&gt;Jim Harris&lt;/a&gt;, who live tweeted during the webinar and published the important quotes &lt;a href=&quot;http://www.ocdqblog.com/home/live-tweeting-data-governance.html&quot;&gt;here&lt;/a&gt;).&lt;br/&gt;&lt;br/&gt;I definitely agree with Jill. My next posting, then, is not about creating a template. It is a guide based on my experiences with the intent to help you adapt the DG component model I provided to your situation.&lt;br/&gt;</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/12/6_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_2_files/DataGovernanceDiagram.jpg" length="134181" type="image/jpeg"/>
    </item>
    <item>
      <title>Making Data Governance as simple as possible, but not simpler - Part 1</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/11/30_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_1.html</link>
      <guid isPermaLink="false">d3df8e9c-6087-4c26-9317-0b6587d67e92</guid>
      <pubDate>Mon, 30 Nov 2009 11:35:43 -0700</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/11/30_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_1_files/DataGovernanceDiagram_1.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object000_2.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:182px; height:131px;&quot;/&gt;&lt;/a&gt;1. Introduction&lt;br/&gt;&lt;br/&gt;“Make everything as simple as possible, but not simpler” - Albert Einstein&lt;br/&gt;&lt;br/&gt;That is one of my favorite quotes. In general, I think people have a tendency to complicate things more than needed. There are complex problems, but I do believe there are ways to break them down into parts so we can solve them with simple solutions.&lt;br/&gt;&lt;br/&gt;Data Governance is no different. It can be quite overwhelming to implement it in your organization. My goal with this series of postings is to make it as simple as possible. Of course there are challenges, and that's where the “not simpler” part comes into play. It is going to be up to you to make the proper adjustments for your situation.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;2. I Should Know Better&lt;br/&gt;&lt;br/&gt;My goal was simple, or so I thought. I had the following steps in mind to help answer the traditional What, Why, How, Who, When and Where:&lt;br/&gt;&lt;br/&gt;1.	Get the Data Governance definition&lt;br/&gt;2.	Come up with a model to guide you in deciding what you need and why&lt;br/&gt;3.	Describe how/when/where to engage the proper people, and establish an effective process&lt;br/&gt;4.	Describe how to achieve sustainability, scalability and maintainability &lt;br/&gt;&lt;br/&gt;I was prepared for 2-4, assuming I could find a good reference for 1. But item 1 was not so easy. I had to take a step back.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;3. 
The Many Definitions of Data Governance&lt;br/&gt;&lt;br/&gt;I wanted to find a definition of Data Governance we could use as a foundation to build upon during our discussions.&lt;br/&gt;&lt;br/&gt;I did expect to find some variations, but I can say I found more than expected.&lt;br/&gt;&lt;br/&gt;The following are not necessarily definitions. Some are simply characteristics some specialists use when addressing DG. &lt;br/&gt;&lt;br/&gt;Marty Moseley, CTO of &lt;a href=&quot;http://www.initiate.com/Pages/default.aspx&quot;&gt;Initiate Systems&lt;/a&gt;, uses the 5 pillars of Data Governance: Policies, Processes, Business Rules, People &amp;amp; Roles, and Technologies.&lt;br/&gt;&lt;br/&gt;Philip Russom, Senior Manager at TDWI Research, uses the &lt;a href=&quot;http://www.tdwi.org/display.aspx?id=9410&quot;&gt;3 pillars of Data Governance&lt;/a&gt;: Compliance, Transformation, and Integration.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.b-eye-network.com/view/3284&quot;&gt;Duffie Brunson from B-eye-Network&lt;/a&gt; says that “The true function of governance is to actively link integrated business and technology teams with corporate and strategic initiatives. Within this context, governance becomes an integral part of enterprise line management. Executed properly, the governance function can actively and effectively reallocate business, technology, reporting and analytic resources to align with rapidly changing market demands.”&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.baseline-consulting.com/uploads/BCG_WP_GettingAGrip_2009(3).pdf&quot;&gt;Linda McHugh from Baseline Consulting&lt;/a&gt; states that “Data governance is about the creation, maintenance and interpretation of enterprise information policy. 
The implementation of enterprise information policies is data management, some of which are specialized functions performed by data professionals while others are embedded in procedures across many functions.”&lt;br/&gt;&lt;br/&gt;The &lt;a href=&quot;http://www.datagovernance.com/&quot;&gt;Data Governance Institute&lt;/a&gt; (DGI) defines DG as “... a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” Furthermore, it defines the following focus areas: Policy, Standards, and Strategy; Data Quality; Privacy, Compliance, and Security; Architecture and Integration; Data Warehouse and BI; Management Alignment.&lt;br/&gt;&lt;br/&gt;David Loshin, author of the book &lt;a href=&quot;http://www.amazon.com/Master-Data-Management-OMG-Press/dp/0123742250/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1259611047&amp;sr=1-1&quot;&gt;Master Data Management&lt;/a&gt;, says “data governance is expected to ensure that the data meets the expectations of all the business purposes, in the context of data stewardship, ownership, compliance, privacy, security, data risks, data sensitivity, metadata management, and MDM.”&lt;br/&gt;&lt;br/&gt;Steve Sarsfield, author of the book &lt;a href=&quot;http://www.amazon.com/exec/obidos/ASIN/1849280126/flatwave-20&quot;&gt;The Data Governance Imperative&lt;/a&gt;, states “Data governance is about changing the hearts and minds of your company to see the value of information quality...data governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise...at the root of the problems with managing your data are data quality problems...data governance guarantees that data can be trusted...putting people in charge of fixing and preventing issues with data...to have fewer negative events as a result of poor 
data.”&lt;br/&gt;&lt;br/&gt;Tony Fisher, President and CEO of DataFlux and author of the book &lt;a href=&quot;http://www.amazon.com/Data-Asset-Companies-Business-Success/dp/0470462264/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1259611089&amp;sr=1-1&quot;&gt;The Data Asset&lt;/a&gt;, describes the Data Governance Maturity Model with 4 levels - Undisciplined, Reactive, Proactive and Governed - framed into Reward vs. Risk and Business Capabilities along with Technology Adoption.&lt;br/&gt;&lt;br/&gt;Finally, &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_governance&quot;&gt;Wikipedia defines DG&lt;/a&gt; as follows: “Data governance is an emerging discipline with an evolving definition. The discipline embodies a convergence of data quality, data management, business process management, and risk management surrounding the handling of data in an organization. Through data governance, organizations are looking to exercise positive control over the processes and methods used by their data stewards to handle data.”&lt;br/&gt;&lt;br/&gt;I’m sure my list is not comprehensive, but it is enough to get my point across.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;4. Yet Another Definition&lt;br/&gt;&lt;br/&gt;As I said, I wanted a common definition, but couldn’t pick one of the above. Therefore I created my own, not necessarily better than any other. It is more like a combination, and it is the one I’ll be using in this series. 
&lt;br/&gt;&lt;br/&gt;Data Governance is a discipline encompassing Data Management, Business Process Management, Compliance Management, People Management, and Technology Engagement for the purpose of using data as an asset driving strategic objectives.&lt;br/&gt;&lt;br/&gt;And I like to represent its components as shown in the diagram linked at the top of this post.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;5. What’s next?&lt;br/&gt;&lt;br/&gt;In the next posting, I’ll provide some additional information on my Data Governance Model, and how to decide what components you need in your particular case.&lt;br/&gt;&lt;br/&gt;Stay tuned, and feel free to start adding comments or asking questions.</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/11/30_Making_Data_Governance_as_simple_as_possible,_but_not_simpler_-_Part_1_files/DataGovernanceDiagram_1.jpg" length="134181" type="image/jpeg"/>
    </item>
    <item>
      <title>A Well-oiled Machine is Critical to Data Quality</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/11/1_A_Well-oiled_Machine_is_Critical_to_Data_Quality.html</link>
      <guid isPermaLink="false">5b61131c-d367-40e9-adda-ac2519a72be0</guid>
      <pubDate>Sun, 1 Nov 2009 19:49:02 -0700</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/11/1_A_Well-oiled_Machine_is_Critical_to_Data_Quality_files/Well_oiled_machine_cogs.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object005_1.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:182px; height:109px;&quot;/&gt;&lt;/a&gt;The beauty of human beings is that people will look for creative ways to solve their problems. That means, when users have technical problems or run into business limitations during data entry, they will find ways to do it, even if it means breaking business rules or overriding well defined processes. From a Data Quality perspective, that is not a good thing, but who can blame the users? After all, they may be facing a particular customer need that doesn't fit an existing business process, or a system bug that is delaying a high profit transaction.&lt;br/&gt;&lt;br/&gt;Let's assume your organization does have all elements in place, such as Data Governance, Data Stewardship, Data Quality, IT support, etc. Users are less likely to engage the proper teams if their confidence in the support process is low. They may think: “oh boy, by the time I get this problem resolved through the proper mechanisms, it will have taken too long and I'll have a customer satisfaction issue beyond repair.” Therefore, for the “benefit” of the organization, they act with imagination and solve the immediate problem with non-approved solutions. Making matters worse, these out-of-spec practices and the associated data issues are sometimes difficult to detect, monitor, and correct.&lt;br/&gt;&lt;br/&gt;With that said, your goal as an organization should be not only to have the proper elements of a well governed organization, but to have them working effectively as well. That comes with maturity, and a constant focus on process improvement. Simply improving your data entry process alone is not enough. 
You have to improve the support process around it. Just about everything is constantly changing: business needs, business landscape, technology, people, etc. Your only hope is to have an efficiently adaptive model that, in spite of all these changes, can continue to deliver results quickly. Let's focus our creativity on this problem, and be really dull when it comes to creatively breaking business rules!</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/11/1_A_Well-oiled_Machine_is_Critical_to_Data_Quality_files/Well_oiled_machine_cogs.jpg" length="90635" type="image/jpeg"/>
    </item>
    <item>
      <title>The Data Profile Spectrum</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/9/14_The_Data_Profile_Spectrum.html</link>
      <guid isPermaLink="false">86d1b4ae-7cdc-42a8-a669-997e1b5f3c46</guid>
      <pubDate>Mon, 14 Sep 2009 21:30:57 -0600</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/9/14_The_Data_Profile_Spectrum_files/DataProfileSpectrum.png&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object001_3.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:183px; height:109px;&quot;/&gt;&lt;/a&gt;No matter how long you have been working with data quality, or simply data for that matter, you certainly know how important data profiling is. You don't know what you don't know, and data profiling helps you bridge the gap.&lt;br/&gt;&lt;br/&gt;Data profiling is a critical activity whether you are migrating a data system, integrating a new source into your data warehouse, synchronizing multiple systems, implementing an MDM repository, or just trying to measure and improve the quality of your data.&lt;br/&gt;&lt;br/&gt;However, data profiling is quite often an unqualified activity. Sometimes that is OK, but sometimes it is not. By “unqualified” I mean not much information or requirements are given about what the data profile is all about. Sometimes that is OK because you have either no knowledge at all or very minimal knowledge about the data you're profiling. But very often, you do know quite a bit already, and maybe you're simply trying to fit your data to a specific set of rules.&lt;br/&gt;&lt;br/&gt;Bear with me, but I feel like I need to add one more definition before I continue expressing my point. I keep using the term “knowledge about your data.” What do I mean? There are multiple levels of knowledge in this case. For each data element or combination of data elements, there are many associated properties. There is data type, data content, data pattern, data association, data conformance, business rules, etc. It could also be about what the data should be and not only about what it is. 
As you can see, how much you might know could vary a lot.&lt;br/&gt;&lt;br/&gt;When you combine the objective of your profile along with how much you know about the data already, you end up with a lot of different combinations. That is why I like to use the term Data Profile Spectrum. And remember, different attributes could be at different parts of the spectrum. No wonder data profiling can be a lot more complex than people give it credit for.&lt;br/&gt;&lt;br/&gt;The picture above depicts the Data Profile Spectrum.&lt;br/&gt;&lt;br/&gt;Let's first talk about Data Profile Artifacts. By that I mean what is usually provided by a data quality tool, or maybe something you put together yourself. Basically it is what you have to analyze your data, from data completeness to pattern analysis, data distribution, and a lot more. I won't get into a lot of detail about the artifacts. Please refer to Jim Harris' article &lt;a href=&quot;http://www.ocdqblog.com/home/adventures-in-data-profiling-part-1.html&quot;&gt;Adventures in Data Profiling&lt;/a&gt; for more on that and some other cool stuff. The only thing I'll point out is that I used tetrominoes to represent the artifacts. That is just to call attention to the fact that data profile artifacts are pieces that can be applied and/or combined in a variety of ways to accomplish what you need. For example, you may use the data distribution artifact during discovery just to understand what random values you have at what percentage. However, you may use the same artifact on a Country Code field to identify the percentage of valid values. It is the same artifact applied slightly differently depending on where you are in the spectrum.&lt;br/&gt;&lt;br/&gt;The Prior Knowledge scale represents how much you already know about what the data is or what the data should be. It is important to grasp where you are in that scale so you know how to apply the right artifacts properly. 
I mean, why would you need to verify uniqueness when a primary key constraint already exists in the database for that particular field? That is just an example, but hopefully you get the idea.&lt;br/&gt;&lt;br/&gt;Another twist is being able to identify where you should be in that scale for a given profiling activity. I can see some eyes rolling, but I'll explain. Here is a real example I faced. We were about to start a data conversion activity. I was asked to “go profile the data to be converted.” My reply was that we needed more information than that. I mean, if we were to convert one system into another, we should have quite a bit of knowledge about the new system, which would drive what and how we profile the old system. This is definitely not a low-end of the scale profile activity in my spectrum.&lt;br/&gt;&lt;br/&gt;Interestingly enough, my reply wasn't well received. I hadn't written this blog entry yet, so this concept wasn't quite formalized in my mind. I was reminded that data profiling should be the first thing to occur, so we could “discover” things about our data. My point was our goal was not to find out information about our data. Our goal was to fit our data into the new system. Doing “primitive” data profiling would be a useless activity. We had to profile our data bounded by the new system. Well, I eventually convinced them, but I wish I had the Data Profile Spectrum handy back then.&lt;br/&gt;&lt;br/&gt;In summary, I had a request to do a “No Knowledge” profile, when I should have been asked to do something at a higher end of the Data Profile Spectrum. At the time of the request, we didn't know much. One could have thought the request was pertinent when using the Data Profile Spectrum. However, you not only need to consider where you are in the spectrum, but also where you should be. 
If they don't match, something is missing.&lt;br/&gt;&lt;br/&gt;I have several other real examples of data profiling requests, but it is getting pretty late, and I want to post this entry before I go to bed. If you care to read more about them, please let me know.</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/9/14_The_Data_Profile_Spectrum_files/DataProfileSpectrum.png" length="369522" type="image/png"/>
    </item>
    <item>
      <title>Which MDM Approach to Use: Analytical, Operational, or Enterprise?</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/7/14_Which_MDM_Approach_to_Use__Analytical,_Operational,_or_Enterprise.html</link>
      <guid isPermaLink="false">2bbf243b-f747-4f5a-9631-a4cf6019c92b</guid>
      <pubDate>Tue, 14 Jul 2009 23:24:57 -0600</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/7/14_Which_MDM_Approach_to_Use__Analytical,_Operational,_or_Enterprise_files/which_mdm.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object051.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:184px; height:108px;&quot;/&gt;&lt;/a&gt;In my last posting, Setting the stage for MDM: some definitions, I described the 3 approaches to MDM that can solve for operational, analytical or both operational and analytical aspects of Master Data. They are, respectively, Operational MDM, Analytical MDM, and Enterprise MDM.&lt;br/&gt;&lt;br/&gt;But which one is the right one for your organization?&lt;br/&gt;&lt;br/&gt;Before I answer that question, let's talk about the major drivers of an MDM initiative. When building a business case for MDM, several experts agree on three major benefits: risk mitigation, cost reduction, and revenue growth (please see references at the end of the posting).&lt;br/&gt;&lt;br/&gt;Risk mitigation involves reducing the possibility of regulatory non-compliance (financial reports, SOX, etc.); legal non-compliance (contracts, compensation, privacy, etc.); lawsuits and litigation; audit findings and loss of certifications; audit and legal costs; and damage to a company's reputation.&lt;br/&gt;&lt;br/&gt;Cost reduction involves lowering costs due to inconsistent, inaccurate, and non-timely delivery of data. 
These costs include: returned mailings and products; shipping fines; invoicing delays; wasted direct marketing; IT (maintenance of redundant systems, data reconciliation, consulting fees, software maintenance fees, etc.); and low productivity related to inefficient processes, redundancy, and rework.&lt;br/&gt;&lt;br/&gt;Revenue growth is mostly related to strategic objectives, such as marketing campaigns; channel management; value justification; brand identity; demand creation; and mergers &amp;amp; acquisitions.&lt;br/&gt;&lt;br/&gt;I recommend you first decide around which one(s) of those areas you're going to write your business case. Once you make that decision, you can use the following table as a guideline to which MDM approach you should implement.&lt;br/&gt;&lt;br/&gt;But please read the following “warnings” regarding the ensuing table:&lt;br/&gt;- Enterprise MDM is a combination of Analytical and Operational MDM. That said, Enterprise MDM could be used to solve for all cases. My objective is to recommend the minimum necessary to achieve what you need.&lt;br/&gt;- Use this as a general guideline only. I'm sure you can tell there are some overlaps among the three business cases. One could correctly argue that mitigating certain risks also lowers the cost of doing business, which raises the question of whether that benefit belongs in the cost reduction category. Also, better data could reduce cost and improve marketing, and consequently grow revenue. Anyway, please use your discretion.</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/7/14_Which_MDM_Approach_to_Use__Analytical,_Operational,_or_Enterprise_files/which_mdm.jpg" length="114371" type="image/jpeg"/>
    </item>
    <item>
      <title>Setting the Stage for MDM: some definitions</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/7/7_Setting_the_Stage_for_MDM__some_definitions.html</link>
      <guid isPermaLink="false">b21f8f38-7621-4611-a11d-04d9ea075062</guid>
      <pubDate>Tue, 7 Jul 2009 23:01:50 -0600</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/7/7_Setting_the_Stage_for_MDM__some_definitions_files/MDM_fig2_Approaches.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object052.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:185px; height:91px;&quot;/&gt;&lt;/a&gt;Master Data Management or MDM is everywhere these days. Executives have heard how MDM is going to save their organizations by revolutionizing how companies deal with their data, and making them more agile, competitive, and successful.&lt;br/&gt;&lt;br/&gt;I'm not arguing with that at all. I do believe MDM is capable of achieving all that has been said if done correctly. But it seems like there is quite a bit of disparity in what people call MDM. I've seen organizations simply doing a Data Integration project and calling it MDM. Granted, Data Integration is often an important step toward MDM, but it is not MDM per se.&lt;br/&gt;&lt;br/&gt;With that in mind, I think it is time to set the stage and have some definitions. I say we need to understand the MD (Master Data) part first, before we can define the second M (Management).&lt;br/&gt;&lt;br/&gt;Master data is information that is key to the operational and analytics/reporting aspects of a business. This key business information may include data about the following entities: customers, products, suppliers, partners, employees, materials, etc. Master data is often non-transactional in nature, but it supports transactional processes and operations, as well as business intelligence via analytics and reporting. Master data is normally used by multiple functional groups and stored in disparate systems across the organization. Since it is commonly stored in siloed systems, the possibility for inaccurate and/or duplicate master data exists. 
Simply put, master data is that persistent, non-transactional data defining a business entity for which there should be an agreed-upon view across the organization.&lt;br/&gt;&lt;br/&gt;Notice the two distinct aspects of Master Data: operational and analytics. From that definition, one may say that an MDM project not addressing both aspects is not truly MDM. I'm not that extreme. However, I like to use the following terms to distinguish what is being addressed: Operational MDM, Analytical MDM, and Enterprise MDM. These are not new terms – that's right, I won't take credit for them. I have seen white papers using those terms (sorry for not providing appropriate credits, but I really don't have links to those documents anymore). I'm just surprised that those terms are not used more often to help distinguish which MDM approach is being implemented.&lt;br/&gt;&lt;br/&gt;Which one should you implement? You guessed it: it depends on what you're trying to accomplish. Historically, Analytical MDM, implemented in Data Warehouses, has been the most common MDM approach adopted by many organizations, mostly due to its low impact on the company's operational systems. This is still where most master data is managed today. That is not to say it is the right one. As a matter of fact, most would argue that's not the appropriate solution to manage master data. But that is a topic for another posting.&lt;br/&gt;&lt;br/&gt;The diagram above depicts the level of intrusiveness of each approach. This picture is not suggesting phases to follow when implementing MDM (another posting).&lt;br/&gt;&lt;br/&gt;To complicate matters even more, there are potentially multiple architecture definitions for each of the three approaches. You guessed it again: more postings to come! I'm getting tired, but I hope you're not!!</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/7/7_Setting_the_Stage_for_MDM__some_definitions_files/MDM_fig2_Approaches.jpg" length="47006" type="image/jpeg"/>
    </item>
    <item>
      <title>A Quadrant for Data Quality Initiatives</title>
      <link>http://www.dcervo.com/blog/Home/Entries/2009/6/23_A_Quadrant_for_Data_Quality_Initiatives.html</link>
      <guid isPermaLink="false">c0f1b9de-f890-47b1-a86c-37000cf9609c</guid>
      <pubDate>Tue, 23 Jun 2009 18:09:43 -0600</pubDate>
      <description>&lt;a href=&quot;http://www.dcervo.com/blog/Home/Entries/2009/6/23_A_Quadrant_for_Data_Quality_Initiatives_files/DQ_quad_drawing.jpg&quot;&gt;&lt;img src=&quot;http://www.dcervo.com/blog/Home/Media/object053.jpg&quot; style=&quot;float:left; padding-right:10px; padding-bottom:10px; width:182px; height:122px;&quot;/&gt;&lt;/a&gt;Most of us in Master Data Management (MDM) are very familiar with the Dimensions of Data Quality, which are very well explained by David Loshin in his Master Data Management book, or by Thomas Ravn and Martin Høedholt in their very informative article in the Information Management Magazine (see References at the end of this posting for more details).&lt;br/&gt;&lt;br/&gt;From the sources above and others, the data quality dimensions normally include, but are not limited to, the following: Accuracy, Completeness, Consistency, Currency, Referential Integrity, Timeliness, Uniqueness, and Validity. Please refer to the references for definitions, or add a comment if you would like me to expand some more on the subject.&lt;br/&gt;&lt;br/&gt;I am a big proponent of Data Quality Dimensions, and have used them extensively to help organize and classify the multitude of metrics I have implemented so far. But I also like to organize my data quality initiatives using a complementary view. I like to call it the Quadrant for Data Quality Initiatives.&lt;br/&gt;&lt;br/&gt;Data Quality initiatives normally fall into 2 categories: pro-active or reactive. In general terms, pro-active initiatives are measures you establish to prevent problems from happening, while reactive initiatives are measures you adopt after the problem has already occurred and needs correction.&lt;br/&gt;&lt;br/&gt;Either of those initiatives can lead to 2 results. One result is Full Compliance, meaning the entire problematic data set is corrected, and the risk of bad-quality data remaining is near zero. 
The second result is Partial Compliance, where there is no guarantee the problem is fully fixed.&lt;br/&gt;&lt;br/&gt;A classic example of a Pro-active/Full Compliance activity is when we establish referential integrity rules to prevent incorrect data from being added to the system. In my opinion, that is the ideal scenario. If we could only establish those types of rules for every data element, our lives would be much easier. Unfortunately, that's not the case. We can't possibly prevent all data errors from happening.&lt;br/&gt;&lt;br/&gt;That's when the Quadrant comes in handy. You can define your data quality initiatives in the terms described above, and place them as appropriate in the quadrant. The next diagram shows a possible classification:</description>
      <enclosure url="http://www.dcervo.com/blog/Home/Entries/2009/6/23_A_Quadrant_for_Data_Quality_Initiatives_files/DQ_quad_drawing.jpg" length="107705" type="image/jpeg"/>
    </item>
  </channel>
</rss>

