Tom’s Ten Data Tips – July 2010

Meta Data

Meta data are data about data. A more descriptive definition says meta data are everything you need to know in order to interpret information in its context. Examples are column headers in spreadsheets, units of measurement, the date and time a number was recorded, and the definition used for “customer” or “lapsed”, but also all the “technical” information required to run a database, such as primary and foreign keys, field lengths and types, etc.
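To make the definition concrete, here is a minimal sketch of how the meta data for a single column could be recorded next to the number it describes. The class name, field names, and the example definition are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMetaData:
    """Business and technical meta data for one column (hypothetical layout)."""
    name: str                     # column header
    definition: str               # business definition, spelled out
    unit: Optional[str]           # unit of measurement, if any
    data_type: str                # technical: field type
    length: Optional[int] = None  # technical: field length
    is_primary_key: bool = False  # technical: key information


def describe(value, meta: ColumnMetaData) -> str:
    """Render a raw value together with the context needed to interpret it."""
    unit = f" {meta.unit}" if meta.unit else ""
    return f"{meta.name} = {value}{unit} ({meta.definition})"


# Example values are made up for illustration.
revenue = ColumnMetaData(
    name="monthly_revenue",
    definition="invoiced amount per calendar month, excluding VAT",
    unit="EUR",
    data_type="DECIMAL",
    length=12,
)

print(describe(1250, revenue))
# monthly_revenue = 1250 EUR (invoiced amount per calendar month, excluding VAT)
```

Without the meta data, the bare number 1250 could be read as a count, a weekly figure, or an amount in another currency.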

Business intelligence (BI) and decision support systems are expanding in many corporations. Often, the only place where business meta data are stored is in the collective consciousness of employees. This ‘tribal knowledge’ represents a vulnerability because businesses risk becoming dependent on retaining their staff in order to function effectively. When employees resign, part of your meta data walks out the door. A central, managed meta data repository helps to escape that dynamic.

1. Meta Data Enables Change

In many corporations, change in IT systems is a painful, cumbersome, and costly process. This is often caused by an inability to plan for the change itself, and to “see” the picture that arises after proposed (or necessary) changes have been put into effect. Many of our clients complain that any little IT change appears to turn into an ordeal. IT, for its part, has a hard time estimating time frames and costs for the change. The end result is unsatisfactory: a stalemate where both parties are unhappy.

Business users want a more agile IT department, and IT is numbed by ongoing criticism of its “unrealistic” planning. Estimates seem excessive, yet many projects still don’t finish within budget (how pessimistic were these estimates then, really?). This lack of control is all too often caused by poor meta data (documentation), which makes any planning of work a stressful guess.

2. Business Meta Data Institutes Common Language

Business meta data are a formalization of tacit knowledge within the organization. By making it explicit, by specifying definitions in excruciating detail, you create a common language that enables business units to have more meaningful discussions (technically referred to as “data harmonization”). This provides an important first step towards better aligning business and IT.

3. Technical Meta Data Makes The Database Run

If business meta data are “customer facing”, at the front end of the meta data spectrum, then technical meta data are the “engineering” back end. This includes everything that is required in terms of syntax to allow the database to do its work. Think of things like the naming of fields, their type and length, but also the way they should be “connected”, the so-called relational schema.
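Most databases expose this technical meta data through their own catalog. As a hedged illustration, the sketch below uses Python's built-in sqlite3 module and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` to read field names, types, and relations back out of a small schema. The two tables and the helper function `technical_meta_data` are hypothetical examples, and other database systems offer different catalog views.

```python
import sqlite3

# A small, made-up schema held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        status      CHAR(1)
    );
""")


def technical_meta_data(conn, table):
    """Collect column and foreign key meta data for one table.

    The table name is interpolated directly, so only pass trusted names.
    """
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [
        {"name": row[1], "type": row[2],
         "not_null": bool(row[3]), "primary_key": bool(row[5])}
        for row in conn.execute(f"PRAGMA table_info({table})")
    ]
    # foreign_key_list rows: (id, seq, table, from, to, ...)
    foreign_keys = [
        {"column": row[3], "references": f"{row[2]}.{row[4]}"}
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"table": table, "columns": columns, "foreign_keys": foreign_keys}


meta = technical_meta_data(conn, "account")
for column in meta["columns"]:
    print(column)
print(meta["foreign_keys"])
```

The foreign key entry is the machine-readable form of how the two tables are “connected”.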

4. Meta Data Assure Integrity Of Your Results

Every system tends to have its own, proprietary meta data. This could be a catalog or data dictionary for a legacy system, the descriptor portion of a SAS file, column headers and field definitions in an Excel file, or more elaborate structures like those in Oracle Designer, etc. Corporate information increasingly gets produced by combining multiple sources, say, finance and operations, together with a CRM system. Obviously, this requires consolidated meta data. If you want to look at information uniformly across business lines (e.g., to make sure that finance and marketing report the same number of customers), their respective definitions must match. Exactly. And when they each query some arcane legacy system, “technicalities” about which records to include and which not (active, non-blocked accounts, etc.) are the cornerstone of reliable reporting.

Anyone who has accessed complex data schemas can appreciate how challenging it can be to write SQL queries that are not only syntactically correct but also semantically valid. How many times have you retrieved supposedly the same set of records via two different query paths, yet come up with unexplainable differences? The mathematical set logic can become pretty challenging, even more so in highly normalized data schemas. Your meta data should point you to “approved” (SQL) code to avoid all the pitfalls that complex legacy systems may hold. How else can you stand by your findings?
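The effect of diverging query paths can be shown with a small sketch. It assumes a hypothetical rule that a customer counts only when he or she holds at least one active, non-blocked account, and it keeps the “approved” SQL in a plain dictionary that stands in for a meta data repository. The table names, column names, and sample rows are all made up for illustration.

```python
import sqlite3

# Stand-in for the repository: one approved definition per business term.
APPROVED_QUERIES = {
    "active_customers": """
        SELECT COUNT(DISTINCT c.customer_id)
        FROM customer AS c
        JOIN account AS a ON a.customer_id = c.customer_id
        WHERE a.status = 'active' AND a.blocked = 0
    """,
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        status      TEXT,
        blocked     INTEGER
    );
    INSERT INTO customer VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO account VALUES
        (10, 1, 'active', 0),
        (11, 1, 'active', 0),
        (12, 2, 'active', 1),
        (13, 3, 'closed', 0);
""")


def count(sql):
    """Run a query that returns a single number."""
    return conn.execute(sql).fetchone()[0]


# Path 1: every row in the customer table, with or without accounts.
naive_all_customers = count("SELECT COUNT(*) FROM customer")

# Path 2: joined rows, so a customer with two accounts is counted twice
# and the blocked account is still included.
naive_join = count("""
    SELECT COUNT(*)
    FROM customer AS c
    JOIN account AS a ON a.customer_id = c.customer_id
    WHERE a.status = 'active'
""")

# Path 3: the approved definition from the repository.
approved = count(APPROVED_QUERIES["active_customers"])

print(naive_all_customers, naive_join, approved)  # 4 3 1
```

All three queries are syntactically correct, yet only one matches the agreed definition, which is why the repository should point to the approved code.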

5. The Value Of Meta Data Becomes Even More Obvious When It Is Missing…

By far the most obvious sign of missing (business) meta data is people looking around and questioning each other to find the “context”, to identify the meaning of numbers, or to learn how to interpret database reports, OLAP cubes, etc.

Abundant research on the productivity of knowledge workers (see e.g. Peopleware, DeMarco & Lister) has established that interrupting a colleague will “cost” about 15 minutes of his or her productive time on top of the time needed to answer a question. So two people are seeking information (effectively being unproductive), and both lose an additional 15 minutes to “get up to speed” again. This quickly adds up to surprising costs to answer “one quick question.”
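As a back-of-the-envelope sketch of that arithmetic: the 15 minutes of recovery time and the two people involved come from the paragraph above, while the number of questions per day, the hourly rate, and the number of working days are purely hypothetical inputs.

```python
RECOVERY_MINUTES = 15  # recovery time per person, as cited above


def interruption_cost(question_minutes, recovery_minutes=RECOVERY_MINUTES, people=2):
    """Person-minutes lost to one question: both people spend the time on
    the question itself and then each needs the recovery time."""
    return people * (question_minutes + recovery_minutes)


def yearly_cost(questions_per_day, question_minutes, hourly_rate, working_days=220):
    """Yearly cost in currency units, under the assumed inputs."""
    minutes = questions_per_day * working_days * interruption_cost(question_minutes)
    return minutes / 60 * hourly_rate


# One "quick" two-minute question costs 2 * (2 + 15) person-minutes.
print(interruption_cost(2))        # 34

# Three such questions per day at an assumed rate of 60 per hour.
print(yearly_cost(3, 2, 60.0))     # 22440.0
```

Even with modest assumptions, the recovery time dominates: the two-minute question itself accounts for only 4 of the 34 person-minutes.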

6. Automate Populating Of Your Meta Data Repository For Maintainability

Any tables in your meta data repository that get updated on a recurring basis should be refreshed by an automated ETL process. These could be a list of active product codes, a table with branches, geo demographics data, weather reports, etc. Although (in the short run) the same target data might be keyed in faster and cheaper, that is still an approach to avoid. Keying in data quickly becomes too time consuming for the meta data team, and worse, any change in the source will immediately make the repository outdated, which is a killer! If you can’t find “the truth” in the meta data repository, where should you go? A second reason to automate such interfaces is that manual data entry does not scale and is impossible to maintain over time. The whole system quickly degrades to knowledge inside someone’s head, a situation you desperately want to avoid.
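A minimal sketch of such an automated refresh, using Python's built-in sqlite3 module, is shown below. The source table `product`, the repository table `ref_active_product`, and the function `refresh_active_products` are hypothetical names. A real ETL job would run on a schedule and would typically add logging and error handling.

```python
import sqlite3
from datetime import datetime, timezone


def refresh_active_products(source, repository):
    """Replace the reference table with the active product codes in the source.

    Returns the number of rows loaded.
    """
    rows = source.execute(
        "SELECT product_code, description FROM product WHERE active = 1"
    ).fetchall()
    loaded_at = datetime.now(timezone.utc).isoformat()
    with repository:  # commits on success, rolls back on an exception
        repository.execute("DELETE FROM ref_active_product")
        repository.executemany(
            "INSERT INTO ref_active_product (product_code, description, loaded_at) "
            "VALUES (?, ?, ?)",
            [(code, description, loaded_at) for code, description in rows],
        )
    return len(rows)


# Made-up source system and meta data repository, both in memory.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE product (product_code TEXT PRIMARY KEY, description TEXT, active INTEGER);
    INSERT INTO product VALUES
        ('P1', 'Savings account', 1),
        ('P2', 'Legacy loan', 0),
        ('P3', 'Credit card', 1);
""")
repository = sqlite3.connect(":memory:")
repository.execute(
    "CREATE TABLE ref_active_product "
    "(product_code TEXT PRIMARY KEY, description TEXT, loaded_at TEXT)"
)

first_load = refresh_active_products(source, repository)   # loads P1 and P3

# The source changes; the next scheduled run picks it up without any keying in.
source.execute("UPDATE product SET active = 0 WHERE product_code = 'P3'")
second_load = refresh_active_products(source, repository)  # loads P1 only

print(first_load, second_load)  # 2 1
```

Because the delete and the inserts share one transaction, readers of the repository never see a half-loaded table.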

When the ownership of business meta data resides within departments, offer them a simple and user-friendly interface, so they can update these fields themselves (say, a product catalog, a table with branches, etc.). This should then serve as a reference table for ETL routines. It is imperative that business users can make these changes without any IT help whatsoever. Every hurdle is one too many.

7. Tool Integration Is A Key To Success

There have been a few attempts to standardize meta data models (notably OMG and MDC), none of which has created a true “standard.” ETL, data modeling, DWH and reporting tools all need to feed back crucial data to the meta data repository, and do so in an automated fashion (see also tip #6). But neither content nor connectors are standardized, so this presently remains a considerable challenge.

Because so much corporate knowledge is stored in Word documents and Excel files, or implicitly hidden in program code, etc., these business meta data are cumbersome to access. The technical meta data, or “back end” of your meta data repository, lack a common standard across software vendors. Tool vendors have done little to create open standards. Some even suggest that recent acquisitions in the marketplace will tempt mega vendors (IBM, Oracle, SAP) to make connections easy only within their own portfolio, effectively “forcing” their customers into one-stop shopping and creating lock-in. But the grunt work of merging disparate meta data sources is still required. There is no straightforward solution in sight yet.

8. Business Meta Data Enables Man-Machine Interaction

Since the introduction of computers to facilitate efficiency gains for knowledge workers, we have tried to expand their usage. As applications proliferate and more and more people work with computers, it becomes increasingly evident that we don’t communicate easily with machines. They pose rather strict demands on how we instruct them.

Meta data, and in particular business meta data, are instrumental in making the connection between machine readable instructions and what we are trying to accomplish. In years to come, further productivity gains will come from removing the barriers of communication between humans and computers.

9. A Meta Data Repository Is A Necessary But Not Sufficient Condition For Knowledge Management

Knowledge management (KM) may have seemed a fad, another case where hype appears to have been boosted by software vendors. Although the KM hype may have waned, the underlying principles and intent are sound and valid. We have moved into a knowledge economy, and for many companies IP is their most valuable asset. So actively managing that knowledge makes intuitive sense.

Unless there is a “common language” across business lines (see also tip #2), and unless definitions of key entities (when does a prospect formally become a customer, etc.) are explicit and shared, confusion will continue to reign. Don’t try to build your castle on quicksand.

10. Recording Meta Data Is Like Flossing

Although many BI professionals buy into the importance of recording meta data, somehow it bears a lot of resemblance to flossing. Your dentist tells you how important it is and we agree. We know it is good for us, and we know we should be doing it. But somehow, it doesn’t get done…

If you live by your convictions, then make these activities a priority. Reward them commensurately. And continue to do so. Because there are just too many excuses not to do it, and fall into the trap of so many vicious circles that will drive your effort into the ground. You can change this dynamic, because you can make a difference.

Further reading

Some excellent books on Meta Data:

Universal Meta Data Models.
David Marco & Michael Jennings (2004)
ISBN# 0471081779

Building and Managing the Meta Data Repository.
David Marco (2000)
ISBN# 0471355232

Business Metadata: Capturing Enterprise Knowledge.
Bill Inmon, Bonny O’Neill & Lowell Fryman (2007)
ISBN# 0123737265

Aligning Business and IT with Metadata: The Financial Services Way.
Hans Wegener (2007)
ISBN# 0470030313

Contact
XLNT Consulting
Tom Breur, Principal

Telephone
+31-6-463 468 75

Address
Langestraat 8-03
5038 SE Tilburg
the Netherlands