What to do with bad data?
Tom Breur
September 2009

Introduction

Everybody hates poor quality data. But let’s face it: in most organizations you will run into data quality issues, at least from time to time. So instead of arguing against poor data quality, a more useful question is: how do you deal with it?

To make business intelligence (BI) applications add value to the corporation, some level of data integration invariably takes place. This might happen in a data warehouse (DWH), an operational data store (ODS), an enterprise resource planning (ERP) application, a customer relationship management (CRM) application, or another application you might have. We’ll assume that the data quality problems originate in the upstream (primary) systems that generate your source data. In this article, I’ll focus exclusively on DWH solutions.

There are two fundamentally different ways of dealing with bad data: either you load all of the data “as is” (and deal with the errors later), or you clean/scrub the data on the way in to the DWH. The former is the approach advocated by Data Vault architects; the latter I will label the “Ralph Kimball” approach, in honor of his extensive writing on this subject. This article focuses on the pros and cons of both approaches.
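To make the contrast between the two loading strategies concrete, here is a minimal Python sketch. It is a toy illustration only, not an implementation of Data Vault or of Kimball's methodology: the `Warehouse` class, the `is_valid` rule, and the sample rows are all hypothetical, and rejecting a row is just one of several cleansing actions a real load process might take.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical stand-in for a DWH: loaded rows plus a reject area."""
    rows: list = field(default_factory=list)
    rejects: list = field(default_factory=list)


def is_valid(row):
    # Assumed business rule: a row needs a customer id and a non-negative amount.
    return bool(row.get("customer_id")) and row.get("amount", 0) >= 0


def load_as_is(source_rows):
    """Load everything; only record whether each row passed the check."""
    wh = Warehouse()
    for row in source_rows:
        wh.rows.append({**row, "is_valid": is_valid(row)})
    return wh


def load_cleansed(source_rows):
    """Check rows on the way in; rows that fail never reach the main tables."""
    wh = Warehouse()
    for row in source_rows:
        (wh.rows if is_valid(row) else wh.rejects).append(row)
    return wh


source = [
    {"customer_id": "C1", "amount": 100},
    {"customer_id": "", "amount": 50},    # missing key
    {"customer_id": "C3", "amount": -5},  # implausible value
]

raw = load_as_is(source)
clean = load_cleansed(source)
print(len(raw.rows), len(clean.rows), len(clean.rejects))
```

In the first variant all three source rows end up in the warehouse and the errors remain visible for later handling; in the second, only the row that passes the check is loaded and the other two are set aside.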


