"Data Quality Control – building a novelty detector"
Tom Breur
May 2009

Introduction

When a Data Warehouse (DWH) goes in production mode, the initial ... Rather than waiting for end-users to question the content of your DWH “the hard way”, we advocate a proactive stance. Our recommendation
is to constantly monitor data quality: not only compliance with
formatting, adherence to delivery processes, and referential integrity,
but also data content. This way you can truly remain “in control” of
your data. In this paper we describe how to design test programs, alongside your ETL, that monitor the quality of data content. An application that assesses whether new data are in line with historical feeds was previously labeled a “novelty detector” (Pyle, 1999), a term we adopt here too.
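
To make the idea of "in line with historical feeds" concrete, here is a minimal sketch of one possible check. It is an illustration only, not the design described in this paper: the function name `is_novel`, the choice of a row count as the monitored statistic, and the three-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_novel(history, new_value, threshold=3.0):
    """Return True if new_value deviates from the historical feeds.

    Assumption for this sketch: 'deviates' means lying more than
    `threshold` sample standard deviations from the historical mean.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is constant: any different value counts as novel.
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold


# Hypothetical row counts from five earlier deliveries of the same feed.
history = [1000, 1020, 980, 1010, 990]

print(is_novel(history, 1005))  # close to past deliveries
print(is_novel(history, 400))   # far below past deliveries
```

In practice the same comparison could be applied to any summary of a feed (for example null rates, means, or category frequencies), and the threshold would need tuning per variable.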


