Missing Data And What To Do About It
Tom Breur
January 2010

Introduction

Missing data are a fact of life, and sometimes they cause serious disruption. Fortunately, there are often quite acceptable ways of dealing with missings. The key lies in accurately discerning the different flavors of “missing” that may exist, and dealing with each kind accordingly.

In this paper we present a simple framework for dealing with missings: a 2 × 2 matrix. On one axis we have voluntary versus involuntary registration of data. This typically equates to, for instance, survey data versus electronic recording of activity as in a database (or data warehouse, DWH). On the other axis we have rightfully versus wrongfully missing: is a field “allowed” to be missing? Based on these two criteria you decide how to deal with missings.

Because missing data can be so problematic to deal with, you’d really like to avoid this problem altogether. How can we do that? And even if some missings remain inevitable, how can we mitigate disruptions to analysis and reporting?
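The two axes of the matrix can be written down as a small data structure. The Python sketch below is only an illustration: the names `Registration`, `Legitimacy`, `MissingKind` and `classify_missing` are hypothetical and do not come from the paper. It labels a missing value with its quadrant and deliberately does not prescribe a treatment per quadrant.

```python
from enum import Enum
from typing import NamedTuple


class Registration(Enum):
    """First axis: how the data were registered (hypothetical naming)."""
    VOLUNTARY = "voluntary"      # e.g. survey data
    INVOLUNTARY = "involuntary"  # e.g. activity recorded electronically in a database/DWH


class Legitimacy(Enum):
    """Second axis: is the field 'allowed' to be missing? (hypothetical naming)"""
    RIGHTFUL = "rightful"
    WRONGFUL = "wrongful"


class MissingKind(NamedTuple):
    """One cell of the 2 x 2 matrix."""
    registration: Registration
    legitimacy: Legitimacy


def classify_missing(is_voluntary: bool, may_be_missing: bool) -> MissingKind:
    """Map the two yes/no criteria onto a quadrant of the matrix."""
    return MissingKind(
        Registration.VOLUNTARY if is_voluntary else Registration.INVOLUNTARY,
        Legitimacy.RIGHTFUL if may_be_missing else Legitimacy.WRONGFUL,
    )


# Example: an optional survey question that a respondent skipped.
kind = classify_missing(is_voluntary=True, may_be_missing=True)
print(kind.registration.value, kind.legitimacy.value)
```

Keeping the two criteria as separate enumerations, rather than as a single four-valued label, mirrors the matrix: each axis can be assessed on its own before the two are combined.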


