In this paper we present research in progress that aims to develop a set of data quality metrics for two aspects of the consistency dimension: the semantic and the representational. Metrics for these two aspects are relatively unexplored in the literature, especially in comparison with the data integrity aspect. Our goal is to apply these metrics to interconnected structured and unstructured data. Because unstructured data is so prevalent in organizations today, many organizations strive for “content convergence” by interconnecting structured and unstructured data. The literature offers few data quality metrics for this type of data, despite growing recognition of its potential value. We are developing our metrics in the context of data mining and evaluating their utility using data mining outcomes in an economic context. If our metric development is successful, a well-defined economic utility function for data quality metrics can be of direct use to managers making decisions.
Available at: http://works.bepress.com/rogerblake/3/