Generation of Prediction Intervals to Assess Data Quality in the Distribute System Using Quantile Regression
2011 Joint Statistical Meetings Proceedings
  • Ian Painter, University of Washington - Seattle Campus
  • Julie Eaton, University of Washington Tacoma
  • Debra Revere, University of Washington - Seattle Campus
  • Bill Lober, University of Washington - Seattle Campus
  • Donald R. Olson, New York City Department of Health and Mental Hygiene
Publication Date
12-1-2011
Document Type
Conference Proceeding
Abstract
Distribute is a national influenza-like illness (ILI) surveillance project that integrates data from multiple jurisdictions. Distribute works solely with summarized (aggregated) data. Timeliness of the data varies considerably between sites; for many sites, data for each encounter date arrives piecemeal, spread over several days. This spread adds noise to the data received by the Distribute system. Systematic differences in timeliness between data sources can introduce bias into the indicator of interest, the ILI ratio. Quantile regression on the observed relationship between incomplete and complete data is used to calculate prediction intervals for the complete data. Some sites have very narrow prediction intervals, indicating that the ILI ratio calculated from incomplete data approximates the complete-data ratio very accurately. Other sites show considerable asymmetry.
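The abstract does not state the regression model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single-predictor model through the origin (complete count ≈ slope × incomplete count); under that assumption the quantile-regression slope is exactly the weighted quantile of the ratios complete/incomplete, with the incomplete counts as weights. All function names, variable names, and counts are hypothetical.

```python
def pinball_loss(beta, x, y, tau):
    """Total quantile (pinball) loss of the through-origin fit y ~ beta * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        r = yi - beta * xi
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total


def quantile_slope(x, y, tau):
    """Slope minimizing the pinball loss for y ~ beta * x, requiring x > 0.

    For x > 0 the loss of (y - beta * x) equals x times the loss of
    (y / x - beta), so the minimizer is the weighted tau-quantile of the
    ratios y / x, using x as the weights.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    if any(xi <= 0 for xi in x):
        raise ValueError("all incomplete counts must be positive")
    pairs = sorted((yi / xi, xi) for xi, yi in zip(x, y))
    target = tau * sum(w for _, w in pairs)
    cumulative = 0.0
    for ratio, w in pairs:
        cumulative += w
        if cumulative >= target:
            return ratio
    return pairs[-1][0]  # guard against floating-point round-off


def prediction_interval(x, y, x_new, lower_tau=0.1, upper_tau=0.9):
    """Interval for the complete value given a new incomplete value x_new."""
    lo = quantile_slope(x, y, lower_tau) * x_new
    hi = quantile_slope(x, y, upper_tau) * x_new
    return lo, hi


# Hypothetical history for one site: counts as first received (incomplete)
# and the counts eventually reported for the same encounter dates (complete).
incomplete = [40, 55, 30, 80, 60, 45, 70, 50]
complete = [50, 60, 42, 88, 75, 54, 77, 65]

lo_slope = quantile_slope(incomplete, complete, 0.1)
hi_slope = quantile_slope(incomplete, complete, 0.9)
interval = prediction_interval(incomplete, complete, x_new=100)
print(lo_slope, hi_slope, interval)
```

In this sketch a site whose lower and upper slopes are close together would have a narrow interval, and slopes that sit unevenly around 1 would give an asymmetric one. A fuller model with an intercept or several predictors would need a general quantile-regression solver rather than this closed form.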
Citation Information
Ian Painter, Julie Eaton, Debra Revere, Bill Lober, et al. "Generation of Prediction Intervals to Assess Data Quality in the Distribute System Using Quantile Regression" 2011 Joint Statistical Meetings Proceedings (2011) pp. 5172-5179
Available at: http://works.bepress.com/julie_eaton/2/