Article
Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth
Quantitative Health Sciences Publications and Presentations
  • Zhaoyang Zhang, University of Massachusetts Medical School
  • Hua (Julia) Fang, University of Massachusetts Medical School
  • Honggang Wang, University of Massachusetts - Dartmouth
UMMS Affiliation
Department of Quantitative Health Sciences
Publication Date
6-1-2016
Document Type
Article Postprint
Abstract
Web-delivered trials are an important component of eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional and that contain missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Building on our multiple imputation (MI) based fuzzy clustering method, MIfuzzy, we propose a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values and, more generally, for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by automatically searching and synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and in simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters in such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that process big incomplete longitudinal trial data in eHealth services.
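The idea of choosing the number of clusters with an MI-based Xie and Beni index can be illustrated with a minimal Python sketch. This is not the authors' MIV or MIfuzzy implementation. The function names (`impute_once`, `fuzzy_cmeans`, `xie_beni`, `select_n_clusters`), the fuzzifier m = 2, and the number of imputations are all assumptions made for illustration. In particular, the imputation step is a simple random draw from the observed values of each variable, standing in for a proper multiple imputation model, and averaging the index across imputed datasets is just one plausible way to pool it.

```python
import numpy as np

def impute_once(X, rng):
    """Placeholder imputation (assumption): fill each missing entry with a
    random draw from the observed values of the same column."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        obs = X[~miss, j]
        X_imp[miss, j] = rng.choice(obs, size=miss.sum())
    return X_imp

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns cluster centers and memberships U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # squared Euclidean distances from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)              # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def xie_beni(X, centers, U, m=2.0):
    """Xie and Beni index: fuzzy compactness divided by n times the minimum
    squared distance between centers. Lower values indicate a better partition."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)               # ignore a center's distance to itself
    return compactness / (X.shape[0] * sep.min())

def select_n_clusters(X, candidates=(2, 3, 4, 5), n_imputations=5, seed=0):
    """Average the Xie and Beni index over imputed datasets for each candidate
    number of clusters and return the candidate with the smallest average."""
    rng = np.random.default_rng(seed)
    imputed = [impute_once(X, rng) for _ in range(n_imputations)]
    mean_xb = {}
    for c in candidates:
        scores = []
        for X_imp in imputed:
            centers, U = fuzzy_cmeans(X_imp, c, seed=seed)
            scores.append(xie_beni(X_imp, centers, U))
        mean_xb[c] = float(np.mean(scores))
    best = min(mean_xb, key=mean_xb.get)
    return best, mean_xb
```

Calling `select_n_clusters(X)` on a NumPy array `X` whose missing entries are coded as `np.nan` returns the selected number of clusters together with the averaged index for every candidate, so the whole profile can be inspected rather than only the minimum.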
Keywords
  • UMCCTS funding
  • Big data
  • Fuzzy clustering
  • Longitudinal trial
  • Missing data
  • Multiple imputation
  • Validation
Rights and Permissions
Posted with publisher's permission.
DOI of Published Version
10.1007/s10916-016-0499-0
Source
J Med Syst. 2016 Jun;40(6):146. doi: 10.1007/s10916-016-0499-0. First published online 2016 Apr 28. The final publication is available at Springer via http://dx.doi.org/10.1007/s10916-016-0499-0
PubMed ID
27126063
Comments
This is the authors' final, peer-reviewed version of the article as prepared for publication in: J Med Syst. 2016 Jun;40(6):146. doi: 10.1007/s10916-016-0499-0. First published online 2016 Apr 28. The final publication is available at Springer via http://dx.doi.org/10.1007/s10916-016-0499-0.
Related Resources
Link to article in PubMed
Citation Information
Zhaoyang Zhang, Hua (Julia) Fang and Honggang Wang. "Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth" Journal of Medical Systems Vol. 40 Iss. 6 (2016) ISSN: 1573-689X
Available at: http://works.bepress.com/hua_fang/37/