In healthcare, a tremendous volume of clinical and laboratory test, imaging, prescription, and medication data is being collected. Big data analytics on these data aims at early detection of disease, which helps in developing preventive measures and improving patient care. Parkinson's disease is the second-most common neurodegenerative disorder in the United States. To find a cure for Parkinson's disease, biological, clinical, and behavioral data from different cohorts are collected, managed, and disseminated through the Parkinson's Progression Markers Initiative (PPMI). Applying big data technology to these data can help identify potential biomarkers of Parkinson's disease. Data collected in human clinical studies are imbalanced, heterogeneous, incongruent, and sparse. This study focuses on ways to overcome the challenges posed by the PPMI data, which are wide (many attributes per subject) and gappy (many missing values). The work builds on initial discoveries made through descriptive studies of various attributes; this exploration of the data led to the identification of the significant attributes. We are further working to build a software suite that enables end-to-end analysis of Parkinson's data: cleaning and curating the data, imputation, dimensionality reduction, multivariate correlation, and finally identification of potential biomarkers.
Available at: http://works.bepress.com/baskar-ganapathysubramanian/56/