Exploration of statistical dependence between illness parameters using the entropy correlation coefficient
  • R. C. Craddock
  • R. Taylor
  • Gordon Broderick, Nova Southeastern University
  • T. Whistler
  • Nancy G. Klimas, Nova Southeastern University
The entropy correlation coefficient (ECC) is a useful tool for measuring statistical dependence between variables. We employed this tool to search for pairs of correlated variables in the chronic fatigue syndrome (CFS) Computational Challenge dataset. Highly related variables are candidates for data reduction, and novel relationships could lead to hypotheses regarding the pathogenesis of CFS.
METHODS: Data for 130 female participants in the Wichita (KS, USA) clinical study [1] were coded into numerical values. Metric data were grouped using Gaussian mixture models; the number of groups was chosen using the Bayesian information criterion. The pair-wise correlation between all variables was computed using the ECC. Significance was estimated from 1000 iterations of a permutation test, and a threshold of 0.01 was used to identify significantly correlated variables.
RESULTS: The five dimensions of the multidimensional fatigue inventory (MFI) were all highly correlated with each other. Seven Short Form (SF)-36 measures, four CFS case-defining symptoms and the Zung self-rating depression scale all correlated with all MFI dimensions. No physiological variable correlated with more than one MFI dimension. MFI, SF-36, the Centers for Disease Control and Prevention (CDC) symptom inventory, the Zung self-rating depression scale and three Cambridge Neuropsychological Test Automated Battery (CANTAB) measures were highly correlated with CFS disease status.
DISCUSSION: Correlations between the five dimensions of MFI are expected since they are measured with the same instrument. The relationship between MFI and the Zung depression index has been previously reported. MFI, SF-36 and the CDC symptom inventory are used to classify CFS, so it is not surprising that they are correlated with disease status. Only one of the three CANTAB measures that correlate with disease status has been previously reported, indicating that the ECC identifies relationships not found with other statistical tools.
CONCLUSION: The ECC is a useful tool for measuring statistical dependence between variables in clinical and laboratory datasets. The ECC needs to be further studied to gain a better understanding of its meaning for clinical data.
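The pair-wise ECC and permutation test described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the exact ECC formula, so the code assumes one common definition, ECC = 2 I(X;Y) / (H(X) + H(Y)), where I is mutual information and H is Shannon entropy (some authors take the square root of this quantity). The function names, the fixed random seed and the add-one correction in the p-value are also assumptions made for illustration. Inputs are taken to be already discretized, for example after grouping metric data.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def ecc(x, y):
    """Assumed definition: ECC = 2 * I(X;Y) / (H(X) + H(Y)), ranging over [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0  # both variables are constant; report no dependence
    hxy = entropy(list(zip(x, y)))  # joint entropy of the paired labels
    mutual_information = hx + hy - hxy
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * mutual_information / (hx + hy))


def permutation_p_value(x, y, n_iter=1000, seed=0):
    """Estimate how often shuffling y yields an ECC at least as large as observed."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    observed = ecc(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if ecc(x, shuffled) >= observed:
            hits += 1
    # Add-one correction (an illustrative choice) avoids reporting p = 0.
    return (hits + 1) / (n_iter + 1)


# Toy data, not taken from the study: y duplicates x, so dependence is maximal.
x = [0, 1] * 30
y = list(x)
p = permutation_p_value(x, y)
significant = p < 0.01  # threshold quoted in the abstract
```

Under this definition the ECC is 0 for independent variables and 1 when each variable fully determines the other. Applying `permutation_p_value` to every pair of variables and keeping pairs below the 0.01 threshold mirrors the screening procedure the abstract outlines.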
Citation Information
R. C. Craddock, R. Taylor, Gordon Broderick, T. Whistler, et al. "Exploration of statistical dependence between illness parameters using the entropy correlation coefficient" Pharmacogenomics Vol. 7 Iss. 3 (2006) pp. 421-428
Available at: