In this paper we compare principal components analysis, hierarchical clustering, correspondence analysis, and conceptual clustering to assess their effectiveness at identifying patterns in a large limnological data set. The data come from a multi-year study of Lake Whatcom, a large lake in the Puget Sound lowlands of Washington State. The data include physical and chemical parameters (temperature, dissolved oxygen, pH, alkalinity, turbidity, conductivity, and nutrients) as well as biological parameters (Secchi depth, chlorophyll a, and phytoplankton species and total counts). The patterns we expected to find include (a) temperature and dissolved oxygen interactions, (b) ordination by algal bloom sequences, and (c) clustering due to the effects of stratification.
Principal components analysis was somewhat useful for confirming known water quality trends, but failed to identify large-scale patterns such as stratification and seasonal plankton changes. Correspondence analysis proved superior to principal components analysis for detecting phytoplankton trends, but was less effective for interpreting water quality changes. Hierarchical clustering produced highly unbalanced trees for both the water quality and phytoplankton data, and was useless as an exploratory tool. We introduce a new approach to clustering, implemented in the computer program riffle. This clustering algorithm outperformed the other exploratory tools in clustering and parameter ordination, and successfully identified a number of expected and unexpected patterns in the limnological data.
Available at: http://works.bepress.com/robin_matthews/10/