A software framework for data dimensionality reduction: application to chemical crystallography
Integrating Materials and Manufacturing Innovation
  • Sai Kiranmayee Samudrala, Georgia Institute of Technology
  • Prasanna Venkataraman Balachandran, Drexel University
  • Jaroslaw Zola, Rutgers University - New Brunswick/Piscataway
  • Krishna Rajan, Iowa State University
  • Baskar Ganapathysubramanian, Iowa State University
Document Type
Article
Publication Version
Published Version
Publication Date
1-1-2014
DOI
10.1186/s40192-014-0017-5
Abstract
Materials science research has witnessed an increasing use of data mining techniques in establishing process‐structure‐property relationships. Significant advances in high‐throughput experiments and computational capability have resulted in the generation of large amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular reduction methods (such as principal component analysis) assume a linear relationship among the variables. Recent non‐linear reduction methods (neural networks, self‐organizing maps), though successful, suffer from computational issues related to convergence and scalability. Another significant barrier to using dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral‐based techniques that efficiently unravel linear and non‐linear structures in the data, which can subsequently be used to tractably investigate process‐structure‐property relationships. In addition, we describe techniques (based on graph‐theoretic analysis) to estimate the optimal dimensionality of the low‐dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface: the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface separates the mathematical and computational aspects from the materials science applications, significantly enhancing its utility to the materials science community. The applicability of this framework in constructing reduced-order models of complex materials datasets is illustrated with an example dataset of apatites described in structural descriptor space.
Cluster analysis of the low‐dimensional plots yielded interesting insights into the correlations between structural descriptors, such as ionic radius and covalence, and characteristic properties, such as apatite stability. This information is crucial because it can promote the use of apatite materials as potential host systems for immobilizing toxic elements.
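The abstract contrasts linear reduction (principal component analysis) with spectral techniques that recover non‐linear structure. The NumPy sketch below illustrates both. It is a hedged illustration only and is not SETDiR's code.

* `pca` computes principal components from the singular value decomposition of the centered data.
* `isomap` is a minimal version of Isomap: a k-nearest-neighbor graph, then graph shortest paths as approximate geodesic distances, then classical multidimensional scaling.

Isomap is chosen here as one well-known example of a spectral non‐linear method. This record does not state which methods SETDiR implements or how. The function names and parameters are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Linear spectral reduction (hypothetical helper): project onto top principal axes."""
    Xc = X - X.mean(axis=0)                       # center each descriptor
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained

def isomap(X, n_neighbors, n_components):
    """Minimal Isomap sketch (hypothetical helper): kNN graph -> geodesics -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between samples.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Keep only edges to each point's k nearest neighbors (index 0 is the point itself).
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]             # make the graph undirected
    # Floyd-Warshall: shortest graph paths approximate geodesic distances on the manifold.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS: double-center the squared distances and take the top eigenvectors.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```

The explained-variance ratios returned by `pca` give a simple, linear way to judge how many dimensions to keep. The paper instead describes graph‐theoretic estimators of dimensionality, which this sketch does not implement.

The dense Floyd-Warshall loop costs O(n³) and is only meant for small examples. For larger datasets, a sparse shortest-path routine such as `scipy.sparse.csgraph.shortest_path` would be the more practical choice.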
Comments

This article is published as Samudrala, Sai Kiranmayee, Prasanna Venkataraman Balachandran, Jaroslaw Zola, Krishna Rajan, and Baskar Ganapathysubramanian. "A software framework for data dimensionality reduction: application to chemical crystallography." Integrating Materials and Manufacturing Innovation 3, no. 1 (2014): 1-20. DOI: 10.1186/s40192-014-0017-5. Posted with permission.

Creative Commons License
Creative Commons Attribution 4.0
Copyright Owner
The Authors
Language
en
File Format
application/pdf
Citation Information
Sai Kiranmayee Samudrala, Prasanna Venkataraman Balachandran, Jaroslaw Zola, Krishna Rajan, et al. "A software framework for data dimensionality reduction: application to chemical crystallography" Integrating Materials and Manufacturing Innovation Vol. 3 Iss. 1 (2014) pp. 1-20
Available at: http://works.bepress.com/baskar-ganapathysubramanian/22/