Contribution to Book
A Study on Balancing Parallelism, Data Locality, and Recomputation in Existing PDE Solvers
Proceedings of SC14: The International Conference for High Performance Computing, Networking, Storage and Analysis (2014)
Structured-grid PDE solver frameworks parallelize over boxes, which are rectangular domains of cells or faces in a structured grid. In the Chombo framework, box sizes are typically 16³ or 32³, but larger boxes such as 128³ have less surface area relative to their volume and would therefore incur less storage, copying, and/or ghost-cell communication overhead. Unfortunately, current on-node parallelization schemes perform poorly for these larger box sizes. In this paper, we investigate 30 different inter-loop optimization strategies and demonstrate the parallel scaling advantages of some of these variants on NUMA multicore nodes. Shifted, fused, and communication-avoiding variants for 128³ boxes achieve close to ideal parallel scaling and come close to matching the performance of 16³ boxes on three different multicore systems, for a benchmark that serves as a proxy for program idioms found in computational fluid dynamics (CFD) codes.
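To make the surface-area argument concrete, the following sketch counts the ghost cells surrounding a cubic box of n³ cells. It assumes a single layer of ghost cells (g = 1) on every face; the actual ghost depth depends on the stencil and is an assumption here, as are the helper names `ghost_cells` and `ghost_ratio`, which are hypothetical and not part of Chombo.

```python
# Hypothetical illustration: ghost-cell overhead for a cubic box of n^3 cells
# surrounded by g layers of ghost cells on every face (g = 1 is an assumption).


def ghost_cells(n: int, g: int = 1) -> int:
    """Number of ghost cells around an n x n x n box with g ghost layers."""
    return (n + 2 * g) ** 3 - n ** 3


def ghost_ratio(n: int, g: int = 1) -> float:
    """Ghost cells per interior cell: a rough proxy for storage/copy/communication overhead."""
    return ghost_cells(n, g) / n ** 3


if __name__ == "__main__":
    # Compare the box sizes mentioned in the abstract.
    for n in (16, 32, 128):
        print(f"{n}^3 box: {ghost_cells(n):>7} ghost cells, "
              f"{ghost_ratio(n):.1%} of interior")
```

Under this assumption the ghost-to-interior ratio falls from about 42% for 16³ boxes to about 20% for 32³ and under 5% for 128³, shrinking roughly as 6g/n. This is the overhead reduction that motivates larger boxes, provided the on-node parallelization can keep up.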
- parallel processing
- computational fluid dynamics
- multicore processing
Citation Information: Catherine Olschanowsky, Michelle Mills Strout, Stephen Guzik, John Loffeld, et al. "A Study on Balancing Parallelism, Data Locality, and Recomputation in Existing PDE Solvers." Proceedings of SC14: The International Conference for High Performance Computing, Networking, Storage and Analysis (2014), pp. 793-804. Piscataway, NJ.
Available at: http://works.bepress.com/catherine-olschanowsky/9/