ROMS is software that models and simulates an ocean region using a finite difference grid and time stepping. ROMS simulations can take from hours to days to complete due to the compute-intensive nature of the software. As a result, the size and resolution of simulations are constrained by the performance limitations of modern computing hardware. To address these issues, the existing ROMS code can be run in parallel with either OpenMP or MPI. In this work, we implement a new parallelization of ROMS on a graphics processing unit (GPU) using CUDA Fortran. We exploit the massive parallelism offered by modern GPUs to gain a performance benefit at a lower cost and with less power. To test our implementation, we benchmark with idealistic marine conditions as well as real data collected from coastal waters near central California. Our implementation yields a speedup of up to 8x over a serial implementation and 2.5x over an OpenMP implementation, while demonstrating comparable performance to a MPI implementation.
Available at: http://works.bepress.com/pchobote/5/