<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Christopher Lupo</title>
<copyright>Copyright (c) 2012 All rights reserved.</copyright>
<link>http://works.bepress.com/clupo</link>
<description>Recent documents in Christopher Lupo</description>
<language>en-us</language>
<lastBuildDate>Sun, 22 Jan 2012 01:30:47 PST</lastBuildDate>
<ttl>3600</ttl>


	
		
	







<item>
<title>Numerical Ocean Modeling and Simulation with CUDA</title>
<link>http://works.bepress.com/clupo/2</link>
<guid isPermaLink="true">http://works.bepress.com/clupo/2</guid>
<pubDate>Fri, 20 Jan 2012 11:23:34 PST</pubDate>
<description>
	<![CDATA[
	<p>ROMS is software that models and simulates an ocean region using a finite difference grid and time stepping. Because the software is compute-intensive, ROMS simulations can take from hours to days to complete. As a result, the size and resolution of simulations are constrained by the performance limitations of modern computing hardware. To address these issues, the existing ROMS code can be run in parallel with either OpenMP or MPI. In this work, we implement a new parallelization of ROMS on a graphics processing unit (GPU) using CUDA Fortran. We exploit the massive parallelism offered by modern GPUs to gain a performance benefit at a lower cost and with less power. To test our implementation, we benchmark with idealized marine conditions as well as real data collected from coastal waters near central California. Our implementation yields a speedup of up to 8x over a serial implementation and 2.5x over an OpenMP implementation, while demonstrating performance comparable to an MPI implementation.</p>

	]]>
</description>

<author>Jason Mak et al.</author>


<category>Articles</category>

</item>






<item>
<title>Post Register Allocation Spill Code Optimization</title>
<link>http://works.bepress.com/clupo/1</link>
<guid isPermaLink="true">http://works.bepress.com/clupo/1</guid>
<pubDate>Fri, 24 Oct 2008 13:31:51 PDT</pubDate>
<description>
	<![CDATA[
	<p>A highly optimized register allocator should provide an efficient placement of save/restore code for procedures that contain calls. This paper presents a new approach to placing callee-saved save and restore instructions that generalizes Chow's shrink-wrapping technique (Chow 1988). An efficient, profile-guided, hierarchical spill code placement algorithm analyzes the structure of a procedure to find the locations with the minimum dynamic execution count at which to place callee-saved save and restore code. The algorithm is implemented in the GNU Compiler Collection and has been tested on the SPEC CPU2000 Integer Benchmark suite. Results show that the technique reduces the number of dynamic load and store instructions by 15% compared to saving and restoring at procedure entry and exit, whereas Chow's shrink-wrapping technique reduces them by only 1% against the same baseline. The dynamic number of callee-saved save and restore instructions inserted with this new approach is never greater than the number produced by Chow's shrink-wrapping technique or by placement at procedure entry and exit.</p>

	]]>
</description>

<author>Christopher Lupo et al.</author>


<category>Articles</category>

</item>





</channel>
</rss>

