Most algorithms are run-to-completion and provide one answer upon completion and no answer if interrupted before completion. On the other hand, anytime algorithms have a monotonic increasing utility with the length of execution time. Our investigation focuses on the development of time-bounded anytime algorithms on Graphics Processing Units (GPUs) to trade-off the quality of output with execution time. Given a time-varying workload, the algorithm continually measures its progress and the remaining contract time to decide its execution pathway and select system resources required to maximize the quality of the result. To exploit the quality-time tradeoff, the focus is on the construction, instrumentation, on-line measurement and decision making of algorithms capable of efficiently managing GPU resources. We demonstrate this with a Parallel A* routing algorithm on a CUDA-enabled GPU. The algorithm execution time and resource usage is described in terms of CUDA kernels constructed at design-time. At runtime, the algorithm selects a subset of kernels and composes them to maximize the quality for the remaining contract time. We demonstrate the feedback-control between the GPU-CPU to achieve controllable computation tardiness by throttling request admissions and the processing precision. As a case study, we have implemented AutoMatrix, a GPU-based vehicle traffic simulator for real-time congestion management which scales up to 16 million vehicles on a US street map. This is an early effort to enable imprecise and approximate real-time computation on parallel architectures for stream-based timebounded applications such as traffic congestion prediction and route allocation for large transportation networks.
Available at: http://works.bepress.com/rahul_mangharam/8/