"Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels" by Guin R. Gilman

Article

Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels

ACM SIGMETRICS Performance Evaluation Review (2020)

Guin R. Gilman, Worcester Polytechnic Institute
Samuel S. Ogden, Worcester Polytechnic Institute
Tian Guo, Worcester Polytechnic Institute
Robert J. Walls, Worcester Polytechnic Institute

Abstract

In this work, we empirically derive the behavior of the NVIDIA GPU thread block scheduler under concurrent workloads for NVIDIA’s Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we find that the scheduler chooses the next SM based on the SM’s local resource availability. We show how this scheduling policy can lead to significant, and seemingly counterintuitive, performance degradation; for example, decreasing the block size by one thread resulted in a 3.58× increase in execution time for one kernel in our experiments. We hope that our work will help improve the accuracy of GPU simulators and aid the development of novel scheduling algorithms.
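
To illustrate the distinction the abstract draws between round-robin placement and placement driven by each SM's local resource availability, the following toy model may help. It is only a sketch under stated assumptions, not the policy derived in the paper: the names `SM`, `make_sms`, `place_round_robin`, and `place_most_room` are hypothetical, "resource availability" is reduced to a single count of free thread slots, and the SM count and capacities are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SM:
    # Toy resource model (assumption): one number of free thread slots per SM.
    free_threads: int


def make_sms():
    # Hypothetical setup: 4 SMs with 2048 thread slots each, where SM 0 and
    # SM 1 already host 1024 threads belonging to another, concurrent kernel.
    return [SM(1024), SM(1024), SM(2048), SM(2048)]


def place_round_robin(sms, block_threads, n_blocks):
    """Assign blocks in cyclic SM order, skipping SMs without enough room."""
    placement = []
    cursor = 0
    for _ in range(n_blocks):
        for step in range(len(sms)):
            idx = (cursor + step) % len(sms)
            if sms[idx].free_threads >= block_threads:
                sms[idx].free_threads -= block_threads
                placement.append(idx)
                cursor = (idx + 1) % len(sms)
                break
        else:
            placement.append(None)  # no SM can host this block right now
    return placement


def place_most_room(sms, block_threads, n_blocks):
    """Assign each block to the SM with the most free thread slots.

    This is one possible reading of "local resource availability" and is an
    assumption of this sketch. Ties go to the lowest-numbered SM.
    """
    placement = []
    for _ in range(n_blocks):
        idx = max(range(len(sms)), key=lambda i: sms[i].free_threads)
        if sms[idx].free_threads >= block_threads:
            sms[idx].free_threads -= block_threads
            placement.append(idx)
        else:
            placement.append(None)  # even the emptiest SM lacks room
    return placement


# Place 4 blocks of 512 threads from a second kernel under each policy.
rr = place_round_robin(make_sms(), block_threads=512, n_blocks=4)
mr = place_most_room(make_sms(), block_threads=512, n_blocks=4)
print("round-robin:", rr)  # round-robin: [0, 1, 2, 3]
print("most-room:  ", mr)  # most-room:   [2, 3, 2, 3]
```

In this toy setup, the round-robin policy spreads the second kernel's blocks across all four SMs, while the availability-based policy concentrates them on the two SMs that the first kernel left emptier. The sketch is meant only to show that the two policies can produce different placements for the same workload; it makes no claim about the timing effects reported in the paper.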

Keywords

Concurrent kernels, GPGPUs, scheduling algorithms

Disciplines

Computer Sciences

Publication Date

2020

DOI

10.1145/3453953.3453972

Citation Information

Guin R. Gilman, Samuel S. Ogden, Tian Guo, and Robert J. Walls. "Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels." ACM SIGMETRICS Performance Evaluation Review, Vol. 48, Iss. 3 (2020), pp. 81-88.
Available at: http://works.bepress.com/sam-ogden/3/