Article
Many Models at the Edge: Scaling Deep Inference via Model-Level Caching
2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS) (2021)
  • Samuel S. Ogden, Worcester Polytechnic Institute
  • Guin R. Gilman, Worcester Polytechnic Institute
  • Robert J. Walls, Worcester Polytechnic Institute
  • Tian Guo, Worcester Polytechnic Institute
Abstract
Deep learning (DL) models are rapidly growing in popularity, in large part due to innovations in model accuracy as well as companies' enthusiasm for integrating deep learning into existing application logic. This trend will inevitably lead to a deployment scenario, akin to content delivery networks for web objects, in which many deep learning models, each with different popularity, run on a shared edge with limited resources. In this paper, we set out to answer the key question of how to effectively manage many deep learning models at the edge. Via an empirical study based on profiling more than twenty deep learning models and extrapolating from an open-source Microsoft Azure workload trace, we pinpoint a promising avenue: leveraging cheaper CPUs, rather than the commonly promoted accelerators, for edge-based deep inference serving.

Based on our empirical insights, we formulate the DL model management problem as a classical caching problem, which we refer to as model-level caching. As an initial step towards realizing model-level caching, we propose a simple cache eviction policy, called CremeBrulee, by adapting BeladyMIN to explicitly consider DL model-specific factors when calculating each in-cache object's utility. Using a small-scale testbed, we demonstrate that CremeBrulee can achieve a 50% reduction in memory usage while keeping model load latency below 92% of execution latency and incurring less than 36% of the penalty of a random eviction policy. Further, when scaling to more models and requests in simulation, we demonstrate that CremeBrulee keeps model load delay up to 16.6% lower than eviction policies that consider only workload characteristics.
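To make the model-level caching idea concrete, the following minimal Python sketch shows one way a memory-budgeted cache could evict loaded models based on a utility score. The `ModelCache` class, its field names, and the utility formula (request count times load latency, normalized by model size) are illustrative assumptions for this sketch, not the actual CremeBrulee policy or implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedModel:
    name: str
    size_mb: float          # memory footprint of the loaded model
    load_latency_ms: float  # cost to reload the model after eviction
    request_count: int = 0  # observed popularity


class ModelCache:
    """Keeps loaded models within a memory budget, evicting the model
    whose estimated utility is lowest (an assumed utility, not CremeBrulee's)."""

    def __init__(self, capacity_mb: float):
        self.capacity_mb = capacity_mb
        self.models: Dict[str, CachedModel] = {}

    def _utility(self, m: CachedModel) -> float:
        # Assumed utility: expected reload cost of a future miss per MB held.
        return (m.request_count * m.load_latency_ms) / m.size_mb

    def _used_mb(self) -> float:
        return sum(m.size_mb for m in self.models.values())

    def request(self, model: CachedModel) -> bool:
        """Serve a request; returns True on a cache hit, False on a miss."""
        if model.name in self.models:
            self.models[model.name].request_count += 1
            return True
        # Miss: evict lowest-utility models until the new model fits.
        # (A model larger than the whole budget is admitted anyway for simplicity.)
        while self.models and self._used_mb() + model.size_mb > self.capacity_mb:
            victim = min(self.models.values(), key=self._utility)
            del self.models[victim.name]
        model.request_count += 1
        self.models[model.name] = model
        return False


# Example usage with made-up numbers:
cache = ModelCache(capacity_mb=2048)
cache.request(CachedModel("resnet50", size_mb=98, load_latency_ms=850))
```

In contrast to this greedy online sketch, the paper adapts BeladyMIN, which assumes knowledge of future requests; the sketch only illustrates the general shape of utility-driven, model-level eviction under a fixed memory budget.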

Relevant research artifacts are available at https://github.com/cake-lab/CremeBrulee
Publication Date
2021
DOI
10.1109/ACSOS52086.2021.00027
Citation Information
Samuel S. Ogden, Guin R. Gilman, Robert J. Walls, and Tian Guo. "Many Models at the Edge: Scaling Deep Inference via Model-Level Caching." 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), 2021, pp. 51-60.
Available at: http://works.bepress.com/sam-ogden/1/