SipMask: spatial information preservation for fast image and video instance segmentation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
  • Jiale Cao, Tianjin University
  • Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence & Inception Institute of Artificial Intelligence
  • Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence & Inception Institute of Artificial Intelligence
  • Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence & Inception Institute of Artificial Intelligence
  • Yanwei Pang, Tianjin University
  • Ling Shao, Mohamed Bin Zayed University of Artificial Intelligence & Inception Institute of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but they still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding box. Our main contribution is a novel lightweight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP) while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
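The idea of giving each sub-region of a box its own set of spatial coefficients can be illustrated with a small sketch. It assumes a YOLACT-style setup in which an instance mask is the sigmoid of a linear combination of K shared basis masks, and it assumes the box is split into a 2x2 grid of sub-regions: each sub-region gets its own coefficient vector, and the resulting map is kept only inside that sub-region. The function names (`combine`, `spatial_preserving_mask`), the 2x2 split, and the nested-list representation are hypothetical choices for illustration, not the authors' implementation; the linked repository contains the actual code.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # H x W map stored as nested lists


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def combine(bases: List[Grid], coeffs: List[float]) -> Grid:
    """Sigmoid of a linear combination of K basis masks (assumed YOLACT-style)."""
    h, w = len(bases[0]), len(bases[0][0])
    return [[sigmoid(sum(c * b[y][x] for c, b in zip(coeffs, bases)))
             for x in range(w)] for y in range(h)]


def spatial_preserving_mask(bases: List[Grid],
                            box: Tuple[int, int, int, int],
                            sub_coeffs: List[List[List[float]]]) -> Grid:
    """Hypothetical sketch of per-sub-region mask prediction.

    bases:      K basis masks, each H x W.
    box:        (x0, y0, x1, y1) in pixels, max side exclusive.
    sub_coeffs: 2x2 nested list; sub_coeffs[r][c] is the K-dim coefficient
                vector for the sub-region in row r, column c of the box.
    Returns an H x W mask that is zero outside the box.
    """
    h, w = len(bases[0]), len(bases[0][0])
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
    col_spans = [(x0, xm), (xm, x1)]  # left / right halves of the box
    row_spans = [(y0, ym), (ym, y1)]  # top / bottom halves of the box
    out = [[0.0] * w for _ in range(h)]
    for r, (ya, yb) in enumerate(row_spans):
        for c, (xa, xb) in enumerate(col_spans):
            # Each sub-region uses its own coefficients; the full map is
            # computed for simplicity, then only this sub-region is kept.
            m = combine(bases, sub_coeffs[r][c])
            for y in range(ya, yb):
                for x in range(xa, xb):
                    out[y][x] = m[y][x]
    return out


if __name__ == "__main__":
    # Two toy 6x6 bases; only the top-left sub-region gets a positive weight.
    bases = [[[1.0] * 6 for _ in range(6)], [[0.0] * 6 for _ in range(6)]]
    coeffs = [[[10.0, 0.0], [-10.0, 0.0]], [[-10.0, 0.0], [-10.0, 0.0]]]
    mask = spatial_preserving_mask(bases, (1, 1, 5, 5), coeffs)
    for row in mask:
        print(" ".join(f"{v:.2f}" for v in row))
```

With a single coefficient vector shared by the whole box, the toy mask above would have to be uniform inside the box; separate coefficients per sub-region let the prediction vary across the box, which is the kind of spatial information the SP module is described as preserving.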

DOI
10.1007/978-3-030-58568-6_1
Publication Date
11-13-2020
Keywords
  • Instance segmentation
  • Real-time
  • Spatial preservation
Comments

IR Deposit conditions:

  • OA version (pathway a)
  • Accepted version
  • 12 month embargo
  • Must link to published article
  • Set statement to accompany deposit
Citation Information
J. Cao, R. Anwer, H. Cholakkal, F. Khan, Y. Pang and L. Shao, "SipMask: spatial information preservation for fast image and video instance segmentation", in Computer Vision – ECCV 2020. ECCV 2020, (Lecture Notes in Computer Science, v. 12359), pp. 1-18, 2020. Available: 10.1007/978-3-030-58568-6_1