Semi-Supervised Cross-Modal Salient Object Detection with U-Structure Networks
arXiv
  • Yunqing Bao, Mohamed bin Zayed University of Artificial Intelligence
  • Hang Dai, Mohamed bin Zayed University of Artificial Intelligence
  • Abdulmotaleb Elsaddik, Mohamed bin Zayed University of Artificial Intelligence
Document Type
Article
Abstract

Salient Object Detection (SOD) is a popular and important topic aimed at the precise detection and segmentation of visually salient regions in images. We integrate linguistic information into vision-based U-structure networks designed for salient object detection. The experiments are based on the newly created DUTS Cross-Modal (DUTS-CM) dataset, which contains both visual and linguistic labels. We propose a new module, efficient Cross-Modal Self-Attention (eCMSA), to combine visual and linguistic features and improve the performance of the original U-structure networks. Meanwhile, to reduce the heavy burden of labeling, we employ a semi-supervised learning method: an image-captioning model trained on DUTS-CM automatically labels other datasets such as DUT-OMRON and HKU-IS. Comprehensive experiments show that SOD performance improves with natural-language input and is competitive with other SOD methods. Copyright © 2022, The Authors. All rights reserved.
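The abstract does not spell out the internals of the eCMSA module. As a generic, hypothetical illustration of the underlying idea (cross-modal attention in which visual features act as queries over linguistic keys and values), a minimal NumPy sketch might look like the following; the projection weights here are random stand-ins for what would be learned parameters, and the shapes and names are assumptions, not the paper's actual design:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, linguistic, d_k, seed=0):
    """Toy cross-modal attention: visual tokens attend over linguistic tokens.

    visual:     (n_pixels, d_v) flattened visual feature map
    linguistic: (n_words,  d_l) word embeddings of the caption
    d_k:        shared projection dimension

    NOTE: W_q, W_k, W_v are random here purely for illustration;
    in a trained model they would be learned projections.
    """
    rng = np.random.default_rng(seed)
    W_q = rng.standard_normal((visual.shape[1], d_k))
    W_k = rng.standard_normal((linguistic.shape[1], d_k))
    W_v = rng.standard_normal((linguistic.shape[1], d_k))

    Q = visual @ W_q          # queries from the visual modality
    K = linguistic @ W_k      # keys from the linguistic modality
    V = linguistic @ W_v      # values from the linguistic modality

    # Scaled dot-product attention: each visual token mixes word features.
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V           # (n_pixels, d_k) language-conditioned features

# Example: 16 visual tokens of dim 8 attending over a 5-word caption.
visual = np.random.default_rng(1).standard_normal((16, 8))
linguistic = np.random.default_rng(2).standard_normal((5, 8))
out = cross_modal_attention(visual, linguistic, d_k=4)
print(out.shape)  # → (16, 4)
```

In an actual U-structure SOD network, the resulting language-conditioned features would typically be fused back into the decoder's visual feature maps; the "efficient" aspect of eCMSA presumably reduces the cost of this attention, but its specific mechanism is not described on this page.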

DOI
10.48550/arXiv.2208.04361
Publication Date
8-8-2022
Keywords
  • Image segmentation
  • Learning systems
  • Linguistics
  • Object detection
  • Supervised learning
Comments

IR Deposit conditions: not described

Preprint available on arXiv

Citation Information
Y. Bao, H. Dai, and A. Elsaddik, "Semi-Supervised Cross-Modal Salient Object Detection with U-Structure Networks", 2022, arXiv:2208.04361