Commands for autonomous vehicles by progressively stacking visual-linguistic representations
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
  • Hang Dai, Mohamed Bin Zayed University of Artificial Intelligence
  • Shujie Luo, College of Information Science and Electronic Engineering, Zhejiang University
  • Yong Ding, College of Information Science and Electronic Engineering, Zhejiang University
  • Ling Shao, Mohamed Bin Zayed University of Artificial Intelligence & Inception Institute of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

In this work, we focus on the object referral problem in the autonomous driving setting. We use a stacked visual-linguistic BERT model to learn a generic visual-linguistic representation. Each element of the input is either a word or a region of interest from the input image. To train the deep model efficiently, we use a stacking algorithm to transfer knowledge from a shallow BERT model to a deep BERT model.
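The progressive stacking idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: layers are represented as plain lists of numbers, and the trained shallow model's layers are duplicated to initialize a model of twice the depth, which is then trained further.

```python
import copy

def stack(shallow_layers):
    """Initialize a deeper encoder by duplicating a trained shallow
    encoder's layers on top of themselves (progressive stacking).

    `shallow_layers` is a list of per-layer parameter containers;
    the returned model has twice as many layers, with layer i and
    layer i + depth both starting from shallow layer i.
    """
    return [copy.deepcopy(layer) for layer in shallow_layers] + \
           [copy.deepcopy(layer) for layer in shallow_layers]

# Example: a hypothetical 3-layer "model" grows to 6 layers.
shallow = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
deep = stack(shallow)
```

Deep copies are used so that, after stacking, the duplicated layers can be updated independently during continued training.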

DOI
10.1007/978-3-030-66096-3_2
Publication Date
1-3-2021
Keywords
  • Bidirectional Encoder Representations from Transformers (BERT)
  • image classification
  • natural language processing
Comments

IR Deposit conditions:

  • OA version (pathway a)
  • Accepted version 12 month embargo
  • Must link to published article
  • Set statement to accompany deposit
Citation Information
H. Dai, S. Luo, Y. Ding and L. Shao, "Commands for autonomous vehicles by progressively stacking visual-linguistic representations", in Computer Vision – ECCV 2020 Workshops, ECCV 2020, (Lecture Notes in Computer Science, v. 12536), pp. 27-32, 2020. Available: 10.1007/978-3-030-66096-3_2