Article
MED-GPVS: A Deep Learning-Based Joint Biomedical Image Classification and Visual Question Answering System for Precision e-Health
IEEE International Conference on Communications
  • Harishma T. Haridas, Lakehead University, Department of Computer Science, Thunder Bay, ON, Canada
  • Mostafa M. Fouda, Idaho State University, Department of Electrical and Computer Engineering, Pocatello, ID, United States
  • Zubair Md Fadlullah, Lakehead University, Department of Computer Science, Thunder Bay, ON, Canada & Thunder Bay Regional Health Research Institute (TBRHRI), Thunder Bay, ON, Canada
  • Mohamed Mahmoud, Tennessee Tech University, Department of Electrical and Computer Engineering, Cookeville, TN, United States
  • B. M. ElHalawany, Benha University, Faculty of Engineering at Shoubra, Egypt
  • Mohsen Guizani, Mohamed bin Zayed University of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

General Purpose Vision System (GPVS) is a task-agnostic vision-language system that inputs an image and a question from which the system recognizes the tasks to be performed and outputs bounding boxes, confidence scores, and text outputs to answer the question. While much attention to GPVS has been recently given in the computer vision field, its medical field applications are still in their infancy. This paper presents MED-GPVS, a customized deep learning-based GPVS on biomedical images to perform various vision tasks, such as object detection and visual question answering, on medical images to facilitate precision medicine/e-health services. Our envisioned MED-GPVS takes an image and a natural language text as inputs, and then outputs bounding boxes, confidence scores, and generates a caption (i.e., the answer to the posed query). For example, if a medical image of a patient's abdomen is presented to MED-GPVS followed by the question: "does the picture contain stomach?", MED-GPVS should ideally provide the answer "yes" along with a prediction box and prediction score on the image. We utilize the multilingual SLAKE dataset, which was annotated by expert physicians with a full semantic label, to validate the performance of MED-GPVS under various scenarios involving different biomedical image-based diagnoses. For the visual question answering (VQA) task, MED-GPVS demonstrates encouraging performance with significantly high accuracy of 82.41%. © 2022 IEEE.
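
The input/output contract described in the abstract (an image and a question in; bounding boxes, confidence scores, and a text answer out) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Detection`, `GPVSOutput`, `normalize`, and `vqa_accuracy` are hypothetical, the box format is an assumption, and case-insensitive exact match is assumed as the accuracy criterion, since this record does not state how the reported 82.41% was computed.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    """One predicted region (hypothetical structure, not from the paper)."""
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
    score: float                            # confidence score, assumed in [0, 1]


@dataclass
class GPVSOutput:
    """Text answer plus any supporting detections for one image/question pair."""
    answer: str
    detections: List[Detection] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Yes ' and 'yes' compare equal."""
    return " ".join(text.lower().split())


def vqa_accuracy(predictions: Sequence[GPVSOutput],
                 references: Sequence[str]) -> float:
    """Fraction of answers matching the reference (assumed exact-match metric)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    correct = sum(
        normalize(pred.answer) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(predictions)


# Example mirroring the abstract: "does the picture contain stomach?" -> "yes".
# The box coordinates and score are made-up illustration values.
example = GPVSOutput(
    answer="yes",
    detections=[Detection(box=(40.0, 60.0, 180.0, 210.0), score=0.93)],
)
```

Under these assumptions, `vqa_accuracy([example], ["Yes"])` treats the prediction as correct because the comparison ignores case and surrounding whitespace.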

DOI
10.1109/ICC45855.2022.9839076
Publication Date
8-11-2022
Keywords
  • Detection Transformer (DETR)
  • e-health
  • General Purpose Vision System (GPVS)
  • Natural Language Processing (NLP)
  • object detection
  • precision medicine
  • Vision-and-Language Bidirectional Encoder Representations from Transformers (ViLBERT)
  • Character recognition
  • Computer vision
  • Deep learning
  • Diagnosis
  • Image classification
  • Medical imaging
  • Natural language processing systems
  • Object recognition
  • Query processing
  • Semantics
Comments

IR Deposit conditions: non-described

Citation Information
H. T. Haridas, M. M. Fouda, Z. M. Fadlullah, M. Mahmoud, B. M. ElHalawany and M. Guizani, "MED-GPVS: A Deep Learning-Based Joint Biomedical Image Classification and Visual Question Answering System for Precision e-Health," in ICC 2022 - IEEE International Conference on Communications, 2022, pp. 3838-3843, doi: 10.1109/ICC45855.2022.9839076.