ImageCLEF 2021 Best of Labs: The Curious Case of Caption Generation for Medical Images
Author(s)
Nicolson, Aaron
Dowling, Jason
Koopman, Bevan
Editor(s)
Barron-Cedeno, A
DaSanMartino, G
Esposti, MD
Sebastiani, F
Macdonald, C
Pasi, G
Hanbury, A
Potthast, M
Faggioli, G
Ferro, N
Location
Bologna, Italy
Abstract
As part of Best of Labs, we were invited to investigate the ImageCLEFmed Caption task of 2021 further. The task required participants to automatically compose coherent captions for a set of medical images. The most popular approach to this task is an encoder-to-decoder model. In this work, we investigate a set of design choices for such a model: what pre-training data should be used, what architecture should be used for the encoder, whether a natural language understanding (e.g., BERT) or generation (e.g., GPT2) checkpoint should be used to initialise the parameters of the decoder, and what formatting should be applied to the ground truth captions during training. For each choice, we first stated an assumption about which option should be used and why. Our empirical evaluation then either proved or disproved these assumptions, with the aim of informing others in the field. Our most important finding was that the formatting applied to the ground truth captions of the training set had the greatest impact on the scores of the task's official metric. In addition, we discuss a number of inconsistencies in the results that others may encounter when developing a medical image captioning system.
Conference Title
CLEF 2022: Experimental IR Meets Multilinguality, Multimodality, and Interaction
Volume
13390
Subject
Biomedical imaging
Computer Science
Computer Science, Artificial Intelligence
Computer Science, Software Engineering
Encoder-to-decoder
Medical image captioning
Citation
Nicolson, A; Dowling, J; Koopman, B, ImageCLEF 2021 Best of Labs: The Curious Case of Caption Generation for Medical Images, CLEF 2022: Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2022, 13390, pp. 190-203