ImageCLEF 2021 Best of Labs: The Curious Case of Caption Generation for Medical Images

Author(s)
Nicolson, Aaron
Dowling, Jason
Koopman, Bevan
Editor(s)

Barron-Cedeno, A

Da San Martino, G

Esposti, MD

Sebastiani, F

Macdonald, C

Pasi, G

Hanbury, A

Potthast, M

Faggioli, G

Ferro, N

Date
2022
Location

Bologna, Italy

Abstract

As part of Best of Labs, we were invited to investigate the ImageCLEFmed Caption task of 2021 further. The task required participants to automatically compose coherent captions for a set of medical images. The most popular means of doing this is with an encoder-to-decoder model. In this work, we investigate a set of design choices for such a model: what pre-training data should be used, what architecture should be used for the encoder, whether a natural language understanding (e.g., BERT) or generation (e.g., GPT2) checkpoint should be used to initialise the parameters of the decoder, and what formatting should be applied to the ground truth captions during training. For each choice, we first stated an assumption about which option should be used and why. Our empirical evaluation then either confirmed or refuted these assumptions, with the aim of informing others in the field. Our most important finding was that the formatting applied to the ground truth captions of the training set had the greatest impact on the scores of the task’s official metric. In addition, we discuss a number of inconsistencies in the results that others may experience when developing a medical image captioning system.

Conference Title

CLEF 2022: Experimental IR Meets Multilinguality, Multimodality, and Interaction

Volume

13390

Subject

Biomedical imaging

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Software Engineering

Encoder-to-decoder

Medical image captioning

Citation

Nicolson, A; Dowling, J; Koopman, B, ImageCLEF 2021 Best of Labs: The Curious Case of Caption Generation for Medical Images, CLEF 2022: Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2022, 13390, pp. 190-203