Data augmentation using synthetic lesions improves machine learning detection of microbleeds from MRI
Author(s)
Fazlollahi, Amir
Bourgeat, Pierrick
Raniga, Parnesh
Yates, Paul
Yassi, Nawaf
Desmond, Patricia
Fripp, Jurgen
Gao, Yongsheng
Salvado, Olivier
Editor(s)
Gooya, A
Goksel, O
Oguz, I
Burgos, N
Location
Granada, Spain
Abstract
Machine learning applied to medical imaging for lesion detection, such as detecting cerebral microbleeds (CMBs) from Magnetic Resonance Imaging (MRI), is challenged by the relatively small datasets available, for which the only labels come from subjective and tedious visual reading, and by the low prevalence of lesions (a few in ~10% of a typical elderly cohort), which results in unbalanced classes. Moreover, the lack of actual ground truth might limit the performance of any machine learning method to that of human readers. Yet the automatic identification of those lesions is relevant to quantifying the cerebrovascular burden associated with dementia, for example when identifying co-morbidity for Alzheimer’s disease. In this paper, we investigated a novel data augmentation strategy: simulating synthetic CMBs on susceptibility-weighted imaging (SWI) MRI scans from healthy individuals to create a large and well-characterized training dataset. First, we characterized actual CMBs from SWI MRI scans and designed a method to create realistic synthetic CMBs whose location, shape, appearance, and size are similar to those of actual CMBs. We then tested a supervised neural network classifier trained on various combinations of actual and synthetic CMBs. Augmenting the data with synthetic CMBs resulted in a large improvement over training on actual CMBs alone when tested on unseen lesions, and provided better results than other standard data augmentation approaches. Our results suggest that data augmentation using synthetic lesions can address the lack of ground truth and the low prevalence of lesions in medical imaging analysis, allowing the deployment of data-hungry supervised learning techniques such as deep learning.
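The idea of placing a synthetic lesion on a healthy scan can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify. It assumes a lesion can be approximated as a Gaussian-shaped signal drop (CMBs appear as small dark foci on SWI), and the function name `add_synthetic_lesion`, its parameters, and the random stand-in volume are all hypothetical.

```python
import numpy as np


def add_synthetic_lesion(volume, center, sigma_vox, depth=0.6):
    """Return a copy of `volume` with one dark, blob-shaped lesion added.

    Hypothetical helper, for illustration only.
    volume    : 3D float array of intensities (stand-in for an SWI scan)
    center    : (z, y, x) voxel coordinates of the lesion centre
    sigma_vox : per-axis Gaussian widths in voxels (controls size and shape)
    depth     : fractional signal drop at the centre (0 = none, 1 = full)
    """
    grids = np.meshgrid(*[np.arange(n) for n in volume.shape], indexing="ij")
    # Squared, axis-scaled distance of every voxel from the lesion centre.
    d2 = sum(((g - c) / s) ** 2 for g, c, s in zip(grids, center, sigma_vox))
    profile = np.exp(-0.5 * d2)  # 1 at the centre, close to 0 far away
    # Multiplicative attenuation keeps the local image texture under the lesion.
    return volume * (1.0 - depth * profile)


# Assumed stand-in for a healthy scan: uniform tissue plus mild noise.
rng = np.random.default_rng(0)
healthy = 100.0 + rng.normal(0.0, 2.0, size=(32, 32, 32))

center = (16, 16, 16)
augmented = add_synthetic_lesion(healthy, center, sigma_vox=(1.5, 1.5, 1.5))
```

In a real augmentation pipeline, the centre, widths, and depth would presumably be sampled from distributions measured on actual CMBs, so that the synthetic lesions match the location, shape, appearance, and size of real ones.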
Conference Title
Lecture Notes in Computer Science
Volume
11037
Subject
Medical devices