Data augmentation using synthetic lesions improves machine learning detection of microbleeds from MRI

Author(s)
Momeni, Saba
Fazlollahi, Amir
Bourgeat, Pierrick
Raniga, Parnesh
Yates, Paul
Yassi, Nawaf
Desmond, Patricia
Fripp, Jurgen
Gao, Yongsheng
Salvado, Olivier
Editor(s)

Gooya, A

Goksel, O

Oguz, I

Burgos, N

Date
2018
Location

Granada, Spain

Abstract

Machine learning applied to medical imaging for lesion detection, such as detecting cerebral microbleeds (CMBs) from Magnetic Resonance Imaging (MRI), is challenged by the relatively small datasets available, for which only subjective and tedious visual reading exists as a reference, and by the low prevalence of lesions (a few in ~10% of a typical elderly cohort), which results in unbalanced classes. Moreover, the lack of actual ground truth might limit the performance of any machine learning method to that of human readers. Yet the automatic identification of those lesions is relevant for quantifying the cerebrovascular burden associated with dementia, for example when identifying co-morbidity in Alzheimer’s disease. In this paper, we investigated a novel data augmentation strategy: simulating synthetic CMBs on SWI MRI scans from healthy individuals to create a large and well-characterized training dataset. First, we characterized actual CMBs from SWI MRI scans and designed a method to create realistic synthetic CMBs whose location, shape, appearance, and size are similar to those of actual CMBs. We then tested a supervised neural network classifier trained on various combinations of actual and synthetic CMBs. Augmenting the data with synthetic CMBs resulted in a large improvement over training on actual CMBs only when tested on unseen lesions, and provided better results than other standard data augmentation approaches. Our results suggest that data augmentation using synthetic lesions can address the lack of ground truth and the low prevalence of lesions in medical imaging analysis, allowing the deployment of data-hungry supervised learning techniques such as deep learning.
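
The idea of placing a synthetic lesion on a healthy scan can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the lesion model, so the Gaussian shape, the multiplicative signal attenuation, the function name `insert_synthetic_cmb`, and the parameters `sigma` and `contrast` are all assumptions chosen for illustration, and the "scan" is random noise standing in for an SWI volume.

```python
import numpy as np


def insert_synthetic_cmb(volume, center, sigma=1.5, contrast=0.6):
    """Return (augmented_volume, lesion_mask) with one dark blob added.

    Hypothetical model: a CMB is approximated by an isotropic Gaussian
    that attenuates the local signal. All parameters are illustrative.
    """
    # Voxel coordinate grids with the same shape as the volume.
    zz, yy, xx = np.meshgrid(
        *(np.arange(n) for n in volume.shape), indexing="ij"
    )
    # Squared distance of every voxel from the lesion center.
    d2 = (
        (zz - center[0]) ** 2
        + (yy - center[1]) ** 2
        + (xx - center[2]) ** 2
    )
    # Gaussian profile: 1 at the center, falling towards 0 with distance.
    blob = np.exp(-d2 / (2.0 * sigma**2))
    # Darken the signal in proportion to the profile (input is not modified).
    augmented = volume * (1.0 - contrast * blob)
    # Label voxels inside the half-maximum contour as lesion.
    mask = blob > 0.5
    return augmented, mask


# Stand-in for a healthy scan: uniform tissue intensity plus noise.
rng = np.random.default_rng(0)
healthy = 100.0 + rng.normal(0.0, 2.0, size=(32, 32, 32))
augmented, mask = insert_synthetic_cmb(healthy, center=(16, 16, 16))
```

Because the lesion is generated, its mask is known exactly, so each augmented volume comes with a label that does not depend on visual reading. A realistic pipeline would additionally sample location, size, and shape from the statistics of actual CMBs, as the abstract describes.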

Conference Title

Lecture Notes in Computer Science

Volume

11037

Subject

Medical devices
