Synthetic data generation for machine learning to improve Microbleeds detection from MRI images
Author(s)
Primary Supervisor
Gao, Yongsheng
Other Supervisors
Liew, Wee-Chung
Salvado, Olivier
Year published
2022-08-08
Abstract
Cerebral Microbleeds (CMB) are small chronic brain haemorrhages, consisting of paramagnetic blood products, and are likely caused by structural abnormalities of the small vessels. The concept of CMB is primarily a radiological construct describing small MRI signal voids. They are often present in people with cerebrovascular disease, dementia, or Alzheimer’s disease, and in normal ageing. Substantial progress has been made in recent years in developing MRI methodologies that show CMB, such as susceptibility-weighted imaging (SWI), which is sensitive to differences in tissue magnetic susceptibility. In SWI, residual paramagnetic blood products appear with a lower intensity, so the modality is suitable for detecting CMB, which show as small, hypointense, amorphous, semi-spherical blobs. However, on visual inspection of SWI scans, CMB can easily be confused with cross-sections of small blood vessels. The detection of CMB from MRI is clinically important and has generated increasing interest as an imaging marker of vessel diseases and cognitive dysfunctions. Human rating is subjective and time-consuming, and suffers from a high miss rate. Computer-aided algorithms are being proposed to detect CMB in different MRI modalities, with the aim of increasing sensitivity and reducing the number of false positives. However, the lack of ground truth leads to imbalanced data and overfitting. To cope with the lack of data, traditional data augmentation methods, undersampling, and cost-sensitive matrices are employed to mitigate the class imbalance during classifier training. Each of these solutions has drawbacks: undersampling can omit informative negative samples, traditional data augmentation methods depend heavily on the diversity and availability of the ground truth, and the cost-sensitive matrix is not a general solution.
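As a rough illustration of the imbalance-mitigation strategies mentioned above, the following sketch shows random undersampling and a common inverse-frequency class-weight heuristic. It is not taken from the thesis; the function names, the weight formula, and the toy label counts are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / class frequency (assumed heuristic).

    weight(c) = n_samples / (n_classes * count(c)), so rare classes
    (e.g. CMB candidates) receive a larger weight in a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


def random_undersample(samples, labels, seed=0):
    """Keep as many samples per class as the rarest class has.

    Note the drawback named in the abstract: the discarded majority
    samples may include informative negatives.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        balanced.extend((s, label) for s in rng.sample(items, n_min))
    return balanced


# Toy example: 2 positive (CMB) and 98 negative candidates (hypothetical counts).
toy_labels = [1] * 2 + [0] * 98
toy_samples = list(range(100))
weights = inverse_frequency_weights(toy_labels)
balanced = random_undersample(toy_samples, toy_labels)
```

In this toy case undersampling discards 96 of the 98 negatives, which makes the information loss described above concrete.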
Data privacy is a critical issue in medical research, and research and public health activities need to be carried out in ways that protect individuals’ privacy. Therefore, the demand for synthetic data is increasing. This thesis proposes a synthetic CMB generation model that is independent of ground truth availability and adaptable to new MRI modalities. The second aim is to improve CMB classification by using synthetic CMB to train the classifier, achieving high sensitivity with few false positives. To build our synthetic CMB generation model, we exploited our limited ground truth from SWI MRI images. Features such as location, intensity, size, and shape were extracted to characterise real CMB (rCMB). A Gaussian mixture model was applied to distinguish background from outliers and to extract a binary CMB mask from which volume was computed. To obtain information about the location of the Microbleeds, we defined a tissue mask including gray matter, white matter, ventricle, and cerebrospinal fluid, and computed the distance distribution and the CMB proportion for each region. For shape and intensity, the average of all real CMB and the minimum intensity values were computed. After analysing the characteristics of the real CMB, we proposed an analytical model to generate synthetic CMB. The model is based on the hypothesis that rCMB are Gaussian-like structures, elongated in the z direction, spread all over the brain, and low in intensity. Using an analytical model has several benefits. The model can cover the diversity of CMB in location and shape by adding random parameters. In addition, any classifier can be trained only on the synthetic data and tested on the limited real lesions. Synthetic data also makes extensive validation and testing experiments possible. The dataset with 37,000 synthetic lesions is publicly available for the research community.
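The hypothesis behind the analytical model (a low-intensity, Gaussian-like structure elongated in the z direction, with random parameters for diversity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the multiplicative attenuation form, and all parameter ranges are hypothetical.

```python
import numpy as np


def synthetic_cmb(shape, center=None, sigma_xy=1.5, elongation=1.5, depth=0.8):
    """Attenuation map in (0, 1] with an anisotropic Gaussian dip.

    Arrays are indexed (z, y, x). `elongation` > 1 stretches the blob
    along z; `depth` is the fractional signal drop at the centre.
    All defaults are illustrative assumptions.
    """
    if center is None:
        center = tuple((s - 1) / 2.0 for s in shape)
    z, y, x = np.meshgrid(
        *(np.arange(s, dtype=float) for s in shape), indexing="ij"
    )
    sigma_z = sigma_xy * elongation
    r2 = (
        ((z - center[0]) / sigma_z) ** 2
        + ((y - center[1]) / sigma_xy) ** 2
        + ((x - center[2]) / sigma_xy) ** 2
    )
    return 1.0 - depth * np.exp(-0.5 * r2)


def random_cmb_patch(background, rng):
    """Insert one synthetic lesion with randomly drawn size, shape and depth."""
    return background * synthetic_cmb(
        background.shape,
        sigma_xy=rng.uniform(1.0, 2.0),    # hypothetical size range (voxels)
        elongation=rng.uniform(1.0, 2.0),  # hypothetical z-elongation range
        depth=rng.uniform(0.5, 0.9),       # hypothetical hypointensity range
    )


# Uniform background patch standing in for normalised SWI tissue intensity.
patch = np.ones((17, 17, 17))
lesion = patch * synthetic_cmb(patch.shape)
random_lesion = random_cmb_patch(patch, np.random.default_rng(0))
```

Multiplying the background by the attenuation map, rather than overwriting voxels, keeps the underlying tissue texture visible around the lesion; this is a design choice of the sketch, not a claim about the thesis.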
To evaluate the generalization of our proposed analytical model, we applied it to QSM images of the same patients. The main contribution is that the analytical model can be applied to another MRI modality, in which CMB appear with different contrasts, without extensive parameter changes. On QSM images, CMB classification improves over traditional data augmentation. The conditional generative adversarial network is another model that we exploited for synthetic CMB generation, and it forms the main contribution of this thesis. This model is independent of any MRI imaging parameters and can learn the effect of the CMB on the surrounding tissue. Synthetic Microbleeds can be generated with diverse shapes at any desired location in the brain, and the model can be applied to a new, unseen dataset with different MRI parameters and disease categories and limited ground truth to improve CMB classification.
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Eng & Built Env
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Subject
Cerebral Microbleeds
Machine learning
MRI