Show simple item record

dc.contributor.advisor: Cripps, Allan W
dc.contributor.author: Chen, Pin-Yen
dc.date.accessioned: 2021-01-20T05:36:33Z
dc.date.available: 2021-01-20T05:36:33Z
dc.date.issued: 2021-01-11
dc.identifier.doi: 10.25904/1912/4059
dc.identifier.uri: http://hdl.handle.net/10072/401351
dc.description.abstract: Metabolic syndrome (MetS) is a condition linked to an increased risk of developing chronic diseases, including type 2 diabetes mellitus (T2DM) and cardiovascular disease (CVD). The association between MetS and chronic disease development lies in the cardiometabolic risk factors that comprise MetS: abdominal obesity, hypertension, hyperglycaemia and dyslipidaemia [1]. The development of MetS is also associated with the dysregulation of many different body systems, such as the immune system [2] and gut microbiome [3]. Due to its multifactorial nature, research in MetS requires the simultaneous analysis of multiple biomarkers across different body systems. As most research thus far has utilised univariate analysis, no biomarker profile has been identified to characterise individuals more at risk of MetS and related diseases. The current study therefore used correlation-based network analysis (CNA) and multiple classification models to identify the biomarkers that are collectively linked to MetS development. Four variable groups, each comprising multiple measurements, were obtained from 117 healthy weight controls and 35 obese individuals with MetS. The four variable groups consisted of anthropometric measures, haematological measures, gene expression levels and gut microbial counts. The use of CNA allowed a better understanding of the relationships between biomarkers affected by MetS. As expected, the obese with MetS network was denser than the healthy weight control network, demonstrating the complex nature of MetS. The results revealed molecular interactions consistent with previous literature, particularly correlations indicating the development of anaemia of inflammation in the obese with MetS network. Three key hubs were also identified using gene expression levels: transcription factor EB (TFEB), lipocalin 2 (LCN2) and cluster of differentiation 68 (CD68).
The three genes are associated with regulatory T cells and neutrophils, two cell types prominent in regulating the inflammatory state. As obesity and MetS are often described as a state of chronic low-grade inflammation, the findings of CNA correspond with those of previous studies. Classification models are another type of analytical tool that has demonstrated high predictive ability in many diseases, including T2DM and CVD. Using classification models for disease prediction allows the risk of disease development to be evaluated. The current study applied classification models to the prediction of MetS using three of the four variable groups measured: haematological measures, gene expression levels and gut microbial counts. Classification models can not only assess the relevance of these variable groups to MetS but also identify the specific variables that contribute the most to MetS development. A range of classification models can be used and, because MetS is a relatively new area of research, the most appropriate model for MetS prediction has yet to be determined. As such, the current study predicted MetS using four different types of classification models and compared their predictive abilities. The four models were logistic regression (LR), decision tree (DT), support vector machine (SVM) and artificial neural network (ANN). The performance of each classification model was evaluated using 10-fold cross-validation, which splits the dataset into 10 pairs of training and testing sets. Each model was built on a training set and evaluated on the corresponding testing set to check that it had not been fitted too closely to the training data. The model with the highest performance when predicting MetS using haematological measures and gut microbial counts was ANN, while SVM had the highest performance when using gene expression levels.
However, ANN also attained a high area under the curve (AUC) value of 0.804 when predicting MetS using gene expression levels. As such, the prediction model with the highest performance overall was ANN. Each model has its own strengths and limitations in dealing with specific types of data, and the most appropriate model depends on the research question being asked. Although SVM and ANN are both very powerful algorithms, capable of handling high-dimensional data, both models have difficulty producing clinically significant results. On the other hand, LR and DT models are both able to identify specific biomarkers that should be further investigated for links to disease development, making them more suitable for clinical applications. For each of the 10 LR and DT models, constructed using the 10 training sets, the haematological measurement found to be most important was triglycerides (TG). Additionally, the best-performing LR model, out of the 10 constructed models, found measurements of TG, platelets (PLT), erythrocyte sedimentation rate (ESR), fasting plasma glucose (FPG), haemoglobin (HG) and glycated haemoglobin A1c (HbA1c) to be associated with MetS development. At the same time, high-density lipoprotein cholesterol (HDL-C) was linked to a reduced risk of MetS development. Using DT, the important measurements in MetS development were TG, PLT, HDL-C, age, HG, C-reactive protein (CRP) and white cell count. Each variable identified has been linked to either a cardiometabolic risk factor or inflammation, and thus the results of the current study support previous literature on obesity and MetS.
Logistic regression also found the expression of AKT serine/threonine kinase 3 (AKT3), Fc fragment of IgE receptor II (FCER2), cathelicidin antimicrobial peptide (CAMP), interleukin-11 receptor subunit alpha (IL11RA) and granzyme H (GZMH) to increase the odds of developing MetS, while the expression of C-X-C motif chemokine receptor 6 (CXCR6), C-C motif chemokine ligand 3 (CCL3), suppressor of cytokine signalling 1 (SOCS1) and killer cell lectin like receptor C2 (KLRC2) reduced these odds. Consistent with these findings, DTs also predicted individuals with high AKT3, FCER2 and CAMP expression to be obese with MetS, while healthy weight controls had higher CXCR6, CCL3 and KLRC2 expression. The findings of the current study partially supported previous literature, with FCER2 and CAMP expression being associated with obesity and inflammation [4, 5] and KLRC2 expression being inversely associated with obesity and inflammation [6, 7]. On the other hand, AKT3 is associated with glucose and lipid metabolism [8], with evidence that its expression protects against insulin resistance. As such, the high AKT3 expression found in the obese with MetS cohort was not consistent with current literature. Similarly, the association of CXCR6 and CCL3 expression with a healthy weight control classification could not be explained, as both genes are typically linked to inflammation [9, 10]. Finally, LR and DT found microbial species belonging to the Firmicutes and Bacteroidetes phyla to be associated with both increased and reduced risk of developing obesity with MetS. Obesity with MetS is largely characterised by a high Firmicutes-to-Bacteroidetes ratio, particularly when compared to healthy weight controls [11]. This pattern was not clearly evident in the current study; the discrepancy with previous literature may arise because gut microbial studies in obesity and MetS are not typically reported at the species level.
While LR and DT are both able to identify the variables that are likely to contribute to MetS development in a clinical setting, neither model could match the performance of ANN or SVM. At the same time, despite having the highest performance overall, ANN is unable to produce easily interpretable results with clinical significance. As such, its high predictive ability alone is not enough to convince researchers to choose ANN for clinical use. To overcome this issue, many researchers combine ANN with a feature selection technique, such as a genetic algorithm (GA). Feature selection techniques are able to identify the best combination of biomarkers for the prediction of diseases. In the current study, the variables recognised as significant by the hybrid model supported the findings of LR and DT. The haematological biomarkers consistently recognised as important by all three prediction models were measures of TG and HG. Additionally, CCL3 and CXCR6 expression, as well as three gut microbial species belonging to the Firmicutes and Bacteroidetes phyla, were found to be important for MetS development. Beyond identifying important variables, the hybrid model also improved the performance of ANN when predicting MetS using gene expression levels and gut microbial counts. Consequently, the current study concluded that the hybrid GA with ANN model was the most appropriate for MetS prediction. Another analytical method used by the current study was weighted majority voting, which combines the final predicted outcomes of the other classification models to determine whether performance could be further improved. The weighted majority voting method achieved the highest AUC value for the prediction of MetS using gut microbial counts, as well as the second highest AUC when using haematological measures and gene expression levels.
However, the study also demonstrated that the weighted majority voting method depends on the performance of the individual classification models used. The low sensitivity values attained by DT in the testing set of all three variable groups are likely what prevented the weighted majority voting method from outperforming all the other classification models in the prediction of MetS. Despite the limitation caused by DT, the method still achieved high performance. As such, combining the results from different classification models through weighted majority voting to increase the overall predictive ability was found to be a viable choice. The classification model found to be most suitable for the prediction of MetS was the hybrid GA with ANN model. Not only did the model achieve high predictive ability due to its ANN portion, it also revealed the optimal combination of variables that contributed the most to an accurate MetS prediction. The variables identified also supported the findings of both LR and DT. The measurements used by the current study (haematological measures, gene expression levels and gut microbial counts) were all found to be suitable for the prediction of MetS. Future studies may consider the use of other biomarkers, including measurements from adipose tissue, for the prediction of MetS using the hybrid GA with ANN model.
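The two evaluation ideas described in the abstract, 10-fold cross-validation and weighted majority voting, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names, the shuffling seed, the example predictions and the per-model weights are all hypothetical, and the abstract does not state how the voting weights were derived.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs.

    Every sample lands in exactly one test fold, so each model is always
    evaluated on data it was not trained on.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def weighted_majority_vote(predictions: Dict[str, Sequence[int]],
                           weights: Dict[str, float]) -> List[int]:
    """Combine binary (0/1) predictions from several models.

    Each model's vote counts in proportion to its weight; class 1 wins when
    its summed weight exceeds half of the total weight (ties go to class 0).
    """
    total = sum(weights[name] for name in predictions)
    n = len(next(iter(predictions.values())))
    combined = []
    for i in range(n):
        score = sum(weights[name] * preds[i] for name, preds in predictions.items())
        combined.append(1 if score > total / 2 else 0)
    return combined


# 152 samples matches the cohort size in the abstract (117 controls + 35 MetS).
splits = k_fold_indices(152, k=10)

# Hypothetical predictions for four samples (1 = obese with MetS, 0 = control).
preds = {
    "LR": [1, 0, 1, 0],
    "DT": [0, 0, 1, 1],
    "SVM": [1, 0, 0, 1],
    "ANN": [1, 1, 0, 1],
}
# Hypothetical weights, e.g. each model's cross-validated AUC.
weights = {"LR": 0.70, "DT": 0.60, "SVM": 0.80, "ANN": 0.85}

print(weighted_majority_vote(preds, weights))  # -> [1, 0, 0, 1]
```

In this sketch a weak model with a low weight, such as the hypothetical DT here, is outvoted when the stronger models agree, but it still shifts the outcome when they disagree. This mirrors the dependency on individual model performance noted in the abstract.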
dc.language: English
dc.language.iso: en
dc.publisher: Griffith University
dc.publisher.place: Brisbane
dc.subject.keywords: Metabolic syndrome
dc.title: Identification of biomarkers for obesity with metabolic syndrome using machine learning models
dc.type: Griffith thesis
gro.faculty: Griffith Health
gro.rights.copyright: The author owns the copyright in this thesis, unless stated otherwise.
gro.hasfulltext: Full Text
dc.contributor.otheradvisor: West, Nicholas P
dc.contributor.otheradvisor: Zhang, Ping
gro.identifier.gurtID: 000000024868
gro.thesis.degreelevel: Thesis (PhD Doctorate)
gro.thesis.degreeprogram: Doctor of Philosophy (PhD)
gro.department: School of Medical Science
gro.griffith.author: Chen, Pin-Yen (Fiona)

