Variable selection in Logistic regression model with genetic algorithm

File version

Version of Record (VoR)

Author(s)
Zhang, Zhongheng
Trevino, Victor
Hoseini, Sayed Shahabuddin
Belciug, Smaranda
Boopathi, Arumugam Manivanna
Zhang, Ping
Gorunescu, Florin
Subha, Velappan
Dai, Songshi
Date
2018
Abstract

Variable or feature selection is one of the most important steps in model specification. Especially in medical decision-making, using a medical database directly, without prior analysis and preprocessing, is often counterproductive. Variable selection is the process of choosing the most relevant attributes from the database in order to build robust learning models and thus improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle and generally produces an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly becomes trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large when there are tens to hundreds of variables, as is common in present-day clinical data. Genetic algorithms (GA) are heuristic optimization approaches that can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
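The abstract outlines GA-based variable selection only briefly, and the paper's own code is written in R. As a rough illustration, and not the authors' implementation, the Python sketch below encodes each candidate model as a 0/1 inclusion mask over the p candidate variables, so the GA searches the same 2^p subsets that the best subset approach would have to enumerate. It assumes NumPy is available. Fitness is the AIC of a logistic regression fitted by Newton-Raphson. Tournament selection, uniform crossover, bit-flip mutation and elitism are illustrative choices. The function names `fit_logistic_aic` and `ga_select`, all parameter values, and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic_aic(X, y, cols, n_iter=25):
    """Hypothetical helper: fit a logistic regression on the chosen columns
    (plus an intercept) by Newton-Raphson and return AIC = 2k - 2 log L."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        W = p * (1.0 - p)
        grad = Z.T @ (y - p)
        # Tiny ridge term keeps the Hessian solvable; an assumption, not from the paper
        H = Z.T @ (Z * W[:, None]) + 1e-8 * np.eye(Z.shape[1])
        beta += np.linalg.solve(H, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-Z @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * Z.shape[1] - 2 * loglik

def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.05, seed=0):
    """Hypothetical GA: each chromosome is a boolean inclusion mask; lower AIC is fitter."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    pop = rng.random((pop_size, n_vars)) < 0.5
    cache = {}

    def fitness(mask):
        key = mask.tobytes()
        if key not in cache:  # avoid refitting models already evaluated
            cache[key] = fit_logistic_aic(X, y, np.flatnonzero(mask))
        return cache[key]

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        elite = pop[np.argmin(scores)].copy()
        # Binary tournament selection: the parent with the lower AIC wins
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[i] < scores[j])[:, None], pop[i], pop[j])
        # Uniform crossover between each parent and its neighbour
        mates = np.roll(parents, 1, axis=0)
        children = np.where(rng.random((pop_size, n_vars)) < 0.5, parents, mates)
        # Bit-flip mutation toggles variables in or out of the model
        children ^= rng.random((pop_size, n_vars)) < p_mut
        children[0] = elite  # elitism: the best mask found so far always survives
        pop = children

    scores = np.array([fitness(m) for m in pop])
    return np.flatnonzero(pop[np.argmin(scores)]), scores.min()

# Simulated example (hypothetical data): only x0, x3 and x7 affect the outcome
rng = np.random.default_rng(42)
n, p = 400, 12
X = rng.normal(size=(n, p))
logit = -0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 3] + 0.8 * X[:, 7]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
selected, aic = ga_select(X, y)
print("selected columns:", selected, "AIC:", round(float(aic), 2))
```

With 12 candidates there are only 4,096 subsets, so exhaustive search would still be feasible here. The GA becomes worthwhile when p is in the tens to hundreds, where 2^p is far too large to enumerate. Elitism guarantees that the best AIC never worsens between generations. However, like any heuristic, the GA is not guaranteed to reach the global optimum.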

Journal Title

Annals of Translational Medicine

Volume

6

Issue

3

Rights Statement

© 2018 AME Publishing Company. The attached file is reproduced here in accordance with the copyright policy of the publisher. Please refer to the journal's website for access to the definitive, published version.

Subject

Health informatics and information systems

Science & Technology

Life Sciences & Biomedicine

Oncology

Medicine, Research & Experimental

Research & Experimental Medicine

Citation

Zhang, Z; Trevino, V; Hoseini, SS; Belciug, S; Boopathi, AM; Zhang, P; Gorunescu, F; Subha, V; Dai, S, Variable selection in Logistic regression model with genetic algorithm, Annals of Translational Medicine, 2018, 6 (3)
