Missing Value Imputation for the Analysis of Incomplete Traffic Accident Data

File version
Accepted Manuscript (AM)
Author(s)
Deb, R
Liew, AWC
Editor(s)
X. Wang, W. Pedrycz, P. Chan, Q. He
Conference location
Lanzhou, China

Road traffic accidents are a major public health concern, resulting in an estimated 1.3 million deaths and 52 million injuries worldwide each year. Developed and developing countries alike suffer the consequences of growth in both human and vehicle populations, so methods to reduce accident severity are of great interest to traffic agencies and to the public at large. Analysing traffic accident factors effectively requires a complete historical accident database without missing data. The road accident fatality rate depends on many factors, and investigating the dependencies between attributes is very challenging because so many environmental and accident-related factors are involved. Any missing data in the database can obscure the discovery of important factors and lead to invalid conclusions, so traffic accident datasets must be properly preprocessed before they are useful for analysis. In this paper, we present a novel method for imputing missing values, based on a decision tree and on sampling of imputed values guided by a correlation measure, to improve the quality of traffic accident data. We applied our algorithm to the large, publicly available traffic accident database of the United States (explore.data.gov), which is the largest open federal database in the United States. We compare our algorithm with three existing imputation methods using three evaluation criteria: mean absolute error (MAE), coefficient of determination (R²), and root mean square error (RMSE). Our results indicate that the proposed method performs significantly better than the three existing algorithms.
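The three evaluation criteria named in the abstract can be made concrete with a short sketch. It assumes the common evaluation protocol in which known values are masked, imputed, and then compared with the hidden originals. The function name `imputation_errors` is hypothetical, and R² is computed here as 1 - SS_res/SS_tot, which may differ from the exact formulation used in the paper (some imputation studies use the squared Pearson correlation instead). This is not the authors' imputation method, only a way of scoring any method's output.

```python
import math
from typing import Dict, Sequence


def imputation_errors(true_vals: Sequence[float],
                      imputed_vals: Sequence[float]) -> Dict[str, float]:
    """Score imputed values against the true values that were masked out.

    Hypothetical helper (not from the paper). Returns mean absolute error
    (MAE), root mean square error (RMSE), and the coefficient of
    determination computed as R^2 = 1 - SS_res / SS_tot (an assumed
    definition; the paper may define R^2 differently).
    """
    if len(true_vals) != len(imputed_vals) or len(true_vals) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(true_vals)
    # Residual = true value minus imputed value, one per masked cell.
    residuals = [t - p for t, p in zip(true_vals, imputed_vals)]

    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Total sum of squares of the true values around their mean.
    mean_true = sum(true_vals) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_vals)
    # R^2 is undefined when the true values have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```

For example, `imputation_errors([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 6.0])` gives an MAE of 0.625 and an RMSE of 0.75. Lower MAE and RMSE, and a higher R², indicate that the imputed values lie closer to the hidden true values.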

Conference Title
Communications in Computer and Information Science
Rights Statement
© 2014 Springer Berlin/Heidelberg. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com
Subject
Pattern Recognition and Data Mining