A correlation based imputation method for incomplete traffic accident data

File version
Accepted Manuscript (AM)
Author(s)
Deb, Rupam
Liew, Alan Wee-Chung
Oh, Erwin
Year published
2014
Abstract
Death, injury and disability from road traffic crashes continue to be a major global public health problem. Recent data suggest that traffic crashes cause more than 1.25 million fatalities each year, with non-fatal injuries affecting a further 20-50 million people. It is predicted that by 2030 road traffic accidents will have become the fifth leading cause of death and that the number of people who die annually from traffic accidents will have doubled from current levels. Methods to reduce accident severity are therefore of great interest to traffic agencies and the public at large. The road accident fatality rate depends on many factors, and investigating the dependencies between attributes is very challenging because of the many environmental and road accident factors involved. Any missing data in the database could obscure the discovery of important factors and lead to invalid conclusions. To make traffic accident datasets useful for analysis, they must be preprocessed properly. In this paper, we present a novel method based on sampling of distributions obtained from correlation measures for the imputation of missing values to improve the quality of traffic accident data. We evaluated our algorithm using two publicly available traffic accident databases from the United States (explore.data.gov, data.opencolorado.org). Our results indicate that the proposed method performs significantly better than three existing algorithms.
Conference Title
PRICAI 2014: TRENDS IN ARTIFICIAL INTELLIGENCE
Volume
8862
Copyright Statement
© 2014 Springer Berlin/Heidelberg. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com
Subject
Pattern Recognition and Data Mining