PostMatch: A Framework for Efficient Address Matching
Author(s)
Yates, D
Islam, MZ
Zhao, Y
Nayak, R
Estivill-Castro, V
Kanhere, S
Year published
2021
Abstract
Matching lists of addresses is an increasingly common task for businesses and governments alike. However, due to security concerns, this task cannot always be performed using cloud computing. Moreover, addresses can arrive with spelling errors that cause non-matches, or ‘false negatives’. Our proposed framework, PostMatch, provides a locally executed method for address matching that combines the open-source ‘Libpostal’ address-parsing library with our ‘postparse’ post-processor code and machine learning. PostMatch improves on the parsing accuracy of Libpostal alone, approaching 96.9%. The matching process combines the Jaro-Winkler edit distance algorithm with XGBoost machine learning to achieve very high accuracy on public data. PostMatch is open-source (GPL3 licensed) and available as R script code on GitHub.
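As an illustration of the string-similarity step named in the abstract, the following is a minimal Python sketch of the standard Jaro-Winkler measure. It is not the PostMatch code, which is distributed as R scripts; the function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the prefix cap of 4 characters are assumptions based on the conventional definition of the measure. The XGBoost classification step is not shown.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and lie
    # within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters.

    prefix_scale=0.1 and the 4-character cap are the conventional
    defaults, assumed here rather than taken from PostMatch.
    """
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # A misspelled street name still scores close to 1.0.
    print(jaro_winkler("12 Main Stret", "12 Main Street"))
```

In a pipeline like the one the abstract describes, scores of this kind would presumably be computed per parsed address component and passed as features to a classifier, rather than compared against a single fixed threshold.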
Conference Title
Communications in Computer and Information Science
Volume
1504
Subject
Software engineering