RegMiner: Mining replicable regression dataset from code repositories
File version
Version of Record (VoR)
Author(s)
Song, X
Lin, Y
Wu, Y
Zhang, Y
Ng, SH
Peng, X
Dong, JS
Mei, H
Location
Singapore
Abstract
In this work, we introduce a tool, RegMiner, to automate the process of collecting replicable regression bugs from a set of Git repositories. In the code commit history, RegMiner searches for regressions where a test can pass a regression-fixing commit, fail a regression-inducing commit, and pass a previous working commit again. Technically, RegMiner (1) identifies potential regression-fixing commits from the code evolution history, (2) migrates the test and its code dependencies in the commit over the history, and (3) minimizes the compilation overhead during the regression search. Our experiments show that RegMiner can successfully collect 1035 regressions over 147 projects in 8 weeks, creating the largest replicable regression dataset within the shortest period, to the best of our knowledge. In addition, our experiments further show that (1) RegMiner can construct the regression dataset with very high precision and acceptable recall, and (2) the constructed regression dataset is of high authenticity and diversity. The source code of RegMiner is available at https://github.com/SongXueZhi/RegMiner, the mined regression dataset is available at https://regminer.github.io/, and the demonstration video is available at https://youtu.be/yzcM9Y4unok.
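The pass/fail/pass criterion described in the abstract can be illustrated with a minimal Python sketch. This is not RegMiner's implementation: it assumes a linear commit history ordered oldest to newest, a hypothetical `run_test` callback that returns "pass" or "fail" for a commit, and a plain backward scan in place of whatever search strategy the tool actually uses. The names `find_regression`, `run_test`, and the commit IDs are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

PASS, FAIL = "pass", "fail"


def find_regression(
    history: List[str],
    fixing_index: int,
    run_test: Callable[[str], str],
) -> Optional[Tuple[str, str, str]]:
    """Return (working, inducing, fixing) commits, or None if no regression.

    `history` is ordered oldest to newest (an assumption of this sketch).
    `run_test` is a hypothetical callback giving the migrated test's outcome.
    """
    # The test must pass on the candidate regression-fixing commit.
    if run_test(history[fixing_index]) != PASS:
        return None

    # The commit right before the fix must fail, otherwise no bug is observed.
    i = fixing_index - 1
    if i < 0 or run_test(history[i]) != FAIL:
        return None

    # Walk backwards to the earliest commit of the consecutive failing run.
    while i > 0 and run_test(history[i - 1]) == FAIL:
        i -= 1

    # If the failure reaches the start of history, the feature never worked,
    # so this is a plain bug rather than a regression.
    if i == 0:
        return None

    # The commit before the failing run must pass (the working commit).
    if run_test(history[i - 1]) != PASS:
        return None

    return history[i - 1], history[i], history[fixing_index]


# Usage with made-up outcomes for five commits.
outcomes = {"c1": PASS, "c2": PASS, "c3": FAIL, "c4": FAIL, "c5": PASS}
history = ["c1", "c2", "c3", "c4", "c5"]
print(find_regression(history, 4, outcomes.get))  # ('c2', 'c3', 'c5')
```

The linear scan runs the test once per commit it visits. The abstract's third step, minimizing compilation overhead during the search, concerns exactly this cost, which the sketch does not try to reduce.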
Conference Title
ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Rights Statement
© 2022 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.
Subject
Software engineering
Data mining and knowledge discovery
Citation
Song, X; Lin, Y; Wu, Y; Zhang, Y; Ng, SH; Peng, X; Dong, JS; Mei, H, RegMiner: Mining replicable regression dataset from code repositories, ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 1711-1715