gDefects4DL: A Dataset of General Real-World Deep Learning Program Defects
File version
Author(s)
Lin, Y
Song, X
Sun, J
Feng, Z
Dong, JS
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
Size
File type(s)
Location
Pittsburgh, USA
License
Abstract
The development of deep learning programs, as a new programming paradigm, is observed to suffer from various defects. Emerging research works have been proposed to detect, debug, and repair deep learning bugs, which drive the need to construct the bug benchmarks. In this work, we present gDefects4DL, a dataset for general bugs of deep learning programs. Comparing to existing datasets, gDefects4DL collects bugs where the root causes and fix solutions can be well generalized to other projects. Our general bugs include deep learning program bugs such as (1) violation of deep learning API usage pattern (e.g., the standard to implement cross entropy function y•log(y), y → 0, without NaN error), (2) shape-mismatch of tensor calculation, (3) numeric bugs, (4) type-mismatch (e.g., confusing similar types among numpy, pytorch, and tensorflow), (5) violation of model architecture design convention, and (6) performance bugs.For each bug in gDefects4DL, we describe why it is general and group the bugs with similar root causes and fix solutions for reference. Moreover, gDefects4DL also maintains (1) its buggy/fixed versions and the isolated fix change, (2) an isolated environment to replicate the defect, and (3) the whole code evolution history from the buggy version to the fixed version. We design gDefects4DL with extensible interfaces to evaluate software engineering methodologies and tools. We have integrated tools such as ShapeFlow, DEBAR, and GRIST. gDefects4DL contains 64 bugs falling into 6 categories (i.e., API Misuse, Shape Mismatch, Number Error, Type Mismatch, Violation of Architecture Convention, and Performance Bug). gDefects4DL is available at https://github.com/llmhyy/defects4dl, its online web demonstration is at http://47.93.14.147:9000/bugList, and the demo video is at https://youtu.be/0XtaEt4Fhm4.
Journal Title
Conference Title
2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Book Title
Edition
Volume
Issue
Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
Item Access Status
Note
Access the data
Related item(s)
Subject
Software engineering
Persistent link to this record
Citation
Liang, Y; Lin, Y; Song, X; Sun, J; Feng, Z; Dong, JS, gDefects4DL: A Dataset of General Real-World Deep Learning Program Defects, 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2022, pp. 90-94