Repairing Adversarial Texts Through Perturbation
File version
Accepted Manuscript (AM)
Author(s)
Dong, G
Wang, J
Sun, J
Chattopadhyay, S
Wang, X
Dai, T
Shi, J
Dong, JS
Location
Cluj-Napoca, Romania
Abstract
It is known that neural networks are subject to attacks through adversarial perturbations. Worse yet, such attacks are impossible to eliminate, i.e., adversarial perturbations remain possible even after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs. Rejecting suspicious inputs, however, may not always be feasible or ideal. First, normal inputs may be rejected due to false alarms generated by the detection algorithm. Second, denial-of-service attacks may be conducted by feeding such systems with adversarial inputs. To address these problems, in this work we focus on the text domain and propose an approach to automatically repair adversarial texts at runtime. Given a text that is suspected to be adversarial, we apply multiple adversarial perturbation methods in a novel, positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network correctly classifies. Experimental results show that our approach effectively repairs about 80% of adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text can be repaired in about one second on average.
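The repair idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym-substitution perturbation, the majority vote used to pick the "correct" label, and every name below (`perturb`, `repair`, `classify`, `synonyms`) are assumptions made for illustration, since the abstract does not specify these details.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def perturb(words: List[str], synonyms: Dict[str, List[str]],
            rng: random.Random) -> Optional[List[str]]:
    """Swap one randomly chosen word for a synonym (hypothetical perturbation)."""
    replaceable = [i for i, w in enumerate(words) if w in synonyms]
    if not replaceable:
        return None  # nothing can be mutated
    i = rng.choice(replaceable)
    mutated = list(words)
    mutated[i] = rng.choice(synonyms[words[i]])
    return mutated


def repair(text: str, classify: Callable[[str], str],
           synonyms: Dict[str, List[str]],
           n_candidates: int = 20, seed: int = 0) -> Optional[Tuple[str, str]]:
    """Return (repaired_text, label), or None if no candidate can be built.

    Assumption: the label most candidates agree on is taken as the correct one.
    """
    rng = random.Random(seed)
    words = text.split()
    candidates = []
    for _ in range(n_candidates):
        mutated = perturb(words, synonyms, rng)
        if mutated is None:
            return None
        candidates.append(" ".join(mutated))
    labels = [classify(c) for c in candidates]
    majority, _count = Counter(labels).most_common(1)[0]
    # Return the first candidate that carries the majority label.
    for candidate, label in zip(candidates, labels):
        if label == majority:
            return candidate, majority
    return None


# Toy classifier and synonym table, both invented for this example.
def toy_classify(text: str) -> str:
    return "positive" if {"good", "fine", "great"} & set(text.split()) else "negative"


print(repair("the film was decent", toy_classify, {"decent": ["good", "fine"]}))
```

In this toy setting the suspicious input is classified as negative, while every mutated candidate is classified as positive, so the vote selects a positive-labelled candidate as the repair. A real system would need perturbations that preserve meaning and a more careful decision rule than a simple vote.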
Conference Title
TASE 2022: Theoretical Aspects of Software Engineering
Volume
13299
Rights Statement
© 2022 Springer Nature Switzerland AG. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com
Subject
Software engineering
Artificial intelligence
Citation
Dong, G; Wang, J; Sun, J; Chattopadhyay, S; Wang, X; Dai, T; Shi, J; Dong, JS, Repairing Adversarial Texts Through Perturbation, TASE 2022: Theoretical Aspects of Software Engineering, 2022, 13299, pp. 29-48