Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models
File version
Accepted Manuscript (AM)
Author(s)
Yao, Z
Zhang, H
Guo, Y
Tian, X
Peng, W
Zou, Y
Yu Zhang, L
Chen, C
Abstract
Backdoor attacks on deep neural network models implant malicious data patterns in a model to induce attacker-desired behaviors. Existing defense methods fall into online and offline categories. Offline methods achieve state-of-the-art detection rates but are restricted by heavy computation overhead. In contrast, their more deployable online counterparts lack the means to detect large source-specific backdoors. This work proposes a new online backdoor detection method, Reverse Backdoor Distillation (RBD), to handle issues associated with both source-specific and source-agnostic backdoor attacks. RBD, designed from the novel perspective of distilling rather than erasing backdoor knowledge, is a complementary backdoor detection methodology that can be used in conjunction with other online backdoor defenses. Exploiting the fact that trigger data causes overwhelming neuron activation while clean data does not, RBD distills backdoor attack pattern knowledge from a suspicious model to create a shadow model, which is subsequently deployed online alongside the original model to predict backdoor attacks. We extensively evaluate RBD on several datasets (MNIST, GTSRB, CIFAR-10) with diverse model architectures and trigger patterns. RBD outperforms online benchmarks in all experimental settings. Notably, RBD demonstrates superior capability in detecting source-specific attacks, where comparison methods fail, underscoring the effectiveness of the proposed technique. Moreover, RBD achieves computational savings of at least 97%.
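As a rough illustration of the distillation step mentioned in the abstract, the sketch below computes a generic temperature-softened knowledge distillation loss between a teacher (standing in for the suspicious model) and a student (standing in for the shadow model). This is the standard textbook formulation, not the RBD objective, which this record does not specify. The function names, logits, and temperature value are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Generic knowledge distillation loss; assumed here for illustration
    only and not taken from the RBD paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
diverging_student = [0.5, 1.0, 4.0]

loss_same = distillation_loss(teacher, matching_student)
loss_diff = distillation_loss(teacher, diverging_student)
```

The loss is zero when the student reproduces the teacher's softened output exactly and grows as the two distributions diverge, which is the property a distillation procedure minimizes.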
Journal Title
IEEE Transactions on Dependable and Secure Computing
Rights Statement
This work is covered by copyright. You must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a specified licence, refer to the licence for details of permitted re-use. If you believe that this work infringes copyright please make a copyright takedown request using the form at https://www.griffith.edu.au/copyright-matters.
Note
This publication has been entered in Griffith Research Online as an advance online version.
Subject
Cybersecurity and privacy
Distributed computing and systems software
Citation
Yao, Z; Zhang, H; Guo, Y; Tian, X; Peng, W; Zou, Y; Yu Zhang, L; Chen, C, Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models, IEEE Transactions on Dependable and Secure Computing, 2024