Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models

File version

Accepted Manuscript (AM)

Author(s)
Yao, Z
Zhang, H
Guo, Y
Tian, X
Peng, W
Zou, Y
Yu Zhang, L
Chen, C
Date
2024
Abstract

The backdoor attack on deep neural network models implants malicious data patterns in a model to induce attacker-desirable behaviors. Existing defense methods fall into online and offline categories: offline methods achieve state-of-the-art detection rates but incur heavy computation overhead, while their more deployable online counterparts lack the means to detect source-specific backdoors with large sizes. This work proposes a new online backdoor detection method, Reverse Backdoor Distillation (RBD), to handle both source-specific and source-agnostic backdoor attacks. Designed from the novel perspective of distilling rather than erasing backdoor knowledge, RBD is a complementary detection methodology that can be used in conjunction with other online backdoor defenses. Because trigger data cause overwhelming neuron activation while clean data do not, RBD distills backdoor attack pattern knowledge from a suspicious model into a shadow model, which is subsequently deployed online alongside the original model to detect backdoor attacks. We extensively evaluate RBD on several datasets (MNIST, GTSRB, CIFAR-10) with diverse model architectures and trigger patterns. RBD outperforms online benchmarks in all experimental settings. Notably, RBD demonstrates superior capability in detecting source-specific attacks, where comparison methods fail, underscoring the effectiveness of the proposed technique. Moreover, RBD achieves computational savings of at least 97%.
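The online stage described above can be pictured with a short, hypothetical sketch. The PyTorch snippet below assumes a shadow model has already been produced by distilling only the backdoor-related knowledge out of the suspicious model; it then flags an input as trigger-carrying when the shadow model reacts with a confident prediction that agrees with the deployed model. The function name rbd_online_check, the confidence threshold, and the confidence-plus-agreement rule are illustrative assumptions drawn from the abstract, not the paper's exact criterion.

import torch
import torch.nn.functional as F

def rbd_online_check(suspect_model, shadow_model, x, conf_threshold=0.9):
    """Flag inputs whose behaviour suggests a backdoor trigger.

    Assumes `shadow_model` retains only the backdoor knowledge distilled
    from `suspect_model` (the RBD step), so clean inputs leave it
    near-uniform while trigger inputs activate it strongly. The decision
    rule below is an illustrative assumption, not the published method.
    """
    suspect_model.eval()
    shadow_model.eval()
    with torch.no_grad():
        main_pred = suspect_model(x).argmax(dim=1)
        shadow_probs = F.softmax(shadow_model(x), dim=1)
        shadow_conf, shadow_pred = shadow_probs.max(dim=1)
    # A confident shadow prediction that agrees with the deployed model
    # is treated here as evidence that the input carries the trigger.
    flagged = (shadow_conf > conf_threshold) & (shadow_pred == main_pred)
    return flagged, main_pred

if __name__ == "__main__":
    # Toy stand-ins for the deployed model and the distilled shadow model.
    suspect = torch.nn.Linear(32, 10)
    shadow = torch.nn.Linear(32, 10)
    batch = torch.randn(4, 32)
    flagged, preds = rbd_online_check(suspect, shadow, batch)
    print(flagged, preds)

In deployment, both models would see the same inference traffic, so this check adds only one extra forward pass per input, which is consistent with the lightweight online setting the abstract targets.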

Journal Title

IEEE Transactions on Dependable and Secure Computing

Rights Statement

This work is covered by copyright. You must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a specified licence, refer to the licence for details of permitted re-use. If you believe that this work infringes copyright please make a copyright takedown request using the form at https://www.griffith.edu.au/copyright-matters.

Note

This publication has been entered in Griffith Research Online as an advance online version.

Subject

Cybersecurity and privacy

Distributed computing and systems software

Citation

Yao, Z; Zhang, H; Guo, Y; Tian, X; Peng, W; Zou, Y; Yu Zhang, L; Chen, C, Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models, IEEE Transactions on Dependable and Secure Computing, 2024
