MarkErase: Defeating Entangled Watermarks in Model Extraction Attacks

Author(s)
Fei, T
Liu, X
Zhang, Y
Yuan, H
Zhu, T
Zhang, LY
Editor(s)

Wu, Xintao
Spiliopoulou, Myra
Wang, Can
Kumar, Vipin
Cao, Longbing
Wu, Yanqiu
Yao, Yu
Wu, Zhangkai

Date
2025
Location

Sydney, Australia

Abstract

Model extraction (ME) attacks, in which adversaries extract a target model using carefully crafted samples, threaten the intellectual property of machine learning models. Model watermarking, which embeds specific information into a model, has been proposed to protect model ownership. State-of-the-art methods usually entangle watermark samples with main-task samples, aiming to keep watermark verification robust under ME attacks. In this paper, however, we defeat entangled watermarks and demonstrate their vulnerability to detection and removal attacks that use only a small set of clean samples. We further propose a novel framework, MarkErase, for performing ME attacks against entangled watermarks. MarkErase builds on two key observations. First, watermarked models exhibit a unique classification tendency, which enables early detection of the watermark during an attack. Second, models with entangled watermarks tend to misclassify perturbed inputs as the target class; based on this observation, we propose a selective distillation method that effectively removes the watermark while maintaining main-task accuracy. Comprehensive experiments show that MarkErase achieves a watermark-task accuracy close to 0 with minimal loss in main-task performance. Our code is publicly available at https://github.com/MarkErase/MarkErase.git.
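
The second observation in the abstract (perturbed inputs tend to be misclassified as the target class) can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not specify. It assumes only black-box label access through a hypothetical `predict` callable, Gaussian input noise with an arbitrary `noise_std`, and an arbitrary majority threshold `ratio`; the function names are likewise made up for illustration.

```python
import random
from collections import Counter


def flip_histogram(predict, clean_inputs, noise_std=0.1, seed=0):
    """Count, per class, how often a perturbed clean input flips to that class.

    `predict` is a hypothetical black-box function mapping a list of floats
    to a class label. Gaussian noise is an assumed perturbation.
    """
    rng = random.Random(seed)
    flips = Counter()
    for x in clean_inputs:
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        before, after = predict(x), predict(noisy)
        if after != before:
            flips[after] += 1
    return flips


def suspected_target_class(flips, ratio=0.5):
    """Return the class absorbing more than `ratio` of all flips, else None.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    total = sum(flips.values())
    if total == 0:
        return None
    label, count = flips.most_common(1)[0]
    return label if count / total > ratio else None
```

If most label flips under small perturbations land on a single class, that class is a candidate watermark target. On a clean model, flips would be expected to spread across neighbouring classes instead, although the sketch does not verify this.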

Conference Title

Advances in Knowledge Discovery and Data Mining: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10–13, 2025, Proceedings, Part IV

Volume

15873

Citation

Fei, T; Liu, X; Zhang, Y; Yuan, H; Zhu, T; Zhang, LY, MarkErase: Defeating Entangled Watermarks in Model Extraction Attacks, Advances in Knowledge Discovery and Data Mining: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10–13, 2025, Proceedings, Part IV, 2025, 15873, pp. 78-90