MarkErase: Defeating Entangled Watermarks in Model Extraction Attacks
Author(s)
Fei, T
Liu, X
Zhang, Y
Yuan, H
Zhu, T
Zhang, LY
Editor(s)
Wu, Xintao
Spiliopoulou, Myra
Wang, Can
Kumar, Vipin
Cao, Longbing
Wu, Yanqiu
Yao, Yu
Wu, Zhangkai
Location
Sydney, Australia
Abstract
Model Extraction (ME) attacks, in which adversaries extract a target model using carefully crafted samples, threaten the intellectual property of machine learning models. Model watermarking, which embeds specific information into a model, has been proposed to protect model ownership. State-of-the-art methods usually entangle watermark samples with main-task samples, aiming to provide robust watermark verification under ME attacks. In this paper, however, we defeat entangled watermarks and demonstrate their vulnerability to detection and removal attacks using only a small set of clean samples. We further propose a novel framework, named MarkErase, to perform ME attacks against entangled watermarks. MarkErase is based on two key observations. First, we identify the unique classification tendency of watermarked models, which enables early detection of the watermark during an attack. Second, observing that models with entangled watermarks tend to misclassify perturbed inputs as the target class, we propose a selective distillation method that effectively removes the watermark while maintaining main-task accuracy. Comprehensive experiments show that MarkErase achieves a watermark-task accuracy close to 0 with minimal loss in main-task performance. Our code is publicly available at https://github.com/MarkErase/MarkErase.git.
Conference Title
Advances in Knowledge Discovery and Data Mining: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10–13, 2025, Proceedings, Part IV
Volume
15873
Citation
Fei, T; Liu, X; Zhang, Y; Yuan, H; Zhu, T; Zhang, LY, MarkErase: Defeating Entangled Watermarks in Model Extraction Attacks, Advances in Knowledge Discovery and Data Mining: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10–13, 2025, Proceedings, Part IV, 2025, 15873, pp. 78-90