Preserving Privacy for Distributed Genome-Wide Analysis Against Identity Tracing Attacks
File version
Accepted Manuscript (AM)
Author(s)
Zhang, Y
Bai, G
Li, X
Nepal, S
Grobler, M
Chen, C
Ko, RKL
Abstract
Genome-wide analysis has demonstrated both health and social benefits. However, large-scale sharing of genomic data may reveal sensitive information about individuals. One emerging challenge is the identity tracing attack, which exploits correlations among genomic data to reveal the identity of DNA samples. In this paper, we first demonstrate that an adversary can narrow down a sample's identity by detecting the genetic relatives of the individual behind it, and we quantify this privacy threat using a Shannon entropy-based measurement. For example, we show that when the dataset size reaches 30% of the population, the uncertainty about the identity of any target from that population is reduced to merely 2.3 bits of entropy (i.e., the identity is pinned down to within 5 people). Existing approaches such as differential privacy (DP), secure multiparty computation (MPC), and homomorphic encryption (HE) may not be directly applicable to this challenge in genome-wide analysis because they compromise utility (i.e., accuracy or efficiency). To address this challenge, this paper proposes a framework named υFrag that facilitates privacy-preserving data sharing and computation in genome-wide analysis. υFrag mitigates privacy risks by using vertical fragmentation to disrupt the genetic architecture on which the adversary relies for identity tracing, without sacrificing the capability of genome-wide analysis. We theoretically prove that it preserves the correctness of primitive functionalities and algorithms ranging from basic summary statistics to advanced neural networks. Our experiments demonstrate that υFrag outperforms MPC and HE protocols, with a speedup of more than 221x for training neural networks, and that in most settings it also outperforms traditional non-private algorithms and a state-of-the-art noise-based DP solution.
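The entropy figure quoted in the abstract can be checked with a short calculation. The Python sketch below is a generic Shannon entropy computation, not the paper's own measurement: it assumes the adversary's belief about a target's identity can be written as non-negative weights over candidate individuals, and the function names identity_entropy and effective_candidates are hypothetical. Under a uniform belief over k candidates the entropy is log2(k), so 2.3 bits corresponds to 2^2.3 ≈ 4.9, i.e., roughly 5 equally likely people.

```python
import math
from typing import Sequence


def identity_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy, in bits, of a belief over candidate identities.

    `weights` are hypothetical non-negative scores, one per candidate;
    they are normalised into probabilities before the entropy is computed.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must contain at least one positive value")
    entropy = 0.0
    for w in weights:
        if w > 0:  # zero-probability candidates contribute nothing
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


def effective_candidates(entropy_bits: float) -> float:
    """Number of equally likely candidates that would give the same entropy (2**H)."""
    return 2.0 ** entropy_bits


if __name__ == "__main__":
    print(f"{identity_entropy([1] * 5):.2f} bits")        # uniform belief over 5 people
    print(f"{effective_candidates(2.3):.2f} candidates")  # the abstract's 2.3 bits
```

Running the script prints 2.32 bits for a uniform belief over 5 people and about 4.92 equivalent candidates for 2.3 bits. A skewed belief (one candidate much more likely than the others) yields lower entropy than a uniform one over the same number of candidates, which is why entropy rather than a raw candidate count is a natural uncertainty measure here.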
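As a rough illustration of why vertical fragmentation need not cost correctness for per-SNP summary statistics, the following Python sketch assumes a genotype matrix with one row per individual and one column per SNP (entries are minor-allele counts 0, 1, or 2). It randomly partitions the SNP columns into disjoint fragments and computes allele frequencies fragment by fragment. The functions vertical_fragment, allele_frequency, and fragmented_frequencies, the random column assignment, and the choice of statistic are illustrative assumptions; this is not υFrag's actual fragmentation scheme, and it does not model how the framework supports computations that span many SNPs, such as neural network training.

```python
import random
from typing import Dict, List

Genotypes = List[List[int]]      # rows = individuals, columns = SNPs, entries 0/1/2
Fragment = Dict[int, List[int]]  # original SNP column index -> that column's values


def vertical_fragment(genotypes: Genotypes, num_fragments: int, seed: int = 0) -> List[Fragment]:
    """Randomly partition SNP columns into disjoint fragments (hypothetical scheme).

    Row order (individuals) is kept identical in every fragment.
    """
    columns = list(range(len(genotypes[0])))
    random.Random(seed).shuffle(columns)
    fragments: List[Fragment] = [{} for _ in range(num_fragments)]
    for i, col in enumerate(columns):
        # Round-robin assignment of the shuffled columns to fragments.
        fragments[i % num_fragments][col] = [row[col] for row in genotypes]
    return fragments


def allele_frequency(column: List[int]) -> float:
    """Minor-allele frequency of one diploid SNP: total allele count / (2 * individuals)."""
    return sum(column) / (2 * len(column))


def fragmented_frequencies(fragments: List[Fragment]) -> Dict[int, float]:
    """Each fragment holder computes frequencies locally; results are merged by column index."""
    merged: Dict[int, float] = {}
    for fragment in fragments:
        for col, values in fragment.items():
            merged[col] = allele_frequency(values)
    return merged
```

Because each per-SNP statistic depends only on its own column, merging the per-fragment results reproduces exactly the values computed on the unfragmented matrix, while, with at least two fragments, no single fragment holder sees every SNP of an individual.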
Journal Title
IEEE Transactions on Dependable and Secure Computing
Rights Statement
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Note
This publication has been entered in Griffith Research Online as an advance online version.
Subject
Health informatics and information systems
Information security management
Genomics
Health policy
Cybersecurity and privacy
Distributed computing and systems software
Citation
Zhang, Y; Bai, G; Li, X; Nepal, S; Grobler, M; Chen, C; Ko, RKL, Preserving Privacy for Distributed Genome-Wide Analysis Against Identity Tracing Attacks, IEEE Transactions on Dependable and Secure Computing, 2022, pp. 1-17