Privacy-Preserving Gradient Descent for Distributed Genome-Wide Analysis

View/ Open
File version
Accepted Manuscript (AM)
Author(s)
Zhang, Y
Bai, G
Li, X
Curtis, C
Chen, C
Ko, RKL
Year published
2021
Metadata
Show full item recordAbstract
Genome-wide analysis, which provides perceptive insights into complex diseases, plays an important role in biomedical data analytics. It usually involves large-scale human genomic data, and thus may disclose sensitive information about individuals. While existing studies have been conducted against data exfiltration by external malicious actors, this work focuses on the emerging identity tracing attack that occurs when a dishonest insider attempts to re-identify obtained DNA samples. We propose a framework named υFRAG to facilitate privacy-preserving data sharing and computation in genome-wide analysis. υFRAG mitigates privacy ...
View more >Genome-wide analysis, which provides perceptive insights into complex diseases, plays an important role in biomedical data analytics. It usually involves large-scale human genomic data, and thus may disclose sensitive information about individuals. While existing studies have been conducted against data exfiltration by external malicious actors, this work focuses on the emerging identity tracing attack that occurs when a dishonest insider attempts to re-identify obtained DNA samples. We propose a framework named υFRAG to facilitate privacy-preserving data sharing and computation in genome-wide analysis. υFRAG mitigates privacy risks by using vertical fragmentations to disrupt the genetic architecture on which the adversary relies for re-identification. The fragmentation significantly reduces the overall amount of information the adversary can obtain. Notably, it introduces no sacrifice to the capability of genome-wide analysis—we prove that it preserves the correctness of gradient descent, the most popular optimization approach for training machine learning models. We also explore the efficiency performance of υFRAG through experiments on a large-scale, real-world dataset. Our experiments demonstrate that υFRAG outperforms not only secure multiparty computation (MPC) and homomorphic encryption (HE) protocols with a speedup of more than 221x for training neural networks, but also noise-based differential privacy (DP) solutions and traditional non-private algorithms in most settings.
View less >
View more >Genome-wide analysis, which provides perceptive insights into complex diseases, plays an important role in biomedical data analytics. It usually involves large-scale human genomic data, and thus may disclose sensitive information about individuals. While existing studies have been conducted against data exfiltration by external malicious actors, this work focuses on the emerging identity tracing attack that occurs when a dishonest insider attempts to re-identify obtained DNA samples. We propose a framework named υFRAG to facilitate privacy-preserving data sharing and computation in genome-wide analysis. υFRAG mitigates privacy risks by using vertical fragmentations to disrupt the genetic architecture on which the adversary relies for re-identification. The fragmentation significantly reduces the overall amount of information the adversary can obtain. Notably, it introduces no sacrifice to the capability of genome-wide analysis—we prove that it preserves the correctness of gradient descent, the most popular optimization approach for training machine learning models. We also explore the efficiency performance of υFRAG through experiments on a large-scale, real-world dataset. Our experiments demonstrate that υFRAG outperforms not only secure multiparty computation (MPC) and homomorphic encryption (HE) protocols with a speedup of more than 221x for training neural networks, but also noise-based differential privacy (DP) solutions and traditional non-private algorithms in most settings.
View less >
Conference Title
Lecture Notes in Computer Science
Volume
12973
Copyright Statement
© Springer Nature Switzerland AG 2021. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher.The original publication is available at www.springerlink.com
Subject
Genomics
Clinical sciences