Abstract
|
Finding the balance between privacy protec … Finding the balance between privacy protection and data sharing is one of the main challenges in managing human genomic data nowadays. Novel privacy-enhancing technologies are required to address the known disclosure threats to personal sensitive genomic data without precluding data sharing. In this paper, we propose a method that systematically detects privacy-sensitive DNA segments coming directly from an input stream, using as reference a knowledge database of known privacy-sensitive nucleic and amino acid sequences. We show that adding our detection method to standard security techniques provides a robust, efficient privacy-preserving solution that neutralizes threats related to recently published attacks on genome privacy based on short tandem repeats, disease-related genes, and genomic variations. Current global knowledge on human genomes demonstrates the feasibility of our approach to obtain a comprehensive database immediately, which can also evolve automatically to address future attacks as new privacy-sensitive sequences are identified. Additionally, we validate that the detection method can be fitted inline with the NGS---Next Generation Sequencing---production cycle by using Bloom filters and scaling out to faster sequencing machines. scaling out to faster sequencing machines.
|