“Enabling the Efficient, Dependable Cloud-based Storage of Human Genomes”

From Navigators

Revision as of 14:30, 8 August 2019 by Vielmo (Talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Vinicius Vielmo Cogo, Alysson Bessani

in 1st Workshop on Distributed and Reliable Storage Systems (DRSS'19), Oct. 2019.

Abstract: Efficiently storing large data sets of human genomes is a long-term ambition from both the research and clinical life sciences communities. For instance, biobanks stock thousands to millions of biological physical samples and have been under pressure to store also their resulting digitized genomes. However, these and other life sciences institutions lack the infrastructure and expertise to efficiently store this data. Cloud computing is a natural economic alternative to private infrastructures, but it is not as good an alternative in terms of security and privacy. In this work, we present an end-to-end composite pipeline intended to enable the efficient, dependable cloud-based storage of human genomes by integrating three mechanisms we have recently proposed. These mechanisms encompass (1) a privacy-sensitivity detector for human genomes, (2) a similarity-based deduplication and delta-encoding algorithm for sequencing data, and (3) an auditability scheme to verify who has effectively read data in storage systems that use secure information dispersal. By integrating them with appropriate storage configurations, one can obtain reasonable privacy protection, security, and dependability guarantees at modest costs (e.g., less than $1/Genome/Year). Our preliminary analysis indicates that this pipeline costs only 3% more than non-replicated systems, 48% less than fully-replicating all data, and 31% less than secure information dispersal schemes.

Download paper

Download Enabling the Efficient, Dependable Cloud-based Storage of Human Genomes

Export citation


Project(s): Project:BioBankCloud, Project:SUPERCLOUD, Project:DiSIEM, Project:IRCoC

Research line(s): Fault and Intrusion Tolerance in Open Distributed Systems (FIT)

Personal tools
Navigators toolbox