Browse wiki

Abstract Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache YARN, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind secure two-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes pre-defined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.
Address Hawaii, US  +
Author Alysson Bessani + , Jörgen Brandt + , Marc Bux + , Vinicius Vielmo Cogo + , Lora Dimitrova + , Jim Dowling + , Ali Gholami + , Kamal Hakimzadeh + , Michael Hummel + , Mahmoud Ismail + , Erwin Laure + , Ulf Leser + , Jan-Eric Litton + , Roxanna Martinez + , Jane Reichel + , Salman Niazi + , Karin Zimmermann +
Booktitle Proceedings of the 1st Int. Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH 2015)  +
Document Document for Publication-Bessani2015biobankcloud-platform.pdf +
Key Bessani2015biobankcloud-platform  +
Month sep  +
NumPubDate 2015.09  +
Project Project:BioBankCloud +
ResearchLine Fault and Intrusion Tolerance in Open Distributed Systems (FIT) +
Title BiobankCloud: a Platform for the Secure Storage, Sharing, and Processing of Large Biomedical Data Sets  +
Type inproceedings  +
Year 2015  +
Has improper value for Url  +
Categories Publication  +
Modification date 30 July 2015 14:27:27  +