ReD: Resilient Database Clusters
- Research Line(s): Fault and Intrusion Tolerance in Open Distributed Systems (FIT)
- Sponsor: FCT
- Project Number: PTDC/EIA-EIA/109044/2008
- Total award amount: 123.6K Euros
- Coordinator: CCTC/U.Minho
- Partners: CCTC/U.Minho, FCUL
- Team at FCUL: 5 researchers, including Alysson Bessani, Nuno Ferreira Neves, Paulo Verissimo
Relational database management systems (RDBMS) have long been the trusty workhorse of the information technology (IT) industry. In fact, by holding all shared mutable state and being responsible for durability, the RDBMS is the key component in system scale-out and availability, making database server clusters a perennial hot topic of research in industry and academia.
The current state of the art in addressing these challenges is still, after a long-standing debate, split between shared-storage and shared-nothing clustering architectures. On one hand, a shared-storage cluster allows maximum resource efficiency: one uses as many nodes as required to process the workload and to ensure the desired availability, while the storage is configured solely according to the desired storage bandwidth and disk resilience. Unfortunately, a shared-storage approach based on distributed shared memory and distributed locking raises a number of problems that make such solutions costly to develop and deploy. Namely, server software needs to be heavily refactored to deal with distributed locking, buffer invalidation, and recovery from partial cluster failure. Anecdotal evidence of this cost is that none of the mainstream open source database servers provides this option, and most commercial database servers also lack a shared-storage configuration. Moreover, true write sharing is a potential source of corruption upon software or hardware faults, and an additional vulnerability to malicious intrusions.
On the other hand, there have been a number of proposals for shared-nothing database server clusters based on consistent replication. All of these share the same basic approach: updates are ordered and propagated before replying to the client, thus ensuring that no conflicts arise after the transaction commits. The resulting performance and scalability are very good, especially with the mostly read-only workloads that are currently common. The logical independence of database replicas also increases resilience to data corruption, whether malicious or not. Moreover, they are inexpensive and widely available as add-ons to all major DBMS, as no changes to the server software are required. Unfortunately, in a shared-nothing cluster a separate physical copy of the data is required for each node. Therefore, even if only a few copies are required for dependability, a large cluster with hundreds of nodes must also be configured with sufficient storage capacity for hundreds of copies of the data. In large-scale systems, this imposes a hardware and operational cost that offsets their initial advantage.
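The ordering step described above can be illustrated with a minimal sketch. It is not taken from any of the proposals mentioned: the `Replica` and `Cluster` classes are hypothetical, and a single in-process sequencer stands in for a real total-order broadcast layer. It only shows that the reply is issued after every replica has applied the update in the same order, and that each replica keeps its own full copy of the data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One shared-nothing replica holding its own full copy of the data."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # sequence number of the last update applied

    def apply(self, seq: int, writes: dict) -> None:
        # Updates must arrive in the agreed total order, with no gaps.
        if seq != self.applied + 1:
            raise ValueError(f"out-of-order update {seq}, expected {self.applied + 1}")
        self.data.update(writes)
        self.applied = seq


class Cluster:
    """Hypothetical coordinator standing in for a total-order broadcast layer."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.next_seq = 1

    def commit(self, writes: dict) -> int:
        # Assign the update its position in the total order.
        seq = self.next_seq
        self.next_seq += 1
        # Propagate to every replica before replying to the client.
        for replica in self.replicas:
            replica.apply(seq, dict(writes))
        return seq

    def read(self, key, replica_index: int = 0):
        # Reads can be served by any single replica.
        return self.replicas[replica_index].data.get(key)


cluster = Cluster(n_replicas=3)
cluster.commit({"x": 1})
cluster.commit({"x": 2, "y": 5})
print([cluster.read("x", i) for i in range(3)])  # every replica returns the same value
```

Because each `Replica` owns a separate `data` dictionary, the sketch also makes the storage cost visible: three replicas mean three complete copies.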
The goal of project ReD is to achieve a generic, robust, and inexpensive shared-storage cluster from an off-the-shelf RDBMS. In detail, the project will deliver the following concrete results:
- A general architecture and specification of the proposed approach.
- An exploration of the performance, scalability, and dependability aspects of the approach, highlighting the most interesting tradeoffs.
- A detailed experimental evaluation, using the prototype and industry standard transaction processing benchmarks.
Challenge and Approach
It might look simple at first sight to extend the shared-nothing protocol to cope with shared storage: if all replicas perform exactly the same write operations, the database state would be identical and thus could be shared. Unfortunately, internal non-determinism means that different physical images are produced regardless of logical consistency, leading to corruption. Moreover, such a simple approach would not preserve the logical independence of replicas and would rule out tolerating Byzantine faults.
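The effect of internal non-determinism can be shown with a toy model. The sketch below is an illustration only: the page layout, the `physical_image` and `logical_state` helpers, and the slot orders are all assumptions, not the behavior of any particular database server. Two replicas store the same rows, but their local allocators hand out page slots in different orders, so the pages differ physically while remaining logically equal. Mixing blocks from both images, as naive write sharing could, yields a page that matches neither.

```python
def physical_image(rows, free_slots):
    """Place rows into page slots in the order the local allocator offers them."""
    page = [None] * len(free_slots)
    for row, slot in zip(rows, free_slots):
        page[slot] = row
    return page


def logical_state(page):
    """The logical content of a page: its rows, independent of slot position."""
    return sorted(row for row in page if row is not None)


rows = [("k1", "a"), ("k2", "b"), ("k3", "c")]

# Same logical writes, different (hypothetical) allocator decisions per replica.
image_a = physical_image(rows, free_slots=[0, 1, 2, 3])
image_b = physical_image(rows, free_slots=[2, 0, 3, 1])

same_logically = logical_state(image_a) == logical_state(image_b)
same_physically = image_a == image_b

# Naive sharing: first half of the page written by A, second half by B.
mixed = image_a[:2] + image_b[2:]
corrupted = logical_state(mixed) != logical_state(image_a)

print(same_logically, same_physically, corrupted)
```

In this model the mixed page contains one row twice, which is the kind of corruption that sharing a single physical image between logically consistent replicas can cause.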
The ReD approach is to combine the replication protocol with a specialized copy-on-write volume management system that holds transient, logically independent partial copies, thus masking internal server non-determinism and isolating multiple logical replicas for resilience.
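The general copy-on-write idea can be sketched as follows. This is not the ReD volume manager, whose design is not detailed here; the `SharedVolume` and `CowView` names and the block-level interface are hypothetical. Each replica works through its own view: reads fall through to the shared blocks, while writes go to a private overlay holding only the blocks that replica changed, so one replica's writes never reach another replica or the shared copy.

```python
class SharedVolume:
    """The single shared copy of the data, modeled as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class CowView:
    """A replica's private view: shared blocks plus a transient partial copy."""

    def __init__(self, base: SharedVolume) -> None:
        self.base = base
        self.overlay = {}  # block index -> privately modified block

    def read(self, index: int) -> bytes:
        # Prefer the private copy; otherwise fall through to shared storage.
        if index in self.overlay:
            return self.overlay[index]
        return self.base.blocks[index]

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: the shared block is never modified in place.
        self.overlay[index] = data

    def discard(self) -> None:
        # The partial copy is transient and can be dropped at any time.
        self.overlay.clear()


volume = SharedVolume([b"block-0", b"block-1"])
view_a, view_b = CowView(volume), CowView(volume)

view_a.write(0, b"written-by-a")
print(view_a.read(0), view_b.read(0), volume.blocks[0])
```

Only the modified block is duplicated, so the storage overhead grows with the amount of divergence between replicas rather than with the number of nodes.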
- Alysson Bessani, João Sousa, Eduardo Alchieri, “... And State Machine Replication for All with BFT-SMaRt”, Apr. 2012.
- Alysson Bessani, “(BFT) State Machine Replication: The Hype, The Virtue... and even some Practice”, Apr. 2012.
- Alysson Bessani, Miguel Correia, Bruno Quaresma, Fernando André, Paulo Sousa, “DepSky: Dependable and Secure Storage in a Cloud-of-Clouds”, in Proceedings of the 6th ACM SIGOPS/EuroSys European Systems Conference - EuroSys'11, Salzburg, Austria, Apr. 2011.
- Alysson Bessani, Miguel Correia, Paulo Sousa, “Active Quorum Systems”, in Proceedings of the 6th Workshop on Hot Topics in System Dependability - HotDep'10 (together with USENIX OSDI'10), Vancouver, Canada, Oct. 2010.
- Giuliana Santos Veronese, Miguel Correia, Alysson Bessani, Lau Cheuk Lung, “EBAWA: Efficient Byzantine Agreement for Wide-Area Networks”, in Proceedings of the 12th IEEE International High Assurance Systems Engineering Symposium - HASE'10, San Jose, CA, USA, Nov. 2010.
- Bruno Quaresma, Alysson Bessani, Paulo Sousa, “Melhorando a Fiabilidade e Segurança do Armazenamento em Clouds”, in Actas do INForum - Simpósio de Informática 2010, Braga, Portugal, Sep. 2010.
- Alysson Bessani, “Active Quorum Systems: Specification and Correctness Proof”, Tech. Rep. DI-FCUL-TR 2010-02, Jul. 2010.