“SPARE: Replicas on Hold”
in Proceedings of the 18th Annual Network & Distributed System Security Symposium, Feb. 2011.
Abstract: Despite numerous improvements in the development and maintenance of software, bugs and security holes exist in today’s products, and malicious intrusions happen frequently. While this is a general problem, it applies in particular to web-based services. However, Byzantine fault-tolerant (BFT) replication and proactive recovery offer a powerful combination to tolerate and overcome these kinds of faults, thereby enabling long-term service provision. BFT replication is commonly associated with the overhead of 3f + 1 replicas to handle f faults. Using a trusted component, some previous systems were able to reduce the resource cost to 2f + 1 replicas. In general, adding support for proactive recovery further increases the resource demand. We believe this enormous resource demand is one of the key reasons why BFT replication is not commonly applied and is considered unsuitable for web-based services. In this paper we present SPARE, a cloud-aware approach that harnesses virtualization to reduce the resource demand of BFT replication and to provide efficient support for proactive recovery. In SPARE, we focus on the main source of software bugs and intrusions; that is, the services and their associated execution environments. This approach enables us to restrict replication and request execution to only f + 1 replicas in the fault-free case while rapidly activating up to f additional replicas by utilizing virtualization in case of timing violations and faults. For an instant reaction, we keep spare replicas in a paused state and update them periodically. In the fault-free case, these passive replicas require far fewer resources than active replicas and aid efficient proactive recovery.
Research line(s): Fault And Intrusion Tolerance in Open Distributed Systems (FIT)