“On the Efficiency of Durable State Machine Replication”
in Proceedings of the 2013 USENIX Annual Technical Conference, Jun. 2013, pp. 169–180.
Abstract: State Machine Replication (SMR) is a fundamental technique for ensuring the dependability of critical services in modern internet-scale infrastructures. SMR alone does not protect from full crashes, and thus in practice it is employed together with secondary storage to ensure the durability of the data managed by these services. In this work we show that the classical durability enforcing mechanisms – logging, checkpointing, state transfer – can have a high impact on the performance of SMR-based services even if SSDs are used instead of disks. To alleviate this impact, we propose three techniques that can be used in a transparent manner, i.e., without modifying the SMR programming model or requiring extra resources: parallel logging, sequential checkpointing, and collaborative state transfer. We show the benefits of these techniques experimentally by implementing them in an open-source replication library, and evaluating them in the context of a consistent key-value store and a coordination service.
Research line(s): Fault and Intrusion Tolerance in Open Distributed Systems (FIT)