“Proactive Resilience”

Revision as of 16:37, 2 October 2018 by Nuno (Talk | contribs)

Paulo Sousa (advised by Paulo Verissimo, Nuno Ferreira Neves)

Ph.D. dissertation, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, May 2007

Abstract: This thesis introduces exhaustion-safety, a new dimension along which system dependability may be evaluated. Exhaustion-safety means safety against resource exhaustion, and its concrete semantics in a given system depend on the type of resource being considered. The thesis focuses on the nodes of a fault-tolerant distributed system as crucial resources, and on understanding the conditions under which the typical assumption on the maximum number of node failures may or may not be violated. An interesting first finding was that it is impossible to build a node-exhaustion-safe intrusion-tolerant distributed system under the asynchronous model. This result motivated research into a model and architecture that guarantee node-exhaustion-safety. The main outcome of this research was proactive resilience, a new paradigm for building intrusion-tolerant distributed systems. Proactive resilience is based on architectural hybridization and hybrid distributed system modeling: the system is asynchronous for the most part and resorts to a synchronous subsystem to periodically recover the nodes and remove the effects of faults and attacks. The Proactive Resilience Model (PRM) is presented and shown to be a way of building node-exhaustion-safe intrusion-tolerant distributed systems. Finally, the thesis presents two application scenarios of proactive resilience. First, a proof-of-concept prototype of a secret sharing system built according to the PRM is described and shown to be highly resilient under different attack scenarios. Then, a novel intrusion-tolerant state machine replication architecture based on the PRM is presented, and a new result is established: a minimum of 3f + 2k + 1 replicas is required to ensure availability in a system where f arbitrary faults may happen between recoveries and at most k replicas recover simultaneously.
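
The replica bound stated at the end of the abstract can be made concrete with a small calculation. The following is only an illustrative sketch of the arithmetic, not code from the thesis; the function name `min_replicas` is hypothetical and chosen here for illustration.

```python
def min_replicas(f: int, k: int) -> int:
    """Minimum number of replicas per the bound quoted in the abstract.

    f: arbitrary faults that may happen between recoveries
    k: maximum number of replicas recovering simultaneously
    (Hypothetical helper; it only evaluates 3f + 2k + 1.)
    """
    if f < 0 or k < 0:
        raise ValueError("f and k must be non-negative")
    return 3 * f + 2 * k + 1


# Tolerating one fault with one replica recovering at a time:
print(min_replicas(1, 1))  # 3*1 + 2*1 + 1 = 6

# With no simultaneous recoveries (k = 0) the expression reduces to
# 3f + 1, the familiar bound for Byzantine fault-tolerant replication.
print(min_replicas(1, 0))  # 4
```

Under this reading, each replica allowed to recover at the same time adds two replicas to the minimum, on top of the 3f + 1 needed to tolerate f arbitrary faults.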

Research line(s): Fault and Intrusion Tolerance in Open Distributed Systems (FIT)
