COPYRIGHT NOTICE: Reports contained in this page are included by the contributing authors as a mechanism to ensure timely dissemination of scholarly/technical information on a non-commercial basis. Copyright and all rights therein are maintained by the authors, despite the fact that they have offered this information electronically. It is understood that all individuals copying this information will adhere to the terms/constraints invoked by each author's copyright.
Reports may not be copied for commercial redistribution, republication, or dissemination without the explicit permission of the Navigators and the authors.
Sections of some of these reports have been published by IEEE and have IEEE Copyright. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 732-562-3966.

Generic Timing Fault Tolerance using a Timely Computing Base
A. Casimiro and P. Veríssimo
Proceedings of the International Conference on Dependable Systems and Networks, Washington D.C., USA, June 2002

Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing fault tolerance: timing errors occur, and they are processed using redundancy, e.g., component replication, to recover and deliver timely service. We introduce a paradigm for generic timing fault tolerance with replicated state machines. The paradigm is based on the existence of Timing Failure Detection with timed completeness and accuracy properties. Generic timing fault tolerance implies the ability to dependably observe the system and to notify timing failures in a timely manner, which we discuss in the paper. It also ensures replica determinism with respect to time (temporal consistency), and safety in case of spare exhaustion. We show that the paradigm can be addressed and realized in the framework of the Timely Computing Base (TCB) model and architecture. Furthermore, we illustrate the generality of our approach by reviewing existing solutions and showing that, in contrast with ours, they either secure only a restricted semantics or simply provide ad hoc solutions.

Using the Timely Computing Base for Dependable QoS Adaptation
A. Casimiro and P. Veríssimo
Proceedings of the 20th IEEE Symposium on Reliable Distributed Systems, New Orleans, USA, October 2001

In open and heterogeneous environments, where an unpredictable number of applications compete for a limited amount of resources, executions can be affected by delays that are also unpredictable and may not even be bounded. Since many of these applications have timeliness requirements, they can only be implemented if they are able to adapt to the existing conditions. Adaptation can be done in several ways, taking into account many different factors, but an obvious factor of success is knowing what the applications have to adapt to. In this paper we present a novel approach, called Dependable QoS adaptation, which can only be achieved if the environment is accurately and reliably observed.

Dependable QoS adaptation is based on the Timely Computing Base (TCB) model. The TCB model is a partial synchrony model that adequately characterizes environments of uncertain synchrony and allows, at the same time, the specification and verification of timeliness requirements. We introduce the coverage stability property and show that adaptive applications can use the TCB to dependably adapt and enjoy this property. We describe the characteristics and the interface of a QoS coverage service and discuss its implementation details.
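As a rough illustration of adapting a timeliness bound to keep a desired coverage, the following Python sketch picks a bound from observed delays using a simple empirical quantile. This is an assumption made for illustration only: the function name `bound_for_coverage` and the quantile rule are hypothetical and are not the interface or the estimation method of the QoS coverage service described in the paper.

```python
import math


def bound_for_coverage(observed_delays, coverage):
    """Return the smallest observed delay that covers the requested fraction.

    Hypothetical helper: `coverage` is the fraction of past observations
    (0 < coverage <= 1) that must not exceed the returned bound.
    """
    if not observed_delays:
        raise ValueError("at least one observed delay is required")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(observed_delays)
    # Number of samples that must fall at or below the chosen bound.
    needed = math.ceil(coverage * len(ordered))
    return ordered[needed - 1]


# Example delays in milliseconds (made-up values for illustration).
delays_ms = [10, 12, 11, 50, 13, 12, 14, 11]
adapted_bound = bound_for_coverage(delays_ms, 0.75)
```

An application could recompute such a bound as new observations arrive and relax or tighten its deadline accordingly, which is only useful if the observations themselves are trustworthy.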

Intrusion-tolerance applications

A Simple Intrusion-Tolerant Reliable Multicast Protocol using the TTCB Model
Miguel Correia, Lau Cheuk Lung, Nuno Ferreira Neves, Paulo Veríssimo
Proceedings of the 21st Simpósio Brasileiro de Redes de Computadores, Natal, Brasil, May 2003

This paper proposes a simple reliable multicast protocol that tolerates arbitrary faults, including malicious faults such as intrusions. The goal is to show a novel way of designing intrusion-tolerant protocols based on a well-founded hybrid fault model. This model is based on a simple distributed security kernel, the TTCB, which is used by the processes only to execute critical steps of the protocol securely. Otherwise, the processes and their communication can be attacked in unlimited ways. The TTCB provides only a few basic services, which allow our protocol to tolerate a number of faults similar to accidental fault-tolerant protocols: for f faults, our protocol requires f + 2 processes, instead of 3f + 1 in typical intrusion-tolerant (or Byzantine) protocols. The protocol exhibits fast termination in the presence of intrusions and/or crash or malicious process failures, since it does not use any cryptography at runtime.
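The two group-size formulas quoted in the abstract can be compared with a few lines of Python. This is only an arithmetic sketch: the function name `min_processes` and its `hybrid` flag are hypothetical and do not come from the paper.

```python
def min_processes(f: int, hybrid: bool) -> int:
    """Minimum number of processes needed to tolerate f faulty ones.

    hybrid=True uses the f + 2 figure quoted for the TTCB-based protocol;
    hybrid=False uses the 3f + 1 figure quoted for typical Byzantine protocols.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return f + 2 if hybrid else 3 * f + 1


# Print the two group sizes side by side for a few fault counts.
for f in range(1, 4):
    print(f, min_processes(f, hybrid=True), min_processes(f, hybrid=False))
```

The gap widens linearly with f, since each additional tolerated fault costs one extra process in the hybrid setting and three in the classical one.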

Efficient Byzantine-Resilient Reliable Multicast on a Hybrid Failure Model
M. Correia, L. C. Lung, N. F. Neves and P. Veríssimo
21st IEEE Symposium on Reliable Distributed Systems, Suita, Japan, pages 2-11, October 2002

The paper presents a new reliable multicast protocol that tolerates arbitrary faults, including Byzantine faults. This protocol is developed using a novel way of designing secure protocols which is based on a well-founded hybrid failure model. Despite our claim of arbitrary failure resilience, the protocol need not incur the cost of "Byzantine agreement" in number of participants and round/message complexity. It can rely on the existence of a simple distributed security kernel, the TTCB, where the participants only execute crucial parts of the protocol operation, under the protection of a crash failure model. Otherwise, participants follow an arbitrary failure model. The TTCB provides only a few basic services, which allow our protocol to have an efficiency similar to that of accidental fault-tolerant protocols: for f faults, our protocol requires f + 2 processes, instead of 3f + 1 in Byzantine systems. In addition, the TTCB (which is synchronous) allows secure operation of timed protocols, despite the unpredictable time behavior of the environment (possibly due to attacks on timing assumptions).

Others

Measuring Distributed Durations with Stable Errors
António Casimiro, Pedro Martins, Paulo Veríssimo and Luis Rodrigues
Proceedings of the 22nd IEEE Real-Time Systems Symposium, London, UK, December 2001

The round-trip duration measurement technique is fundamental to solving many problems in asynchronous distributed systems. In essence, this technique provides the means for reading remote clocks with a known and bounded error. Therefore, it is used as a fundamental building block in several clock synchronization algorithms. In general, the technique can be used to implement duration measurement services, such as the one of the Timely Computing Base model. In this paper we propose a new technique to measure distributed durations that minimizes the measurement error and is able to keep this error almost stable. The new technique can be used to improve the precision of remote clock reading in certain situations. We provide a protocol that implements this new technique and we present some evaluation results. The results clearly show that our solution is indeed better than existing ones.
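For readers unfamiliar with the classic round-trip technique that the abstract takes as its starting point, a minimal Python sketch follows. It shows the well-known round-trip clock reading with a bounded error, not the improved technique proposed in the paper. The function name, the parameters, and the assumption that clock drift is negligible over one round trip are all illustrative.

```python
def read_remote_clock(send_local, recv_local, remote_ts, min_delay=0.0):
    """Estimate the remote clock value at local time recv_local.

    send_local / recv_local: local clock readings when the request was sent
    and when the reply arrived. remote_ts: remote clock value carried in the
    reply. min_delay: assumed lower bound on the one-way message delay.
    Returns (estimate, error) with the true value in estimate +/- error,
    assuming drift is negligible during the round trip.
    """
    rtt = recv_local - send_local
    if rtt < 2 * min_delay:
        raise ValueError("round trip shorter than twice the minimum delay")
    # The reply travelled for at least min_delay and at most rtt - min_delay,
    # so the remote clock now reads between remote_ts + min_delay and
    # remote_ts + rtt - min_delay. Take the midpoint of that interval.
    estimate = remote_ts + rtt / 2
    error = rtt / 2 - min_delay
    return estimate, error


# Made-up readings in seconds, for illustration only.
estimate, error = read_remote_clock(10.0, 10.5, 100.0, min_delay=0.125)
```

The error bound grows with the measured round-trip time, which is why a single slow exchange degrades the reading and why keeping the error small and stable is a worthwhile goal.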

Event Timestamping Tool: a simple PC based kernel to timestamp distributed events
Pedro Martins and António Casimiro
Technical Report DI/FCUL TR-00-4, Department of Informatics, University of Lisboa, July 2000

This report describes the design and implementation of a tool to timestamp distributed events, using a standard PC hardware platform. The Event Timestamping Tool (ETT) is a small software kernel that detects externally generated events using two probe sources, and stores the respective timestamps with known precision bounds. A specialized kernel solution minimizes the response time for an event detection and registration and, consequently, maximizes the precision of the tool. Our approach exploits the Pentium microprocessor's internal timestamp counter to provide timestamps with fine granularity.

Intrusion-Tolerant Architectures: Concepts and Design
Paulo Veríssimo, Nuno F. Neves and Miguel Correia
Architecting Dependable Systems, pp. 3-36, Springer-Verlag LNCS 2677, 2003

There is a significant body of research on distributed computing architectures, methodologies and algorithms, both in the fields of fault tolerance and security. Whilst they have taken separate paths until recently, the problems to be solved are of similar nature. In classical dependability, fault tolerance has been the workhorse of many solutions. Classical security-related work has on the other hand privileged, with few exceptions, intrusion prevention. Intrusion tolerance (IT) is a new approach that has slowly emerged during the past decade, and gained impressive momentum recently. Instead of trying to prevent every single intrusion, these are allowed, but tolerated: the system triggers mechanisms that prevent the intrusion from generating a system security failure. The paper describes the fundamental concepts behind IT, tracing their connection with classical fault tolerance and security. We discuss the main strategies and mechanisms for architecting IT systems, and report on recent advances on distributed IT system architectures.
