Generic Timing Fault Tolerance using a Timely Computing Base
A. Casimiro and P. Veríssimo
To appear in Proceedings of the International Conference on Dependable Systems and Networks, Washington D.C., USA, June 2002
Designing applications with
timeliness requirements in environments of uncertain synchrony is
known to be a difficult problem. In this paper, we follow the
perspective of timing fault tolerance: timing errors occur, and they
are processed using redundancy, e.g., component replication, to
recover and deliver timely service. We introduce a paradigm for
generic timing fault tolerance with replicated state machines. The
paradigm is based on the existence of Timing Failure Detection with
timed completeness and accuracy properties. Generic timing fault
tolerance entails the ability to dependably observe the system and to
notify timing failures in a timely manner, which we discuss in the
paper. It also ensures replica determinism with respect to time
(temporal consistency) and safety in case of spare exhaustion. We
show that the paradigm can be addressed and realized in the framework
of the Timely Computing Base (TCB) model and architecture.
Furthermore, we illustrate the generality of our approach by reviewing
existing solutions and showing that, in contrast with ours, they either
secure only restricted semantics or provide ad hoc solutions.
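To make the idea of timing failure detection with a bounded detection latency more concrete, here is a minimal Python sketch driven by a simulated clock. It is not the TCB's timing failure detection service: the names `TimingFailureDetector`, `TimedAction` and `simulate`, the integer time units, and the assumption that the detector is polled at least every `period` time units are all hypothetical. Under those assumptions, any action that has not completed by its deadline is reported no later than `period` units after that deadline (a simplified stand-in for timed completeness), and because completions are recorded the instant they happen, an action that finishes on time is never reported (a simplified stand-in for accuracy).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TimedAction:
    """Hypothetical timed action that must complete by `deadline`."""
    action_id: str
    deadline: int
    completed_at: Optional[int] = None
    reported_at: Optional[int] = None


class TimingFailureDetector:
    """Hypothetical timing failure detector driven by a simulated clock.

    Assumes tick() is called at least once every `period` time units and
    that completions are recorded at the instant they happen.
    """

    def __init__(self, period: int) -> None:
        self.period = period
        self.actions: Dict[str, TimedAction] = {}

    def start(self, action_id: str, start_time: int, duration: int) -> None:
        # Register an action whose deadline is start_time + duration.
        self.actions[action_id] = TimedAction(action_id, start_time + duration)

    def complete(self, action_id: str, time: int) -> None:
        # Record only the first completion of an action.
        action = self.actions[action_id]
        if action.completed_at is None:
            action.completed_at = time

    def tick(self, now: int) -> List[str]:
        """Return ids of actions newly detected as timing failures at `now`."""
        failed = []
        for action in self.actions.values():
            if action.reported_at is not None:
                continue  # each failure is reported only once
            on_time = (action.completed_at is not None
                       and action.completed_at <= action.deadline)
            if now > action.deadline and not on_time:
                action.reported_at = now
                failed.append(action.action_id)
        return failed


def simulate() -> Dict[str, int]:
    """Two actions with deadline 20: 'a' completes at 15, 'b' at 30."""
    tfd = TimingFailureDetector(period=5)
    tfd.start("a", start_time=0, duration=20)
    tfd.start("b", start_time=0, duration=20)
    reports: Dict[str, int] = {}
    for now in range(0, 41, tfd.period):
        if now == 15:
            tfd.complete("a", now)
        if now == 30:
            tfd.complete("b", now)  # too late: the deadline was 20
        for action_id in tfd.tick(now):
            reports[action_id] = now
    return reports


if __name__ == "__main__":
    print(simulate())  # {'b': 25}
```

Here `b` misses its deadline of 20 and is reported at time 25, within one polling period of the deadline, while `a` is never reported. A hypothetical replica-management layer could consume such reports; that part of the paradigm is not modelled in this sketch.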