The Timely Computing Base Model and Architecture
P. Veríssimo and A. Casimiro
IEEE Transactions on Computers - Special Issue on Asynchronous
Real-Time Systems, vol. 51, n. 8, Aug 2002
Current systems are
very often based on large-scale, unpredictable and unreliable
infrastructures. However, users of these systems increasingly require
services with timeliness properties. This creates a difficult-to-solve
contradiction with regard to the adequate time model: synchronous, or
asynchronous? In this paper, we propose an architectural construct and
programming model, which address this problem. We assume the existence
of a component that is capable of executing timely functions, however
asynchronous the rest of the system may be. We call this component the
Timely Computing Base, and it can be used by the other components to
execute a set of simple but crucial time-related services. We also
show how to use it to build dependable and timely applications
exhibiting varying degrees of timeliness assurance, under several
synchrony models.
Download PDF | Download Postscript | Bibtex Entry
How to Build a Timely Computing Base using Real-Time Linux
António Casimiro and Pedro Martins and Paulo Veríssimo
Proceedings of the 2000 IEEE International Workshop on Factory
Communication Systems, Porto, Portugal, September 2000
In a recent paper we introduced a new model
to deal with the problem of handling application timeliness
requirements in environments with loose real-time guarantees. This
model, called the Timely Computing Base (TCB), is one of partial
synchrony. From an engineering point of view, it requires systems to
be constructed with a small control part, a TCB module, to protect
vital resources with respect to timeliness and to provide basic
time-related services to applications. Although many different
instantiations of systems with a TCB can be envisaged, we have chosen
to implement a TCB using PC hardware running the Real-Time Linux
operating system over a Fast-Ethernet network. This paper describes
the experience gained during the implementation process and shows that
it is possible to construct a TCB without the need for special
software or hardware components. The problem of achieving real-time
communication under RT-Linux is also discussed: we describe our port
of a Linux network driver to RT-Linux and explain the modifications
required to ensure predictability.
Download PDF | Download Postscript | Bibtex Entry
The Timely Computing Base: Timely Actions in the Presence of Uncertain Timeliness
Paulo Veríssimo, António Casimiro and Christof Fetzer
Proceedings of the International Conference on Dependable Systems
and Networks, New York, USA, June 2000
Real-time behavior is specified in
compliance with timeliness requirements, which in essence calls
for synchronous system models. However, systems often rely on
unpredictable and unreliable infrastructures, which suggest the use of
asynchronous models. Several models have been proposed to address this
issue. We propose an architectural construct that takes a generic
approach to the problem of programming in the presence of uncertain
timeliness. We assume the existence of a component, capable of
executing timely functions, which helps applications with varying
degrees of synchrony to behave reliably despite the occurrence of
timing failures. We call this component the Timely Computing Base, TCB.
This paper describes the TCB architecture and model, and discusses the
application programming interface for accessing the TCB services. The
implementation of the TCB services uses fail-awareness techniques to
increase the coverage of TCB properties.
Download PDF | Download Postscript | Bibtex Entry
Timing Failure Detection with a Timely Computing Base
António Casimiro and Paulo Veríssimo
Third European Research Seminar on Advances in Distributed Systems,
Madeira Island, Portugal, April 1999
In a recent report we proposed an
architectural construct to address the problem of dealing with timeliness
specifications in a generic way. We called it the Timely Computing
Base, TCB. The TCB defines a set of services available to
applications, including timely execution, duration measurement and
timing failure detection. We showed how these services could be used
to build dependable and timely applications. In this paper we further
extend the description of the TCB, namely by presenting a protocol for
its Timing Failure Detection (TFD) service. We discuss the essential
aspects of providing such a service under the TCB framework and make
some considerations relative to the service interface.
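
The duration measurement and timing failure detection services mentioned above can be illustrated with a purely local sketch. This is not the TFD protocol from the paper (which is distributed and runs inside the TCB); the names `run_with_bound` and `TimedResult` are hypothetical, and the sketch only detects a failure after the action has finished.

```python
import time
from dataclasses import dataclass


@dataclass
class TimedResult:
    value: object
    duration: float       # measured execution time, in seconds
    timing_failure: bool  # True if the duration exceeded the bound


def run_with_bound(action, bound_s, clock=time.monotonic):
    """Run `action`, measure its duration, and flag a timing failure.

    Hypothetical helper: `clock` is injectable so the check can be
    exercised deterministically. A monotonic clock is used by default
    so that wall-clock adjustments do not distort the measurement.
    """
    start = clock()
    value = action()
    duration = clock() - start
    return TimedResult(value, duration, duration > bound_s)
```

A caller would compare `timing_failure` against its own timeliness specification and react, for example by switching to a fail-safe state. In an actual TCB the detection itself has to be timely, which an ordinary process like this one cannot guarantee.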
Download PDF | Download Postscript | Bibtex Entry
The Design of a COTS Real-Time Distributed Security Kernel
M. Correia, P. Veríssimo, Nuno F. Neves
Fourth European Dependable Computing Conference. Toulouse, France,
pages 234--252, October 2002
This paper describes
the design of a security kernel called TTCB, which has innovative
features. Firstly, it is a distributed subsystem with its own secure
network. Secondly, the TTCB is real-time, that is, a synchronous
subsystem capable of timely behavior. These two characteristics
together are uncommon in security kernels. Thirdly, the TTCB can be
implemented using only COTS components.
We discuss essentially three things in this paper: (1) The TTCB is a
simple component providing a small set of basic secure services. It
aims at building a new style of protocols to achieve intrusion
tolerance, which for the most part execute in insecure, arbitrary
failure environments, and resort to the TTCB only in crucial parts of
their operation. (2) Besides, the TTCB is a synchronous device
supplying functions that may be an enabler of a new generation of
timed secure protocols, until now known to be fragile due to attacks
on timing assumptions. (3) Finally, we present a design methodology
that establishes our hybrid failure assumptions in a well-founded
manner. It helps us to achieve a robust design, despite using
exclusively COTS components, with the advantage of allowing the
security kernel to be easily deployed on widely used platforms.
Download Postscript | Bibtex Entry
Uncertainty and Predictability: Can they be reconciled?
Paulo Veríssimo
Future Directions in Distributed Computing, pp. 108-113, Springer
Verlag LNCS 2584, May, 2003
When designing and deploying distributed applications, we are faced
today with the confluence of antagonistic aims, such as uncertainty
and predictability. Uncertainty is a common
denominator of current systems: uncertain synchrony,
fault model, and even topology. However, systems are required to
fulfil more and more demanding goals which require predictability
under several forms, e.g., timeliness, trustworthiness. This paper
introduces a new design philosophy for distributed systems, based on
the existence of architectural constructs with privileged properties
(wormholes), which endow systems with the capability of evading the
uncertainty of the environment for certain crucial steps of their
operation where predictability is required. Recently, we have tested
this philosophy by studying and prototyping two incarnations of
distributed systems with wormholes, which we also report here.
Download PDF | Bibtex Entry
Generic Timing Fault Tolerance using a Timely Computing Base
A. Casimiro and P. Veríssimo
Proceedings of the
International Conference on Dependable Systems and Networks,
Washington D.C., USA, June 2002
Designing applications with
timeliness requirements in environments of uncertain synchrony is
known to be a difficult problem. In this paper, we follow the
perspective of timing fault tolerance: timing errors occur, and they
are processed using redundancy, e.g., component replication, to
recover and deliver timely service. We introduce a paradigm for
generic timing fault tolerance with replicated state machines. The
paradigm is based on the existence of Timing Failure Detection with
timed completeness and accuracy properties. Generic timing fault
tolerance implies the ability to dependably observe the system and to
timely notify timing failures, which we discuss in the paper. On the
other hand, it ensures replica determinism with respect to time
(temporal consistency), and safety in case of spare exhaustion. We
show that the paradigm can be addressed and realized in the framework
of the Timely Computing Base (TCB) model and architecture.
Furthermore, we illustrate the generality of our approach by reviewing
previously existing solutions and by showing that, in contrast with ours,
they only secure a restricted semantics, or simply provide ad hoc
solutions.
Download PDF | Download Postscript | Bibtex Entry
Using the Timely Computing Base for Dependable QoS Adaptation
A. Casimiro and P. Veríssimo
Proceedings of the 20th IEEE Symposium on Reliable
Distributed Systems, New Orleans, USA, October 2001
In open and heterogeneous
environments, where an unpredictable number of applications compete
for a limited amount of resources, executions can be affected by equally
unpredictable delays, which may not even be bounded. Since many of
these applications have timeliness requirements, they can only be
implemented if they are able to adapt to the existing conditions.
Adaptation can be done in several ways, taking into account many
different factors, but an obvious factor of success is knowing what
they have to adapt to. In this paper we present a novel approach,
called Dependable QoS adaptation, which can only be achieved if the
environment is accurately and reliably observed.
Dependable QoS adaptation is based
on the Timely Computing Base (TCB) model. The TCB model is a partial
synchrony model that adequately characterizes environments of
uncertain synchrony and allows, at the same time, the specification
and verification of timeliness requirements. We introduce the coverage
stability property and show that adaptive applications can use the TCB
to dependably adapt and enjoy this property. We describe the
characteristics and the interface of a QoS coverage service and
discuss its implementation details.
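
One way to picture adaptation driven by observation is to pick a time bound so that a chosen fraction of recently observed delays falls within it. The sketch below is an assumption made for illustration, not the QoS coverage service described in the paper: the function name `bound_for_coverage` and the use of a plain empirical quantile are hypothetical choices.

```python
import math


def bound_for_coverage(delays, coverage):
    """Return the smallest observed delay that covers `coverage` of samples.

    Illustrative assumption: coverage is estimated empirically as the
    fraction of observed delays that do not exceed the bound.
    """
    if not delays:
        raise ValueError("need at least one observed delay")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(delays)
    # Number of samples that must fall at or below the returned bound.
    k = math.ceil(coverage * len(ordered))
    return ordered[k - 1]
```

An adaptive application could recompute the bound as new delay samples arrive and relax or tighten its deadlines accordingly, keeping the estimated coverage roughly constant. The quality of the result depends entirely on how accurately and reliably the delays were observed, which is the point the abstract makes.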
Download PDF | Download Postscript | Bibtex Entry
A Simple Intrusion-Tolerant Reliable Multicast Protocol using the TTCB Model
Miguel Correia, Lau Cheuk Lung, Nuno Ferreira Neves, Paulo Veríssimo
Proceedings of the 21st Simpósio Brasileiro de Redes de
Computadores, Natal, Brasil, May 2003
This paper proposes a
simple reliable multicast protocol that tolerates arbitrary faults,
including malicious faults such as intrusions. The goal is to show a
novel way of designing intrusion-tolerant protocols based on a
well-founded hybrid fault model. This model is based on a simple
distributed security kernel, the TTCB, which is used by the processes
only to securely execute critical steps of the protocol. Otherwise,
the processes and their communication can be attacked in unlimited
ways. The TTCB provides only a few basic
services, which allow our protocol to tolerate a number of faults
similar to accidental fault-tolerant protocols: for f faults, our
protocol requires f + 2 processes, instead of 3f + 1 in typical
intrusion-tolerant (or Byzantine) protocols. The protocol exhibits
fast termination in the presence of intrusions and/or crash or
malicious process failures, since it does not use any cryptography at
runtime.
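
The resilience figures quoted above can be made concrete with a small worked comparison. The two function names are hypothetical; the formulas are the ones stated in the abstract (f + 2 processes with the TTCB, 3f + 1 in typical Byzantine protocols).

```python
def min_processes_ttcb(f):
    """Processes required to tolerate f faults, per the abstract: f + 2."""
    return f + 2


def min_processes_byzantine(f):
    """Processes required by typical Byzantine protocols: 3f + 1."""
    return 3 * f + 1


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f, min_processes_ttcb(f), min_processes_byzantine(f))
```

For f = 1 the two requirements are 3 and 4 processes; for f = 3 they are 5 and 10, so the gap widens as more faults must be tolerated.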
Download PDF | Bibtex Entry
Efficient Byzantine-Resilient Reliable Multicast on a Hybrid Failure Model
M. Correia and L. C. Lung and N. F. Neves and P. Veríssimo
21st IEEE Symposium on Reliable Distributed Systems. Suita, Japan,
pages 2--11, October 2002
The paper presents a
new reliable multicast protocol that tolerates arbitrary faults,
including Byzantine faults. This protocol is developed using a novel
way of designing secure protocols which is based on a well-founded
hybrid failure model. Despite our claim of arbitrary failure
resilience, the protocol
need not necessarily incur the cost of "Byzantine agreement", in
number of participants and round/message complexity. It can rely on
the existence of a simple distributed security kernel -- the TTCB --
where the participants only execute crucial parts of the protocol
operation, under the protection of a crash failure model. Otherwise,
participants follow an arbitrary failure model.
The TTCB provides only a few basic services, which allow our protocol
to have an efficiency similar to that of accidental fault-tolerant
protocols: for f faults, our protocol requires f+2 processes, instead
of 3f+1 in Byzantine systems. Besides, the TTCB (which is synchronous)
allows secure operation of timed protocols, despite the unpredictable
time behavior of the environment (possibly due to attacks on timing
assumptions).
Download PostScript | Bibtex Entry
Measuring Distributed Durations with Stable Errors
António Casimiro, Pedro Martins, Paulo Veríssimo and Luis Rodrigues
Proceedings of the 22nd IEEE Real-Time Systems Symposium, London, UK,
December 2001
The round-trip duration measurement technique
is fundamental to solving many problems in asynchronous distributed systems. In
essence, this technique provides the means for reading remote clocks with a
known and bounded error. Therefore, it is used as a fundamental building block
in several clock synchronization algorithms. In general, the technique can be
used to implement duration measurement services, such as the one of the Timely
Computing Base model. In this paper we propose a new technique to measure
distributed durations that minimizes the measurement error and is able to keep
this error almost stable. The new technique can be used to improve the
precision of remote clock reading in certain situations. We provide a protocol
that implements this new technique and we present some evaluation results. The
results clearly show that our solution is indeed better than existing ones.
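
As background, the classic round-trip technique for reading a remote clock with a known and bounded error can be sketched as follows. This illustrates the well-known baseline only, not the new technique proposed in the paper. The function name and parameters are hypothetical, and the sketch assumes that clock drift during the round trip is negligible and that `min_delay` is a known lower bound on one-way message delay.

```python
def read_remote_clock(send_local, remote_ts, recv_local, min_delay=0.0):
    """Estimate the remote clock value at local time `recv_local`.

    send_local: local clock when the request was sent
    remote_ts:  remote clock value carried in the reply
    recv_local: local clock when the reply arrived
    Returns (estimate, max_error). Assumes negligible drift during the
    round trip (an assumption of this sketch).
    """
    rtt = recv_local - send_local
    if rtt < 2 * min_delay:
        raise ValueError("round trip shorter than twice the minimum delay")
    # The reply took between min_delay and rtt - min_delay to arrive,
    # so the midpoint rtt / 2 is off by at most rtt / 2 - min_delay.
    estimate = remote_ts + rtt / 2
    max_error = rtt / 2 - min_delay
    return estimate, max_error
```

Shorter round trips give tighter error bounds, which is why implementations of this baseline usually keep the reading obtained from the fastest of several round trips.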
Download PDF | Download Postscript | Bibtex Entry
Event Timestamping Tool: a simple PC based kernel to timestamp distributed events
Pedro Martins and António Casimiro
Technical Report DI/FCUL TR-00-4, Department of Informatics,
University of Lisboa, July 2000
This report describes the design and
implementation of a tool to timestamp distributed events, using a
standard PC hardware platform. The Event Timestamping Tool (ETT) is a
small software kernel that detects externally generated events using
two probe sources, and stores the respective timestamps with known
precision bounds. A specialized kernel solution minimizes the response
time for an event detection and registration and, consequently,
maximizes the precision of the tool. Our approach exploits the Pentium
µprocessor internal timestamp counter to provide timestamps with fine
granularity.
Download PDF | Download Postscript | Bibtex Entry
Intrusion-Tolerant Architectures: Concepts and Design
Paulo Veríssimo and Nuno F. Neves and Miguel Correia
Architecting Dependable Systems, pp. 3-36, Springer-Verlag LNCS
2677, 2003
There is a
significant body of research on distributed computing architectures,
methodologies and algorithms, both in the fields of fault tolerance
and security. Whilst they have taken separate paths until recently,
the problems to be solved are of similar nature. In classical
dependability, fault tolerance has been the workhorse of many
solutions. Classical security-related work has, on the other hand,
favored intrusion prevention, with few exceptions. Intrusion
tolerance (IT) is a new approach that has slowly emerged during the
past decade, and gained impressive momentum recently. Instead of
trying to prevent every single intrusion, IT allows intrusions but
tolerates them: the system triggers mechanisms that prevent the intrusion
from generating a system security failure. The paper describes the
fundamental concepts behind IT, tracing their connection with
classical fault tolerance and security. We discuss the main strategies
and mechanisms for architecting IT systems, and report on recent
advances in distributed IT system architectures.
Download PDF | Bibtex Entry