CloudFIT: Fault and Intrusion Tolerance for Cloud Computing


Cloud computing has become very popular in recent years. Cloud architectures typically combine a potentially large number of heterogeneous, loosely coupled and geographically dispersed computers, connected via the Internet, to form a single unified system that hosts service applications. This structure makes it difficult to apply traditional security approaches. For example, global management policies are hard to enforce when clouds cross administrative boundaries. At the same time, software complexity is steadily increasing, making it practically infeasible to guarantee the absence of security vulnerabilities. As a consequence, implementing dependable services in a cloud exposed to malicious attacks is a challenging task. Intrusion tolerance is a paradigm for implementing services so that they continue to provide their functionality correctly despite malicious intrusions in some of the cloud nodes.
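To make the idea of masking intrusions concrete, the following is a minimal sketch of reply voting on the client side. It assumes the classical replication setting in which at most `f` replicas are compromised, so a value reported by at least `f + 1` replicas must come from at least one correct replica. The function name `accept_reply` and the parameter `f` are illustrative assumptions and are not taken from the CloudFIT design.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def accept_reply(replies: Iterable[Hashable], f: int) -> Optional[Hashable]:
    """Return the reply backed by at least f + 1 replicas, or None.

    Assumption (not from the project text): at most f replicas are
    compromised, so f + 1 matching replies include a correct one.
    """
    counts = Counter(replies)
    if not counts:
        return None  # no replies received yet
    value, votes = counts.most_common(1)[0]
    return value if votes >= f + 1 else None


if __name__ == "__main__":
    # One of four replicas returns a corrupted result; it is outvoted.
    print(accept_reply(["ok", "ok", "corrupted", "ok"], f=1))
```

With too few matching replies the function returns `None`, which a caller would treat as "keep waiting for more replies".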

The objective of this project is to define an infrastructure for intrusion-tolerant services in a cloud environment. To achieve this goal, we use intrusion-tolerant replication, which tolerates intrusions in a subset of the replicas. With the CloudFIT architecture, we address the three main scientific challenges described below.

First, virtualisation technology has become mainstream in cloud computing, executing services within virtual machines and managing the cloud resources with the virtualisation infrastructure. Virtualisation is also an established approach to create a hybrid system architecture with an intrusion-free trusted domain and application domains that execute services subject to malicious attacks. While most virtualisation approaches for clouds tend to continuously grow in functionality and complexity, virtualisation for implementing a trusted domain needs to be minimal and verifiable, in order to justify the assumption of intrusion-freedom. The goal of CloudFIT is to combine both in a common architecture, analysing the requirements that both have on the virtual machine monitor (VMM) and defining a minimal virtualisation layer with sufficient functionality for intrusion-tolerant applications and for managing cloud resources.

Second, the trusted computing base (TCB) has to execute the functionality needed for intrusion-tolerant replication and proactive recovery. As the code base executed within the TCB should be minimal, the functionality needs to be split between the application domains and the trusted domain. The challenge is to identify the subset of the infrastructure for intrusion-tolerant replication and recovery that should be executed in the TCB, and to define adequate interfaces between the application domains and the TCB, for example to support efficient state transfer.

Third, resource allocation in clouds is typically automated, based on the resource demands of applications, such as the need for CPU time, disk space, and network capacity. Automatically allocating replicas of an intrusion-tolerant application requires additional criteria that influence intrusion tolerance. For example, replicas should never be placed on the same host, probably not even in the same administrative domain, and should be heterogeneous (e.g., different hardware or operating systems), in order to avoid common-mode faults that would allow an attacker to compromise multiple replicas simultaneously. For the same reason, recoveries should change replica locations so that a recovered replica is not exposed to the same attack again. It is thus essential to define strategies for allocating resources to replicas in the cloud that maximise the availability of a service, and to integrate these strategies with the automated resource allocation mechanisms found in cloud infrastructures.
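The placement criteria above can be illustrated with a small sketch. It is not the project's allocator: the `Host` record, its `domain` and `os` attributes, and the `place_replicas` function are hypothetical names chosen for this example, and operating system is used as a simple stand-in for diversity. The search picks `n` hosts so that no two replicas share an administrative domain or an operating system, and an optional `exclude` set models a recovery that must avoid previously used hosts.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Host:
    name: str    # unique host identifier (hypothetical)
    domain: str  # administrative domain (hypothetical attribute)
    os: str      # operating system, used here as the diversity criterion


def place_replicas(
    hosts: List[Host], n: int, exclude: FrozenSet[str] = frozenset()
) -> Optional[List[Host]]:
    """Backtracking search for n hosts with pairwise distinct domain and OS.

    Hosts named in `exclude` are skipped, e.g. locations a recovering
    replica should not return to. Returns None if no placement exists.
    """

    def search(start: int, chosen: List[Host]) -> Optional[List[Host]]:
        if len(chosen) == n:
            return list(chosen)
        for i in range(start, len(hosts)):
            h = hosts[i]
            if h.name in exclude:
                continue
            # Reject hosts that would share a domain or OS with a chosen one.
            if any(h.domain == c.domain or h.os == c.os for c in chosen):
                continue
            chosen.append(h)
            result = search(i + 1, chosen)
            if result is not None:
                return result
            chosen.pop()
        return None

    return search(0, [])


if __name__ == "__main__":
    pool = [
        Host("a1", "A", "linux"),
        Host("a2", "A", "bsd"),
        Host("b1", "B", "linux"),
        Host("c1", "C", "windows"),
        Host("d1", "D", "bsd"),
    ]
    placement = place_replicas(pool, 3)
    print([h.name for h in placement] if placement else None)
```

A real allocator would also have to weigh the usual resource demands (CPU time, disk space, network capacity) against these constraints; the sketch only covers the intrusion-tolerance criteria.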

The expected results from the project are:

  1. The definition of a virtualisation architecture that respects the needs of cloud infrastructures and provides a minimal trusted computing base (TCB) for intrusion-tolerant replication with proactive recovery;
  2. The specification of an intrusion-tolerant replication infrastructure for cloud computing, in which the functionality is split between a minimal core executing in the TCB and a second part placed within normal application domains that are exposed to malicious faults;
  3. The analysis of the requirements on resource management for intrusion tolerance, such as replica placement and diversity;
  4. The extension of a grid/cloud resource allocator to incorporate these fault-tolerance requirements;
  5. A prototype that integrates (1)-(4), an evaluation of the performance of this prototype, and an analysis of the improvements that the proposed architecture yields in terms of intrusion tolerance.


  • Vinicius Vielmo Cogo, André Nogueira, João Sousa, Marcelo Pasin, Hans P. Reiser, Alysson Bessani, “FITCH: Supporting Adaptive Replicated Services in the Cloud”, in Proceedings of the 13th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS'13), Jim Dowling, Francois Taïani, Eds., Florence, Italy, Jun. 2013, pp. 15–28.

  • T. Distler, R. Kapitza, Ivan Popov, Hans P. Reiser, W. Schroeder-Preikschat, “SPARE: Replicas on Hold”, in Proceedings of the 18th Annual Network & Distributed System Security Symposium, Feb. 2011.

