Abstract
|
MapReduce is a simple and elegant programming model suitable for loosely
coupled parallelization problems—problems that can be decomposed into subproblems.
Hadoop MapReduce has become the most popular framework for
performing large-scale computation on off-the-shelf clusters, and it is widely
used to process these problems in a parallel and distributed fashion. This framework
is highly scalable, can deal efficiently with large volumes of unstructured
data, and serves as a platform for many other applications. However, the framework
has limitations concerning dependability. Namely, it is prepared only
to tolerate crash faults, by re-executing failed tasks, and to detect file
corruption using checksums. Unfortunately, there is evidence that arbitrary
faults do occur and can affect the correctness of MapReduce execution.
Although such Byzantine faults are considered rare, some MapReduce
applications are critical and cannot tolerate this type of fault. Furthermore,
typical MapReduce implementations are constrained to a single cloud environment.
This is a problem as there is increasing evidence of outages on major
cloud offerings, raising concerns about the dependence on a single cloud.
In this thesis, we propose techniques to improve the dependability of MapReduce
systems. The proposed solutions allow MapReduce to scale out computations
to a multi-cloud environment, or cloud-of-clouds, to tolerate arbitrary
and malicious faults and cloud outages. Our proposals have three important
properties: they increase the dependability of MapReduce by tolerating the
faults mentioned above; they require minimal or no modifications to users’ applications;
and they achieve this increased level of fault tolerance at reasonable
cost. To achieve these goals, we introduce three key ideas: minimizing the required
replication; applying context-based job scheduling based on cloud and
network conditions; and performing fine-grained replication.
We evaluated all proposed solutions in real testbed environments running typical
MapReduce applications. Our results demonstrate trade-offs between resilience
and performance when compared to traditional methods.
The fundamental conclusion is that the cost introduced by our solutions is
small, and thus deemed acceptable for many critical applications.
|