Browse wiki

From Navigators

Publication:2017 PCosta TesePhD

Abstract	MapReduce is a simple and elegant programming model suitable for loosely coupled parallelization problems, that is, problems that can be decomposed into subproblems. Hadoop MapReduce has become the most popular framework for performing large-scale computation on off-the-shelf clusters, and it is widely used to process these problems in a parallel and distributed fashion. This framework is highly scalable, can deal efficiently with large volumes of unstructured data, and it is a platform for many other applications. However, the framework has limitations concerning dependability. Namely, it is solely prepared to tolerate crash faults by re-executing tasks in case of failure, and to detect file corruptions using file checksums. Unfortunately, there is evidence that arbitrary faults do occur and can affect the correctness of MapReduce execution. Although such Byzantine faults are considered to be rare, particular MapReduce applications are critical and intolerant to this type of fault. Furthermore, typical MapReduce implementations are constrained to a single cloud environment. This is a problem as there is increasing evidence of outages on major cloud offerings, raising concerns about the dependence on a single cloud. In this thesis, we propose techniques to improve the dependability of MapReduce systems. The proposed solutions allow MapReduce to scale out computations to a multi-cloud environment, or cloud-of-clouds, to tolerate arbitrary and malicious faults and cloud outages. Our proposals have three important properties: they increase the dependability of MapReduce by tolerating the faults mentioned above; they require minimal or no modifications to users’ applications; and they achieve this increased level of fault tolerance at reasonable cost.
To achieve these goals, we introduce three key ideas: minimizing the required replication; applying context-based job scheduling based on cloud and network conditions; and performing fine-grained replication. We evaluated all proposed solutions in real testbed environments running typical MapReduce applications. Our results demonstrate interesting trade-offs concerning resilience and performance when compared to traditional methods. The fundamental conclusion is that the cost introduced by our solutions is small, and thus deemed acceptable for many critical applications.
Advisor	Fernando Ramos, Miguel Correia
Author	Pedro Costa
Document	Document for Publication-2017 PCosta TesePhD.pdf
Key	2017 PCosta TesePhD
Month	nov
NumPubDate	2017.11
Project	Project:SUPERCLOUD
ResearchLine	Fault and Intrusion Tolerance in Open Distributed Systems (FIT)
School	Doutoramento em Informática, Faculdade de Ciências da Universidade de Lisboa
Title	Dependable MapReduce in a Cloud-of-Clouds
Type	phdthesis
Year	2017
Has improper value for	Url
Categories	Publication
Modification date	24 February 2018 19:20:28

