Publication:2017 PCosta TesePhD
Abstract MapReduce is a simple and elegant programming model suitable for loosely coupled parallelization problems, i.e., problems that can be decomposed into subproblems. Hadoop MapReduce has become the most popular framework for performing large-scale computation on off-the-shelf clusters, and it is widely used to process these problems in a parallel and distributed fashion. This framework is highly scalable, can deal efficiently with large volumes of unstructured data, and it is a platform for many other applications. However, the framework has limitations concerning dependability. Namely, it is solely prepared to tolerate crash faults by re-executing tasks in case of failure, and to detect file corruptions using file checksums. Unfortunately, there is evidence that arbitrary faults do occur and can affect the correctness of MapReduce execution. Although such Byzantine faults are considered to be rare, particular MapReduce applications are critical and intolerant to this type of fault. Furthermore, typical MapReduce implementations are constrained to a single cloud environment. This is a problem as there is increasing evidence of outages on major cloud offerings, raising concerns about the dependence on a single cloud. In this thesis, we propose techniques to improve the dependability of MapReduce systems. The proposed solutions allow MapReduce to scale out computations to a multi-cloud environment, or cloud-of-clouds, to tolerate arbitrary and malicious faults and cloud outages. Our proposals have three important properties: they increase the dependability of MapReduce by tolerating the faults mentioned above; they require minimal or no modifications to users’ applications; and they achieve this increased level of fault tolerance at reasonable cost.
To achieve these goals, we introduce three key ideas: minimizing the required replication; applying context-based job scheduling based on cloud and network conditions; and performing fine-grained replication. We evaluated all proposed solutions in real testbed environments running typical MapReduce applications. Our results demonstrate interesting trade-offs concerning resilience and performance when compared to traditional methods. The fundamental conclusion is that the cost introduced by our solutions is small, and thus deemed acceptable for many critical applications.
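The abstract does not spell out how "minimizing the required replication" works, so the following is only an illustrative sketch of one common reading, not the thesis's actual algorithm. It assumes that at most `f` replicas of a task can return an arbitrary (Byzantine) result, that replica outputs are compared by digest, and that a result is accepted once `f + 1` digests match. Replicas beyond the first `f + 1` are launched only when the first results disagree. The function name `run_with_minimal_replication` and the representation of replicas as zero-argument callables are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple


def digest(output: bytes) -> str:
    """Return a SHA-256 digest used to compare replica outputs cheaply."""
    return hashlib.sha256(output).hexdigest()


def run_with_minimal_replication(
    replicas: Iterable[Callable[[], bytes]], f: int
) -> Tuple[Optional[bytes], int]:
    """Run task replicas until f + 1 of them agree on the output.

    `replicas` yields zero-argument callables, each standing in for one
    execution of the same task (hypothetical interface). Returns the agreed
    output (or None if agreement is never reached) and the number of
    replicas that were actually executed.
    """
    votes: Counter = Counter()   # digest -> number of replicas that produced it
    outputs = {}                 # digest -> the corresponding output bytes
    launched = 0

    for run in replicas:
        out = run()
        launched += 1
        d = digest(out)
        votes[d] += 1
        outputs[d] = out

        # No decision is possible before f + 1 replicas have finished.
        if launched < f + 1:
            continue

        best_digest, count = votes.most_common(1)[0]
        if count >= f + 1:
            # With at most f faulty replicas, f + 1 matching outputs
            # include at least one produced by a correct replica.
            return outputs[best_digest], launched

    return None, launched


if __name__ == "__main__":
    good = lambda: b"word\t3\n"   # correct replica output (made-up data)
    bad = lambda: b"word\t99\n"   # corrupted replica output (made-up data)

    # Fault-free case: only f + 1 = 2 replicas run.
    print(run_with_minimal_replication([good, good, good], f=1))
    # One faulty replica: a third replica is needed to break the tie.
    print(run_with_minimal_replication([good, bad, good], f=1))
```

Under these assumptions the fault-free case costs `f + 1` executions rather than the `2f + 1` a majority vote would always pay, and extra executions happen only when a mismatch is detected.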
Advisor Fernando Ramos + , Miguel Correia +
Author Pedro Costa +
Document Document for Publication-2017 PCosta TesePhD.pdf +
Key 2017 PCosta TesePhD  +
Month nov  +
NumPubDate 2,017.11  +
Project Project:SUPERCLOUD +
ResearchLine Fault and Intrusion Tolerance in Open Distributed Systems (FIT) +
School Doutoramento em Informática, Faculdade de Ciências da Universidade de Lisboa  +
Title Dependable MapReduce in a Cloud-of-Clouds  +
Type phdthesis  +
Year 2017  +
Has improper value for Url  +
Categories Publication  +
Modification date 24 February 2018 19:20:28  +