“Consistent and Fault-Tolerant SDN Controller”


André Mantas (advised by Fernando Ramos)

Master’s thesis, Mestrado em Engenharia Informática, Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Nov. 2016

Abstract: The concept of Software-Defined Networking (SDN) breaks the coupling in traditional networks between the control and data planes. This decoupling allows the development of innovative and flexible applications to manage, monitor, and program the network. In SDN, applications use a logically centralized network view, provided by the controllers, to remotely program switches in the network. If this network view is not consistent with the actual state of the network, applications will operate on stale state and produce incorrect outputs. This can significantly degrade network performance (e.g., packets can be forwarded through a failed link) and create problems such as network loops or security failures. As in any production system, component failures must be treated as the rule and not the exception. Thus, it is important that the control plane is able to maintain a consistent network view in the presence of failures in both controllers and links. The network view must therefore be consistently replicated across several controller replicas so that the failure of one controller does not compromise the entire control plane. Additionally, to ensure a correct system, the state maintained by switches must be handled in a consistent way, which is particularly difficult in the presence of failures. This work proposes a resilient SDN control plane that allows unmodified applications to run in a fault-tolerant and consistent environment (with respect to both controller and switch state). To achieve the fault-tolerant environment, controllers must transparently replicate the received events among themselves before these are delivered to the network applications. For the consistent environment, the main idea is that controllers process control messages transactionally, exactly once, achieving a correct and consistent operation of both controllers and switches, even if some of them fail.
The two main techniques used are instructing switches to send events to all controllers, which coordinate to ensure exactly-once event processing, and using OpenFlow Bundles as the acknowledgement mechanism that lets controllers process commands on switches exactly once. The differentiating factor and novelty over existing works is that the consistency guarantees our proposal provides, in a fault-tolerant SDN environment, require modifications neither to switches nor to the OpenFlow protocol.
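
The two techniques named in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not taken from the thesis: the names ControlPlaneState, Switch, process_pending and demo are hypothetical, the coordination among controllers is reduced to a shared in-memory object, and the bundle commit and its acknowledgement are modelled as one atomic step. It shows every replica receiving every event, each event being logged once, and a new master skipping events whose bundle was already acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlaneState:
    """Hypothetical stand-in for the state the controller replicas agree on."""
    event_order: list = field(default_factory=list)  # event ids, agreed order
    acked: set = field(default_factory=set)          # events whose bundle committed

    def log_event(self, event_id):
        # Each switch sends every event to all controllers; log only the first copy.
        if event_id in self.event_order:
            return False
        self.event_order.append(event_id)
        return True


@dataclass
class Switch:
    """Toy switch: a bundle applies all of its commands together on commit."""
    flow_table: list = field(default_factory=list)

    def commit_bundle(self, event_id, commands, state):
        self.flow_table.extend(commands)
        # Simplification (assumption): the commit and the acknowledgement
        # seen by the controllers are modelled as a single atomic step.
        state.acked.add(event_id)


def process_pending(state, switch, app):
    """Run by whichever controller is currently master, also after a failover."""
    for event_id in state.event_order:
        if event_id in state.acked:
            continue  # already applied on the switch, do not repeat
        switch.commit_bundle(event_id, app(event_id), state)


def demo():
    state, switch = ControlPlaneState(), Switch()
    app = lambda event_id: [f"flow-for-{event_id}"]  # unmodified "application"
    # Two replicas each receive the same two events from the switch.
    for _replica in ("c1", "c2"):
        for event_id in ("e1", "e2"):
            state.log_event(event_id)
    process_pending(state, switch, app)  # original master
    process_pending(state, switch, app)  # new master after a simulated failover
    return switch.flow_table
```

Calling demo() yields one flow entry per event even though each event arrived twice and the processing loop ran twice, which is the exactly-once behaviour the abstract describes. A real deployment would replace the shared object with a replicated coordination service and the atomic step with an actual OpenFlow bundle commit.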