“Detection of Vulnerabilities and Automatic Protection for Web Applications”

From Navigators

Ibéria Medeiros (advised by Miguel Correia, Nuno Ferreira Neves)

Ph.D. dissertation, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Sept. 2016

Abstract: In less than three decades of existence, the Web evolved from a platform for accessing hypermedia to a framework for running complex web applications. These applications appear in many forms, from small home-made to large-scale commercial services such as Gmail, Office 365, and Facebook. Although a significant research effort on web application security has been on going for a while, these applications have been a major source of problems and their security continues to be challenged. An important part of the problem derives from vulnerable source code, often written in unsafe languages like PHP, and programmed by people without the appropriate knowledge about secure coding, who leave flaws in the applications. Nowadays the most exploited vulnerability category is the input validation, which is directly related with the user inputs inserted in web application forms. The thesis proposes methodologies and tools for the detection of input validation vulnerabilities in source code and for the protection of web applications written in PHP, using source code static analysis, machine learning and runtime protection techniques. An approach based on source code static analysis is used to identify vulnerabilities in applications programmed with PHP. The user inputs are tracked with taint analysis to determine if they reach a PHP function susceptible to be exploited. Then, machine learning is applied to determine if the identified flaws are actually vulnerabilities. In the affirmative case, the results of static analysis are used to remove the flaws, correcting the source code automatically thus protecting the web application. A new technique for source code static analysis is suggested to automatically learn about vulnerabilities and then to detect them. Machine learning applied to natural language processing is used to, in a first instance, learn characteristics about flaws in the source code, classifying it as being vulnerable or not, and then discovering and identifying the vulnerabilities. A runtime protection technique is also proposed to flag and block injection attacks against databases. The technique is implemented inside the database management system to improve the effectiveness of the detection of attacks, avoiding a semantic mismatch. Source code identifiers are employed so that, when an attack is flagged, the vulnerability is localized in the source code. Overall this work allowed the identification of about 1200 vulnerabilities in open source web applications available in the Internet, 560 of which previously unknown. The unknown vulnerabilities were reported to the corresponding software developers and most of them have already been removed.