In less than three decades of existence, the Web evolved from a platform for
accessing hypermedia to a framework for running complex web applications.
These applications appear in many forms, from small home-made sites to large-scale
commercial services such as Gmail, Office 365, and Facebook. Although significant
research effort on web application security has been ongoing for some time,
these applications remain a major source of problems and their security continues
to be challenged. An important part of the problem derives from vulnerable
source code, often written in unsafe languages like PHP and programmed
by people without appropriate knowledge of secure coding, who leave
flaws in the applications. Today the most exploited vulnerability category
is input validation, which is directly related to the user inputs entered in
web application forms.
The thesis proposes methodologies and tools for the detection of input validation
vulnerabilities in source code and for the protection of web applications
written in PHP, using source code static analysis, machine learning, and runtime
protection.
An approach based on source code static analysis is used to identify vulnerabilities
in applications programmed with PHP. The user inputs are tracked with taint
analysis to determine whether they reach a PHP function that can be exploited.
Then, machine learning is applied to determine whether the identified flaws are
actual vulnerabilities. If so, the results of static analysis are used
to remove the flaws by correcting the source code automatically, thus protecting the
application.
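The taint tracking step described above can be illustrated with a minimal sketch. It is not the tool proposed in the thesis: the program representation (`Assign`, `Call`), the sets of entry points, sanitizers and sinks, and the example program are all assumptions made for illustration, and only straight-line code is handled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative (assumed) sets; a real tool would cover many more functions.
ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"mysqli_real_escape_string"}
SINKS = {"mysqli_query"}


@dataclass
class Assign:
    target: str                 # variable being written
    sources: List[str]          # variables read on the right-hand side
    func: Optional[str] = None  # function applied to the sources, if any


@dataclass
class Call:
    func: str
    args: List[str]
    line: int


def find_tainted_sinks(program):
    """Return (line, sink, tainted_args) for each sink reached by user input."""
    tainted = set(ENTRY_POINTS)
    findings = []
    for stmt in program:
        if isinstance(stmt, Assign):
            reads_taint = any(src in tainted for src in stmt.sources)
            if reads_taint and stmt.func not in SANITIZERS:
                tainted.add(stmt.target)      # taint propagates to the target
            else:
                tainted.discard(stmt.target)  # clean or sanitized value
        elif isinstance(stmt, Call) and stmt.func in SINKS:
            bad = [arg for arg in stmt.args if arg in tainted]
            if bad:
                findings.append((stmt.line, stmt.func, bad))
    return findings


# Hypothetical PHP fragment, already reduced to assignments and calls:
#   1: $id   = $_GET['id'];
#   2: $safe = mysqli_real_escape_string($conn, $id);
#   3: $q1   = "SELECT * FROM t WHERE id = '$safe'";
#   4: $q2   = "SELECT * FROM t WHERE id = $id";
#   5: mysqli_query($conn, $q1);
#   6: mysqli_query($conn, $q2);
example_program = [
    Assign("$id", ["$_GET"]),
    Assign("$safe", ["$conn", "$id"], func="mysqli_real_escape_string"),
    Assign("$q1", ["$safe"]),
    Assign("$q2", ["$id"]),
    Call("mysqli_query", ["$conn", "$q1"], line=5),
    Call("mysqli_query", ["$conn", "$q2"], line=6),
]

findings = find_tainted_sinks(example_program)
print(findings)
```

In this sketch only the call on line 6 is reported, because `$q2` is built from unsanitized input while `$q1` passes through a sanitization function. Deciding whether such a report is a real vulnerability, and correcting the code, are separate steps that the sketch does not attempt.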
A new technique for source code static analysis is suggested to automatically
learn about vulnerabilities and then to detect them. Machine learning applied to
natural language processing is used first to learn the characteristics
of flaws in source code, classifying code as vulnerable or not, and then
to discover and identify the vulnerabilities.
A runtime protection technique is also proposed to flag and block injection attacks
against databases. The technique is implemented inside the database management
system to improve the effectiveness of the detection of attacks, avoiding
a semantic mismatch. Source code identifiers are employed so that, when an
attack is flagged, the vulnerability is localized in the source code.
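As a rough illustration of the idea, the sketch below compares the structure of each incoming query with the structures previously learned for the same source code identifier. Everything in it is an assumption made for illustration: the `QueryGuard` class, the `/* id:... */` comment format that carries the identifier, and the simplified tokenizer. It also works on the query text, whereas the technique described above runs inside the database management system, after the query has been processed by the DBMS itself, which is what avoids the semantic mismatch.

```python
import re

# Hypothetical convention: the application prefixes each query with a comment
# such as /* id:login.php:12 */ naming the place in the code that issued it.
ID_COMMENT = re.compile(r"^\s*/\*\s*id:(\S+)\s*\*/")

TOKEN = re.compile(
    r"""
    (?P<str>'(?:[^'\\]|\\.|'')*')   # single-quoted string literal
  | (?P<num>\d+(?:\.\d+)?)          # numeric literal
  | (?P<word>\w+)                   # keyword or identifier
  | (?P<sym>[^\s\w])                # any other single character
    """,
    re.VERBOSE,
)


def split_query(query):
    """Return (identifier, structure), with literals replaced by placeholders."""
    match = ID_COMMENT.match(query)
    qid = match.group(1) if match else None
    body = query[match.end():] if match else query
    shape = []
    for tok in TOKEN.finditer(body):
        if tok.lastgroup == "str":
            shape.append("STR")
        elif tok.lastgroup == "num":
            shape.append("NUM")
        else:
            shape.append(tok.group().upper())
    return qid, tuple(shape)


class QueryGuard:
    def __init__(self):
        self.models = {}  # identifier -> set of known-good query structures

    def learn(self, query):
        qid, shape = split_query(query)
        self.models.setdefault(qid, set()).add(shape)

    def check(self, query):
        """Return (is_attack, identifier); the identifier locates the code."""
        qid, shape = split_query(query)
        return shape not in self.models.get(qid, set()), qid


guard = QueryGuard()
guard.learn("/* id:login.php:12 */ SELECT * FROM users "
            "WHERE name = 'alice' AND pass = 'secret'")

benign = ("/* id:login.php:12 */ SELECT * FROM users "
          "WHERE name = 'bob' AND pass = 'hunter2'")
attack = ("/* id:login.php:12 */ SELECT * FROM users "
          "WHERE name = 'admin' -- ' AND pass = 'x'")

print(guard.check(benign))
print(guard.check(attack))
```

The benign query differs from the learned one only in its literals, so its structure matches and it is allowed. The injected comment in the second query changes the structure, so it is flagged, and the returned identifier (`login.php:12` in this hypothetical example) points to the place in the source code where the vulnerable query is built.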
Overall, this work led to the identification of about 1200 vulnerabilities in open
source web applications available on the Internet, 560 of which were previously unknown.
The unknown vulnerabilities were reported to the corresponding software
developers and most of them have already been removed.