Abstract
Although a large research effort has been going
on for more than a decade, the security of web applications
continues to be a challenging problem. An important part of
that problem derives from vulnerable source code, often written
in unsafe languages like PHP. Source code static analysis tools
are a solution to find vulnerabilities, but they tend to generate
false positives and require considerable effort for programmers
to manually fix the code. We explore a combination
of methods to discover vulnerabilities in source code with fewer
false positives. We combine taint analysis, which finds candidate
vulnerabilities, with data mining, which predicts which of these
candidates are false positives. This brings together two approaches
that are apparently orthogonal: humans coding the knowledge
about vulnerabilities (for taint analysis) versus automatically
obtaining that knowledge (with machine learning, for data
mining). Given this enhanced form of detection, we propose
correcting the code automatically by inserting fixes into the source
code. Our approach was implemented in the WAP tool, and an
experimental evaluation was performed with a large set of PHP
applications. Our tool found 388 vulnerabilities in 1.4 million
lines of code. Its accuracy and precision were approximately 5%
better than PhpMinerII’s and 45% better than Pixy’s.
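The pipeline summarized in this abstract (taint analysis proposes candidate vulnerabilities, a learned classifier flags likely false positives, and fixes are inserted for the rest) can be illustrated with a toy sketch. This is not WAP's implementation: the straight-line program model, the attribute names (functions such as `intval` or `trim` applied along a data flow), the labeled training samples, the choice of a Bernoulli naive Bayes classifier, and the sanitizer name `san_sqli` are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import log

# Hypothetical, simplified model of a PHP program: straight-line statements only.
ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}  # superglobals treated as taint sources
SINKS = {"mysql_query"}                         # illustrative SQL sink


@dataclass
class Assign:
    target: str
    sources: list[str]                              # variables/superglobals read
    calls: list[str] = field(default_factory=list)  # functions applied (mining attributes)


@dataclass
class SinkCall:
    sink: str
    arg: str


def taint_analysis(program):
    """Return candidate vulnerabilities as (index, sink statement, attribute set)."""
    tainted, attrs, candidates = set(), {}, []
    for i, stmt in enumerate(program):
        if isinstance(stmt, Assign):
            if any(s in ENTRY_POINTS or s in tainted for s in stmt.sources):
                tainted.add(stmt.target)
                # Attributes accumulate along the data flow from source to sink.
                inherited = set().union(*(attrs.get(s, set()) for s in stmt.sources))
                attrs[stmt.target] = inherited | set(stmt.calls)
            else:  # overwritten with untainted data
                tainted.discard(stmt.target)
                attrs.pop(stmt.target, None)
        elif isinstance(stmt, SinkCall) and stmt.sink in SINKS and stmt.arg in tainted:
            candidates.append((i, stmt, attrs[stmt.arg]))
    return candidates


class NaiveBayes:
    """Bernoulli naive Bayes with Laplace smoothing over attribute sets."""

    def fit(self, samples, labels):
        self.vocab = sorted(set().union(*samples))
        self.classes = sorted(set(labels))
        self.prior, self.cond = {}, {}
        for c in self.classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            self.prior[c] = len(rows) / len(samples)
            self.cond[c] = {a: (sum(a in r for r in rows) + 1) / (len(rows) + 2)
                            for a in self.vocab}
        return self

    def predict(self, sample):
        def score(c):
            s = log(self.prior[c])
            for a in self.vocab:
                p = self.cond[c][a]
                s += log(p if a in sample else 1 - p)
            return s
        return max(self.classes, key=score)


def insert_fix(stmt, sanitizer="san_sqli"):
    # 'san_sqli' is a hypothetical sanitization wrapper, not a real PHP function.
    return f"{stmt.sink}({sanitizer}({stmt.arg}))"


# Hypothetical labeled data: "FP" = false positive, "RV" = real vulnerability.
train_x = [set(), {"intval"}, {"trim"}, {"intval", "trim"}, {"preg_match"}, {"strtolower"}]
train_y = ["RV", "FP", "RV", "FP", "FP", "RV"]
model = NaiveBayes().fit(train_x, train_y)

program = [
    Assign("$id", ["$_GET"]),            # raw user input
    Assign("$n", ["$_POST"], ["intval"]),  # input cast to an integer
    SinkCall("mysql_query", "$id"),
    SinkCall("mysql_query", "$n"),
]

report = []
for i, stmt, a in taint_analysis(program):
    verdict = model.predict(a)
    report.append((i, verdict, insert_fix(stmt) if verdict == "RV" else None))
print(report)  # → [(2, 'RV', 'mysql_query(san_sqli($id))'), (3, 'FP', None)]
```

Both sink calls are reached by tainted data, so taint analysis alone would report two vulnerabilities. The classifier, trained only on the attribute sets, treats the flow through `intval` as a likely false positive, so a fix is inserted only for `$id`. This mirrors the division of labor the abstract describes: hand-coded knowledge (sources and sinks) drives the taint analysis, while knowledge about false positives is learned from examples.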