“Detecting Web Vulnerabilities in an Intermediate Language Resorting of Machine Learning Techniques”
From Navigators
Latest revision as of 02:55, 23 December 2020
Ana Fidalgo (advised by Ibéria Medeiros, Nuno Ferreira Neves)
Master’s thesis, Mestrado em Ciência em Dados, Nov. 2020
Abstract: The number of vulnerabilities has grown exponentially in recent years, with SQL Injection being especially troublesome for web applications. In parallel, recent research has shown the potential of Machine Learning to find vulnerabilities, either helping experts reduce the search space or classifying programs on its own. Previous work, however, rarely covers SQL Injection or considers popular server-side languages for web application development such as PHP. In our work, we construct a Deep Learning model capable of classifying PHP excerpts as vulnerable (or not) to SQL Injection. We represent the excerpts in an intermediate language and interpret them as text, resorting to well-studied Natural Language Processing techniques. This work can help back-end programmers discover SQL Injection at an early stage of a project, avoiding attacks whose damage would be costly to repair. We also investigate which information should be fed to the model. To that end, we built four datasets (the Opcode Dataset, the Opcode+Operand Dataset, the Slice Dataset, and the Simplified Slice Dataset) from the bytecode dataset, each representing the PHP excerpts differently. This approach is a simpler alternative to the complex data structures previously used to represent code’s control flow. For each of those datasets, we performed several experiments to evaluate alternative configurations for the model. For all datasets, we found a setting that yields an average score above 60% for accuracy, precision, and recall.
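Illustrative sketch (not taken from the thesis): the idea of interpreting intermediate-language excerpts as text can be pictured as mapping each opcode to an integer id and padding the sequences to a fixed length, as is commonly done before feeding text to a neural model. The opcode names, the helper functions `build_vocab`, `encode`, and `binary_metrics`, and the reserved ids are all assumptions made for this example; the thesis's actual preprocessing and model are not described on this page. The last helper shows how accuracy, precision, and recall are computed for a binary vulnerable/not-vulnerable label.

```python
from collections import Counter

PAD, UNK = 0, 1  # hypothetical reserved ids: padding and unknown opcode


def build_vocab(sequences):
    """Map each opcode token to an integer id (most frequent first)."""
    counts = Counter(tok for seq in sequences for tok in seq)
    # Sort by descending frequency, then by name, so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i + 2 for i, (tok, _) in enumerate(ordered)}


def encode(seq, vocab, max_len):
    """Turn an opcode sequence into a fixed-length list of ids."""
    ids = [vocab.get(tok, UNK) for tok in seq][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for labels 1 (vulnerable) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up opcode sequences standing in for two PHP excerpts.
excerpts = [
    ["FETCH_DIM_R", "CONCAT", "ASSIGN", "INIT_FCALL", "SEND_VAR", "DO_ICALL"],
    ["ASSIGN", "ECHO"],
]
vocab = build_vocab(excerpts)
encoded = [encode(seq, vocab, max_len=8) for seq in excerpts]
```

The encoded sequences could then be passed to any sequence classifier; which opcodes or operands to keep in each sequence is exactly the kind of choice the four datasets are meant to compare.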
Project(s): Project:SEAL, Project:Xivt
Research line(s): Fault and Intrusion Tolerance in Open Distributed Systems (FIT)