A new automated web application scanner autonomously understands and executes tasks and workflows on web applications. The tool named YuraScanner harnesses the world knowledge stored in large language models (LLMs) to navigate through web applications in the same way a human user would. It is capable of working through tasks in a coherent fashion, performing the correct sequence of steps as required by, for example, an online shop.
YuraScanner was tested against 20 web applications, unearthing 12 zero-day cross-site scripting (XSS) vulnerabilities. The technique behind YuraScanner as well as the tool itself have been developed at the CISPA Helmholtz Center for Information Security.
Automated web application scanners are commonly used to test the security of online applications such as, for example, online shops, learning platforms or project management tools. Typically, these scanners consist of two parts: the crawler component, which scans the web application for user interfaces, and the attack module, which then proceeds to test the interfaces identified by the crawler.
CISPA researcher Aleksei Stafeev, who works in the research group of Dr. Giancarlo Pellegrino, highlights the importance of the crawler component for such automated testing to be successful: “One of the main challenges in security testing is determining the scope of the web application and identifying its functionalities and workflows. We know quite well how to detect the security issues, but how do we identify all the entry points?” Stafeev and his CISPA colleagues have developed YuraScanner with the aim of identifying as much of the attack surface as possible.
YuraScanner: Using LLMs to navigate web applications
The main innovation YuraScanner proposes is enhancing the reach and performance of the scanner’s crawler component by harnessing it to a LLM. “LLMs have been trained on the data from the web, which is rich on documentation on how to interact with websites. We tap into this knowledge by combining a crawler and an LLM to guide the exploration of a web application,” Stafeev explains.
For the purpose of their study, Stafeev and his colleagues used the OpenAI API to establish the connection between their crawler component and OpenAI model GPT-4. The attack module on the YuraScanner is identical to Black Widow, an established state-of-the-art cross-site scripting scanner.
This parallel setup allowed the CISPA researchers to directly compare the performances of the two crawler components. Testing YuraScanner against 20 web applications, they were in fact able to detect 12 previously unknown XSS vulnerabilities, in comparison to only three detected by Black Widow.
Taking automated web application scanning to a deeper level
Guided by an LLM, YuraScanner operates in a task-driven fashion, which allows it to access the deeper layers of the web application being tested. Not only can it identify the tasks that are offered by the web application, it can also carry them out in a deliberate fashion, performing the sequence of steps required to finish the task at hand. It proceeds vertically, while other, already established scanners, tend to proceed horizontally.
Stafeev explains, “Usually, testing tools don’t distinguish between different kinds of buttons, they just click on whatever is available. The main drawback of that is that if there is some very specific multi-step workflow as in, for example, an online shop, where you have to put an item into a cart, proceed to check-out and fill in a form—the chances of a simple web crawler to succeed at that are very slim.”
More information:
Aleksei Stafeev et al, YuraScanner: Leveraging LLMs for Task-driven Web App Scanning, (2024). DOI: 10.14722/ndss.2025.240388. trouge.net/papers/yura_llm_scanner_ndss25.pdf
Provided by
CISPA Helmholtz Center for Information Security
Citation:
LLM-based web application scanner recognizes tasks and workflows (2025, February 21)
retrieved 23 February 2025
from https://techxplore.com/news/2025-02-llm-based-web-application-scanner.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.