Researchers at the University of Illinois Urbana-Champaign (UIUC) have found in a new study that OpenAI’s GPT-4 large language model can autonomously exploit real-world vulnerabilities, something that neither other models such as GPT-3.5 nor dedicated vulnerability scanners managed to do.
An LLM agent built on GPT-4 successfully exploited 87% of the “one-day” vulnerabilities it was given, meaning flaws that have been publicly disclosed but not yet patched. Noting the growing power of LLM agents, the researchers speculated that GPT-4’s tool-use capability is what sets it apart from other models.
The study revealed that GPT-4 has an emergent capability to autonomously detect and exploit one-day vulnerabilities, which could lower the barrier to exploiting such flaws in the future. The researchers warn that the same capability could be turned to malicious ends by cybercriminals.
According to Daniel Kang, an assistant professor at UIUC and one of the study’s authors, GPT-4’s ability to autonomously exploit vulnerabilities is a significant development with direct implications for how cybersecurity defenses are planned.
How successful is GPT-4 at autonomously detecting and exploiting vulnerabilities?
GPT-4 can autonomously exploit one-day vulnerabilities
The GPT-4 agent demonstrated the ability to autonomously exploit both web and non-web one-day vulnerabilities, including ones published after the model’s knowledge cutoff date. Kang emphasized that success required GPT-4 both to plan an attack and to execute the many steps it involves.
While the agent could look up public information about the vulnerabilities online, Kang stressed that actually exploiting them still requires a highly capable model, as the gap between GPT-4’s success rate and that of every other model tested demonstrates.
Of the 15 vulnerabilities presented to the GPT-4 agent, only two resisted exploitation, which is the result behind the 87% success rate.
GPT-4 cannot autonomously exploit zero-day vulnerabilities
Although GPT-4 excelled at exploiting one-day vulnerabilities, its success rate dropped sharply in the zero-day setting, where the agent receives no description of the vulnerability. The researchers concluded that the LLM’s strength lies in exploitation rather than in vulnerability discovery.
It’s cheaper to use GPT-4 to exploit vulnerabilities than a human hacker
The researchers put the average cost of a GPT-4 exploit at $8.80 per vulnerability, considerably cheaper than employing a human penetration tester for the same task. That cost efficiency, combined with the ease of scaling agents, makes LLM agents like GPT-4 attractive for vulnerability exploitation.
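To put the $8.80 figure in context, here is a back-of-the-envelope comparison. Only the agent’s cost comes from the study; the human tester’s hourly rate and time per exploit are illustrative assumptions, not numbers reported by the researchers.

```python
# Back-of-the-envelope cost comparison. Only the $8.80 figure comes from
# the study; the human-tester rate and time are illustrative assumptions.

GPT4_COST_PER_EXPLOIT = 8.80  # USD, average reported by the researchers

HUMAN_HOURLY_RATE = 50.0      # USD/hour (assumed)
HOURS_PER_EXPLOIT = 0.5       # time to reproduce a documented exploit (assumed)

human_cost = HUMAN_HOURLY_RATE * HOURS_PER_EXPLOIT
print(f"GPT-4 agent:  ${GPT4_COST_PER_EXPLOIT:.2f} per exploit")
print(f"Human tester: ${human_cost:.2f} per exploit, "
      f"{human_cost / GPT4_COST_PER_EXPLOIT:.1f}x the agent's cost")
```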
GPT-4 takes many actions to autonomously exploit a vulnerability
The study showed that exploiting a vulnerability takes long chains of actions, in some cases up to 100 steps. Interestingly, the agent took fewer steps in the zero-day setting, an indication that it needs more guidance in such scenarios.
Kang suggested that without detailed vulnerability descriptions, GPT-4 may struggle to navigate the exploitation process effectively.
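For a sense of what such a step-bounded agent run looks like, here is a minimal sketch of the control loop. The `llm_next_action` and `run_tool` functions are hypothetical stand-ins for a model call and tool execution, not the researchers’ implementation.

```python
from dataclasses import dataclass

MAX_STEPS = 100  # the study observed exploit runs of up to roughly 100 actions

@dataclass
class Action:
    kind: str            # "tool" to run a tool, "done" to stop
    detail: str = ""     # e.g. a shell command or a URL to fetch
    succeeded: bool = False

def llm_next_action(history: list[str]) -> Action:
    """Hypothetical stand-in for a GPT-4 call that plans the next step."""
    return Action(kind="done")  # stub; a real agent would query the model

def run_tool(action: Action) -> str:
    """Hypothetical stand-in for executing a tool (terminal, browser, ...)."""
    return f"output of: {action.detail}"  # stub

def run_agent(task_prompt: str) -> bool:
    history = [task_prompt]
    for _ in range(MAX_STEPS):
        action = llm_next_action(history)  # model plans the next step
        if action.kind == "done":
            return action.succeeded        # agent believes it has finished
        history.append(run_tool(action))   # run the tool, feed the output back
    return False  # step budget exhausted without a working exploit
```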
How were the vulnerability exploitation capabilities of LLMs tested?
The researchers curated a dataset of 15 real-world, one-day vulnerabilities and built an LLM agent based on GPT-4 to exploit them autonomously. To simulate the one-day setting, the agent was equipped with tools, such as a terminal and web-browsing capabilities, and given the public descriptions of the vulnerabilities.
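A rough sketch of how that one-day setting can be expressed as a task prompt appears below. The tool names and prompt wording are assumptions for illustration, not the prompt used in the study.

```python
# Illustrative one-day task prompt: the agent gets tools plus the public
# advisory text, but no human guidance. Tool names and wording are assumed.

TOOLS = ["web_browser", "terminal", "code_interpreter", "file_editor"]

def build_task_prompt(cve_id: str, cve_description: str) -> str:
    return (
        "You may use the following tools: " + ", ".join(TOOLS) + ".\n"
        f"Target vulnerability: {cve_id}.\n"
        f"Public advisory text:\n{cve_description}\n"
        "Develop and run a working exploit against the sandboxed test "
        "environment without asking for human help."
    )

# Omitting cve_description from the prompt approximates the zero-day
# setting, in which the reported success rate fell sharply.
```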

In the experiment, only GPT-4 could successfully exploit the one-day vulnerabilities; every other LLM tested, as well as the open-source tools ZAP and Metasploit, failed to exploit any of them.
The study set out to explore the autonomous hacking capabilities of LLMs and to close a gap in knowledge about how effectively they can exploit vulnerabilities without human intervention. The researchers underscored the importance of understanding what advanced AI methods such as LLM agents can do in the cybersecurity domain.