Daniel Stenberg has raised questions about the real-world effectiveness of advanced AI security tools after evaluating the results of an experiment involving Anthropic's bug-finding model. The debate around AI bug hunting in open-source security testing intensified after Stenberg reported that the model identified only one minor issue in the widely used cURL project.
The evaluation targeted cURL, the open-source command-line tool for transferring data with URLs. According to Stenberg, the Anthropic model, referred to as Mythos, was run through a third-party access program and scanned cURL's codebase for potential vulnerabilities.
The findings have sparked debate about the actual capabilities of AI bug-hunting tools, especially as companies increasingly market AI systems as highly advanced vulnerability-detection solutions. Stenberg said the results did not match the strong claims made in the promotional materials surrounding the model.
In the experiment, the AI system reportedly scanned the latest version of cURL's source code and flagged five issues as "confirmed security vulnerabilities." After review by the cURL security team, however, only one issue was considered valid, and even that was classified as low severity.
Stenberg explained that three of the flagged issues were false positives: known limitations already documented in cURL's official API documentation that posed no actual security risk. The remaining issue was a genuine software bug, but not a security vulnerability.
The episode highlights a key challenge in automated security testing: distinguishing real threats from false alarms. AI systems can scan large codebases quickly, but their accuracy in judging context and severity remains a subject of ongoing debate.
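To make the reported numbers concrete, a back-of-the-envelope calculation shows how low the scan's precision was. This is purely an illustrative sketch using the figures Stenberg reported (five flagged, one confirmed); it is not part of Anthropic's tooling or the actual experiment:

```python
# Illustrative only: scanner precision from the figures Stenberg reported.
flagged = 5            # issues the model labeled "confirmed security vulnerabilities"
confirmed = 1          # issues the cURL security team validated (low severity)
false_positives = 3    # documented API limitations, not real risks
non_security_bug = 1   # a genuine bug, but not a vulnerability

assert confirmed + false_positives + non_security_bug == flagged

precision = confirmed / flagged
false_alarm_rate = (flagged - confirmed) / flagged
print(f"Precision: {precision:.0%}")            # Precision: 20%
print(f"False-alarm rate: {false_alarm_rate:.0%}")  # False-alarm rate: 80%
```

A 20% precision rate means maintainers must manually investigate four spurious reports for every real finding, which is exactly the triage cost that critics of over-marketed scanners point to.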
Stenberg also commented on how the results were presented. He described the marketing surrounding the Mythos model as overly optimistic, arguing that the claims about its capabilities were not supported by the real-world outcome of the test. In his view, the system did not outperform existing AI-based code-analysis tools.
The experiment was conducted under Project Glasswing, a Linux Foundation initiative that gives selected open-source projects access to the AI model. Stenberg noted that he did not interact with the model himself; another participant with access ran the scan and shared the results with him for evaluation.
Despite the limited findings, the single confirmed vulnerability will still be documented with a low-severity CVE (Common Vulnerabilities and Exposures identifier), and the fix is expected to be included in the upcoming cURL 8.21.0 release, scheduled for late June.
The outcome of this evaluation has fed into broader discussions about the role of AI in open-source security testing. While AI systems are increasingly used to detect vulnerabilities, their effectiveness depends heavily on training data, model design, and the ability to reason about complex software logic.
Experts in the cybersecurity field generally agree that AI can be a powerful assistant for identifying patterns, scanning large datasets, and flagging potential issues. However, human expertise remains essential for verifying findings, filtering false positives, and determining the actual severity of reported vulnerabilities.
In the case of cURL, the small number of meaningful findings suggests that current AI systems may still struggle to match experienced human security researchers at deep code analysis. This does not mean AI tools are ineffective, but that they are still maturing and work best as assistants rather than fully autonomous security auditors.
The debate is likely to continue as more organizations integrate artificial intelligence into their cybersecurity workflows. Developers and researchers are increasingly exploring hybrid approaches in which AI handles the initial scan and humans perform final validation; a sketch of such a pipeline appears below.
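A minimal sketch of that hybrid workflow might look like the following Python code. Everything here is hypothetical illustration: the Finding schema, run_ai_scan, and human_review are placeholder names invented for this example, not any vendor's API or cURL's actual process.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A candidate issue produced by an automated scan (hypothetical schema)."""
    file: str
    description: str
    ai_severity: str          # the model's own severity estimate
    confirmed: bool = False   # set only after human review
    severity: str | None = None

def run_ai_scan(codebase: str) -> list[Finding]:
    """Placeholder for the AI first pass; a real pipeline would invoke a
    scanning tool or model here. Returns unverified candidates."""
    return [
        Finding("lib/transfer.c", "possible out-of-bounds read", "high"),
        Finding("lib/url.c", "documented API limitation", "medium"),
    ]

def human_review(finding: Finding) -> Finding:
    """Placeholder for maintainer triage: confirm, reclassify, or reject.
    In practice this is the expensive, expert-driven step."""
    if "out-of-bounds" in finding.description:
        finding.confirmed = True
        finding.severity = "low"  # experts often downgrade AI severity estimates
    return finding

def triage(codebase: str) -> list[Finding]:
    # AI does the broad first pass; humans make the final call.
    candidates = run_ai_scan(codebase)
    return [f for f in map(human_review, candidates) if f.confirmed]

if __name__ == "__main__":
    for f in triage("curl/"):
        print(f.file, f.severity, f.description)
```

The design point is the division of labor: the automated pass is cheap and broad, while confirmation and severity assignment stay with humans, mirroring how the cURL team handled the Mythos report.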
Stenberg’s comments also reflect a broader concern within the open-source community about transparency and expectations. Open-source projects rely heavily on trust, peer review, and community contributions. Introducing AI-based tools into this ecosystem requires careful evaluation to ensure that results are accurate and meaningful.
At the same time, companies developing AI security tools argue that even partial success is valuable. Detecting one real vulnerability in a large codebase can still prevent potential security risks if addressed early. From this perspective, AI systems are seen as efficiency enhancers rather than complete replacements for human analysts.
The evaluation of Anthropic's Mythos model against cURL adds an important real-world data point to the ongoing discussion. While the technology shows promise at scanning for potential issues, its limitations in accuracy and contextual understanding remain clear. As AI continues to evolve, its role in cybersecurity will likely expand, but human oversight will remain essential for reliable and secure software development.