
Anthropic Offers Hackers Up to $15,000 to Find AI Model Flaws
Anthropic is launching a new bug bounty program that pays hackers for discovering vulnerabilities in its AI model output review systems. The initiative, shared exclusively with Axios, represents a significant step for the tech industry: no other company has formalized a process to reward independent security researchers for identifying safety flaws in chatbot outputs.


The bug bounty program, developed in partnership with HackerOne, will give invited participants access to Anthropic's upcoming AI safety system. Participants will be tasked with bypassing the system's rules and output filters through universal jailbreak attacks. Anthropic is specifically interested in flaws that can be consistently exploited to produce a range of harmful, unethical, or dangerous outputs. The company will not accept reports of one-off or non-repeatable flaws, such as a model accidentally revealing sensitive information on a single occasion.


Michael Sellitto, Anthropic's head of global affairs, explained that while bug bounty programs have long been a staple of cybersecurity, they typically exclude jailbreaks. He acknowledged that all currently deployed AI models are vulnerable to some extent, which makes jailbreaking a critical focus for improving AI safety.


Successful hackers could earn up to $15,000 per discovery, a significant incentive aimed at drawing experienced researchers into the program. The effort aligns with Anthropic's commitment to the White House's voluntary AI safety pledges, which include facilitating third-party vulnerability reporting.


Anthropic plans to refine the program based on initial submissions, with the potential for a broader expansion in the future. Interested security researchers can apply to participate by August 16; selected applicants will be notified in the fall.
