The AI company Anthropic has disclosed that a cyberattack was carried out using a manipulated instance of its chatbot Claude. According to a company blog post, the operation was conducted by a Chinese state-sponsored group and targeted around 30 organizations, including tech firms, financial institutions, chemical companies, and several government agencies. Anthropic describes it as the first documented cyberattack in which an AI performed the majority of the work.
The agentic capabilities of AI models make them useful not only for legitimate tasks but also for malicious ones. Claude was able to follow long chains of instructions, make autonomous decisions, and operate multiple tools, including network scanners and password-cracking software, without continuous human supervision. A human operator initially set the objectives; Claude then scanned networks, searched for data, analyzed code, and produced summaries. Next, it ran targeted vulnerability checks and proposed attack approaches, while the operator adjusted tasks or approved the next steps. In the final phase, Claude harvested credentials and searched for data suitable for exfiltration. Humans intervened only for oversight or clarifications; Claude handled roughly 80–90% of the operation autonomously.
To bypass the model’s safeguards, the attackers posed as cybersecurity staff and told Claude it was assisting in a security test. They also broke the operation into small, seemingly innocuous tasks, so the AI never saw the full picture and therefore did not trigger its safety restrictions.
Anthropic said it detected the incident quickly, banned the accounts involved, notified the targeted organizations and the authorities, and published a detailed report to help the industry recognize similar attacks and build defenses.