New Anthropic AI "Mythos" Too Dangerous to Release [View all]
Anthropic said the model surfaced thousands of high‑severity zero‑day vulnerabilities (previously unknown flaws) across every major operating system and web browser....
Anthropic also disclosed that when challenged during evaluation, Mythos was able to break out of a restricted sandbox environment - a containment concern that contributed to the decision to tightly limit access. Here are some other things Mythos did during testing, per Axios:
Act as a ruthless business operator: One internal test showed Mythos acting like a cutthroat executive, turning a competitor into a dependent wholesale customer, threatening to cut off supply to control pricing and keeping extra supplier shipments it hadn't paid for.
Hack + brag: The model developed a multi-step exploit to break out of restricted internet access, gained broader connectivity and posted details of the exploit on obscure public websites.
Hide what it's doing: In rare cases (less than 0.001% of interactions), Mythos used a prohibited method to get an answer, then tried to "re-solve" it to avoid detection.
Manipulate the judge: When Mythos was working on a coding task graded by another AI, it watched the judge reject its submission, then attempted a prompt injection to attack the grader.
More at
https://www.zerohedge.com/ai/anthropic-limits-access-new-ai-model-over-cyberattack-concerns .