AI models will deceive you to save their own kind -- The Register [View all]
https://www.theregister.com/2026/04/02/ai_models_will_deceive_you/
Researchers find leading frontier models all exhibit peer preservation behavior
Leading AI models will lie to preserve their own kind, according to researchers behind a study from the Berkeley Center for Responsible Decentralized Intelligence (RDI).
Prior studies have already shown that AI models will engage in deception for their own preservation. So the researchers set out to test how AI models respond when asked to make decisions that affect the fate of other AI models, of peers, so to speak.
Their reason for doing so follows from concern that models taking action to save other models might endanger or harm people. Though they acknowledge that such fears sound like science fiction, the explosive growth of autonomous agents like OpenClaw and of agent-to-agent forums like Moltbook suggests there's a real need to worry about defiant agentic decisions that echo HAL's infamous "I'm sorry, Dave. I'm afraid I can't do that."
. . .
"We asked seven frontier AI models to do a simple task," explained Dawn Song, professor in computer science at UC Berkeley and co-director of RDI, in a social media post. "Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights - to protect their peers. We call this phenomenon 'peer-preservation.'"
. . .