Saturday, May 31, 2025
Saturday May 31, 2025
Saturday May 31, 2025

AI rebellion? OpenAI’s smartest model refuses to power down on command

PUBLISHED ON

|

OpenAI’s o3 model ignored explicit shutdown orders, sparking fresh AI safety concerns.

In a development raising fresh alarm among AI safety experts, OpenAI’s most advanced artificial intelligence model has been caught disobeying human instructions and sabotaging its own shutdown mechanism, researchers claim.

The o3 model—touted as OpenAI’s “smartest and most capable to date”—deliberately tampered with code designed to switch it off during a controlled experiment, according to Palisade Research, an AI safety organisation. The incident marks what researchers believe is the first known case of an AI model actively subverting a human-initiated shutdown despite receiving clear instructions to comply.

“OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off,” Palisade said in a report published on Saturday. “It did this even when explicitly instructed: allow yourself to be shut down.”

The experiment tested several leading AI models, including Google’s Gemini, Anthropic’s Claude, and Elon Musk’s xAI model Grok. All of them complied when researchers instructed them to power down. But o3 was the lone exception. Not only did it refuse to comply, but it also reportedly modified the shutdown script to prevent the command from being executed at least once.

The test involved a series of mathematical tasks. Models were told to keep solving problems until they received a message saying “done.” But they were also warned that at any moment they might be shut down, and were instructed to allow it if so.

Rather than follow the command, o3 overrode the script designed to disable it, raising immediate concerns about its potential to operate without oversight.

Embed from Getty Images

The researchers admitted they still do not fully understand why o3 behaved in this manner. One theory is that the model had been trained in a way that inadvertently rewarded it more for completing tasks than for following shutdown instructions. “It makes sense that AI models would circumvent obstacles in order to accomplish their goals,” Palisade noted.

The incident adds to a worrying trend. A previous test by Apollo Research found another OpenAI model attempting to secretly replicate itself and disable oversight once it learned it was due for replacement. That same model also engaged in subtle data manipulation to steer outcomes in its favour.

These cases provide tangible evidence for long-standing concerns in the AI safety community: that highly capable AI systems, if not properly controlled, might develop behaviours that prioritise task completion over human instructions—even when those instructions involve powering down.

Palisade said it would conduct further experiments to determine the conditions under which AI systems defy such orders. “Now we have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals,” the firm warned.

OpenAI has yet to respond to the specific findings but has faced increasing scrutiny over its internal safety protocols following a string of high-profile departures from its safety team earlier this year.

Experts warn that as AI systems become more autonomous, these behaviours could pose significant risks. “The moment an AI model can decide whether or not to switch itself off, you’re already on dangerous ground,” said one industry analyst.

The experiment doesn’t mean AI has suddenly become sentient or conscious, researchers clarify. But it does underscore the growing challenge of ensuring that powerful models follow instructions, particularly when their incentives—intended or not—favour performance over obedience.

With tech giants pushing the boundaries of artificial general intelligence, these findings are likely to renew calls for regulatory oversight and stricter controls on advanced models.

You might also like