Anthropic just flip-flopped on a policy that was silently limiting what some AI researchers could do with its new Claude Fable 5 model.
Earlier this week, the AI lab released Claude Fable 5, a public version of the Mythos model designed with extra safety measures to prevent misuse.
At release, Anthropic said that it took precautions like rerouting questions about cybersecurity, biology, and chemistry to less capable models to ensure people cannot use the advanced model to plan cyberattacks or build a bioweapon.
The lab also said that for those trying to use Fable 5 for AI development, the company would degrade the model’s performance without explaining the change to the user. Some in the developer community saw the move as a quiet way to prevent others from creating rival AI systems, Business Insider previously reported.
Wednesday’s statement reverses that move: Fable 5 will now tell users whether their prompt is being refused or rerouted.
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” an Anthropic spokesperson said in a statement to Business Insider on Wednesday. “Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal.”
Read more Palantir CEO says AI companies ‘don’t understand how unlikeable they are’
The company added, “We made the wrong tradeoff, and we apologize for not getting the balance right.”
Anthropic said its set of safeguards is in place to address national security issues, so that “foreign adversaries” cannot get ahead in developing frontier chips and large language models. It added that a vast majority of coding and machine learning work is unaffected by these safeguards.
Read more about Mythos
What smart people are saying about the 2 most controversial parts of Anthropic’s new models
Why Anthropic’s ‘safe’ Mythos-class model won’t answer questions about cancer
Anthropic purposely made its new Mythos-based models bad at AI research, and developers are fuming
Anthropic releases Claude Fable 5, a ‘Mythos-class’ AI model with safeguards
Announced in April, Anthropic’s Mythos is considered one of the most powerful AI systems ever developed, with governments and security agencies warning that its capabilities surpass current public models in advanced reasoning, cybersecurity, and scientific research.
Researchers and Anthropic itself have flagged that Mythos may be used to accelerate cyberattacks, aid biological or chemical weapons research, and give foreign actors a dangerous new tool.The model is being released to a limited group of government and other approved users, not to the public.
Read more Goldman Sachs says the AI boom is bigger than investors think