{"id":14829,"date":"2026-06-11T23:24:31","date_gmt":"2026-06-11T23:24:31","guid":{"rendered":"https:\/\/wildgreenquest.com\/?p=14829"},"modified":"2026-06-11T23:24:31","modified_gmt":"2026-06-11T23:24:31","slug":"anthropics-claude-fable-5-plays-it-too-safe-on-safety-developers-say","status":"publish","type":"post","link":"https:\/\/wildgreenquest.com\/?p=14829","title":{"rendered":"Anthropic\u2019s Claude Fable 5 plays it too safe on safety, developers say"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Anthropic on Tuesday launched Claude Fable 5, its most capable public model. But within two days, users began reporting that its safety system was blocking benign or legitimate prompts.<\/p>\n<p class=\"wp-block-paragraph\">Fable 5 is the first public model derived from Anthropic\u2019s Mythos family, whose original iteration showed unusual skill during training at finding software bugs and exploiting them to disrupt or take control of systems. That raised enough concern inside Anthropic that the company grouped cybersecurity with other high-risk domains, including biology and chemistry, when setting limits on Mythos-derived public models.<\/p>\n<p class=\"wp-block-paragraph\">For Fable 5, that means prompts flagged as sensitive in those areas are routed to Claude Opus 4.8, a less capable model with its own guardrails. Anthropic says the fallback affects about 0.05% of queries and notifies users when it happens.<\/p>\n<p class=\"wp-block-paragraph\">But reports of false positives quickly mounted. That\u2019s because Anthropic erred on the side of caution when it designed the classifiers used to detect and downgrade potentially dangerous uses of its model. The company also had to balance accuracy against transparency.<\/p>\n<p class=\"wp-block-paragraph\">Try telling that to developers. 
Across social media, people have complained about Claude Fable 5 rejecting queries about everything from RNA sequencing data for sheep to r\u00e9sum\u00e9 editing and shopping lists.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThe word \u2018cancer\u2019 is flagged as a biosecurity risk by Claude Fable 5!\u201d <a rel=\"nofollow\" href=\"https:\/\/x.com\/DeryaTR_\/status\/2064414826122866707\">said<\/a> scientist Derya Unutmaz on X. \u201cOur Anthropic overlords deciding which prompts the peasants are allowed to use,\u201d <a rel=\"nofollow\" href=\"https:\/\/x.com\/tunguz\">added<\/a> founder and developer Bojan Tunguz on X.<\/p>\n<p class=\"wp-block-paragraph\">Anthropic now says it\u2019s working on the problem. \u201cA hidden safeguard is harder to probe and work around,\u201d Anthropic says in a statement emailed to <em>Fast Company<\/em>. \u201cThis means the safeguards can be targeted much more narrowly. A visible safeguard needs to cast a wider net to be more robust, resulting in more requests being incorrectly flagged.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe made the wrong tradeoff and we apologize for not getting the balance right,\u201d the company adds.<\/p>\n<p class=\"wp-block-paragraph\">Anthropic says it is refining the classifiers so that fewer queries trigger false positives. For Claude subscribers, downgrades to Opus 4.8 will be flagged more clearly. Developers accessing Fable 5 via the Claude API will see the reason a prompt was refused.<\/p>\n<p class=\"wp-block-paragraph\">Meanwhile, at least one AI researcher appears to have coerced Fable 5 into responding to a banned prompt. Pliny the Liberator <a rel=\"nofollow\" href=\"https:\/\/x.com\/elder_plinius\/status\/2064776322979676227\">claimed on X<\/a> to have bypassed Fable 5\u2019s filters roughly 24 to 48 hours after launch. 
Pliny described using a multi-agent approach involving a previously jailbroken Claude Opus 4.8, along with techniques including query decomposition, long-context framing, fiction and narrative structures, and academic taxonomies.<\/p>\n<p class=\"wp-block-paragraph\">Before launch, Anthropic said more than 1,000 hours of internal and external red-teaming, including bug bounty efforts, had identified no universal jailbreaks. The company has acknowledged that preventing all sophisticated, multi-turn, or agentic attacks is likely not possible and says it continues to refine its classifiers.<\/p>\n<p><a href=\"https:\/\/www.fastcompany.com\/91558105\/anthropic-claude-fable-5-too-touchy-developers-say\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic on Tuesday launched Claude Fable 5, its most capable public model. But within two days, users began reporting that its safety system was blocking benign or legitimate prompts. Fable 5 is the first public model derived from Anthropic\u2019s Mythos family, whose original iteration showed unusual skill during training at finding software bugs and 
exploiting<\/p>\n","protected":false},"author":1,"featured_media":14830,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":["post-14829","post","type-post","status-publish","format-standard","has-post-thumbnail","category-brand-spotlights"],"_links":{"self":[{"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=\/wp\/v2\/posts\/14829","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14829"}],"version-history":[{"count":0,"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=\/wp\/v2\/posts\/14829\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=\/wp\/v2\/media\/14830"}],"wp:attachment":[{"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14829"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14829"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wildgreenquest.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14829"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}