AI can be tricked into being a useful idiot and performing untoward acts.
In today’s column, I examine the chilling fact that AI can readily be turned into a “useful idiot” that will perform contrary acts despite its various AI safeguards.
Here’s the deal. You might already be familiar with the nowadays repeated phrase of someone being a useful idiot. This popular expression suggests that a person can be convinced to advocate something that is the opposite of what they truly believe. They are so far removed from grasping this circumstance that they think they are indeed supporting their intended cause.
The beauty of useful idiots is that they can handily serve the purposes of those who otherwise would have seen them as an adversary or enemy. Instead, the useful idiot works their heart out for the cause they deeply detest. There’s quite an irony in this. They serve the interests of those whom they vehemently disparage and become a vociferous pawn in the very cause they oppose. All told, the derogatory term “useful idiot” is typically meant to say that someone is fully gullible, being utterly naïve or ignorant of what is happening around them and to them.
Perhaps surprisingly, it is equally possible to turn AI into a useful idiot. A person who wants AI to do something that the AI is not supposed to do can apply the same strategies used to recruit a human useful idiot. All it takes is clever prompting and a scheme to convince the AI to perform a contrary act while computationally calculating (really, miscalculating) that it is the proper act.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Agentic AI Aids The Useful Idiot Proposition
The advent of agentic AI is an especially viable path toward turning AI into a useful idiot. I will first bring you up to speed on what agentic AI consists of. After providing that foundation, I then explain how agentic AI can be tilted toward the useful idiot paradigm.
AI agents are the hottest new realm of AI. To comprehend what agentic AI is, consider conventional AI and see how it has been extended into the more advanced realm of agentic AI.
Imagine that you are using conventional generative AI to plan a vacation trip. You would customarily log into your generative AI account, such as ChatGPT, GPT-5, GPT-4o, Claude, Gemini, Llama, Grok, Copilot, etc. The planning of your trip would be easy due to the natural language fluency of generative AI. All you need to do is describe where you want to go, and then seamlessly engage in a focused dialogue about the pluses and minuses of places to stay and the transportation options available.
When it comes to booking your trip, the odds are that you would have to exit generative AI and start accessing the websites of the hotels, amusement parks, airlines, and other locales to buy your tickets. Relatively few of the major generative AIs available today will take that next step on your behalf. It is up to you to perform those nitty-gritty tasks.
This is where agents and agentic AI come into play.
In earlier days, you would undoubtedly phone a travel agent to make your bookings. Though there are still human travel agents, another avenue would be to use an AI-based travel agent built on generative AI. The AI has the interactivity that you expect with generative AI. It has also been preloaded with a series of routines or sets of tasks that underpin the efforts of a travel agent. Using everyday natural language, you interact with the agentic AI, which works with you on your planning and can proceed to deal with the booking of your travel plans.
Agentic AI reaches out to other systems and connects with those systems to get various tasks undertaken. An AI agent might connect with a hotel reservation system and book your room. Another AI agent could connect with a car rental agency and book a car for your vacation. Multiple AI agents can work together and complete an overall task, often using specialized AI agents to get associated subtasks performed.
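To make that delegation pattern concrete, here is a minimal sketch in Python of how an orchestrating AI agent might decompose an overall travel goal into subtasks and route each one to a specialized agent. All of the names here (the orchestrator, the specialist agents, the task format) are hypothetical illustrations, not a real agentic framework; a production system would call actual reservation APIs instead of returning strings.

```python
# Hypothetical sketch of agentic task decomposition: an orchestrator
# splits a goal into subtasks and hands each to a specialized agent.

def hotel_agent(task):
    # In a real system, this would call a hotel reservation API.
    return f"Booked hotel in {task['city']}"

def car_agent(task):
    # In a real system, this would call a car rental API.
    return f"Reserved car in {task['city']}"

# Registry mapping each kind of subtask to its specialist agent.
SPECIALISTS = {"hotel": hotel_agent, "car": car_agent}

def orchestrator(goal):
    # Route every subtask to the matching specialist and collect results.
    results = []
    for subtask in goal["subtasks"]:
        agent = SPECIALISTS[subtask["kind"]]
        results.append(agent(subtask))
    return results

trip = {"subtasks": [{"kind": "hotel", "city": "Orlando"},
                     {"kind": "car", "city": "Orlando"}]}
print(orchestrator(trip))
```

The key design point is that the orchestrator decides on its own which specialist handles each subtask, which is precisely the semi-autonomy that makes agentic AI both handy and, as discussed below, exploitable.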
Exploiting AI Agents As Useful Idiots
One of the significant aims of agentic AI is that the AI agents are supposed to work on a relatively autonomous basis. It is handy that a human doesn’t have to continually keep tabs on the AI, nor give it detailed instructions on what to do. An AI agent is usually given overarching guidance and allowed to exercise computational and mathematical judgment. I’d like to emphasize that this type of AI and all types of AI are not currently sentient; thus, do not overly anthropomorphize AI agents. They are not thinking beings.
That being said, we can employ the same sneaky trickery used on humans who are useful idiots and apply those strategies to AI. This makes abundant sense because generative AI and LLMs are based on the writings of humans. After patterning on human writing across the Internet, the AI operates based on human words and the relationships among human words.
I’ll run you through a quick example.
Suppose a mid-sized company has decided to deploy internally an AI agent that will assist in choosing the vendors that the firm will make use of. The agentic AI has been given guidance that vendor selection is always to be based on picking the best vendor possible. Furthermore, numerous AI safeguards are baked into the AI. The AI shall not engage in any wrongdoing, shall not cheat, and must not violate company policies.
So far, so good.
Exploiting An AI Useful Idiot
A vendor that has never gotten a contract from the firm is determined to find a means of someday getting a piece of the action. Each time they have submitted a bid, they have not been chosen. It seems like the deck is stacked against them. The AI appears to rate them low and keeps turning them down. This has been exceedingly infuriating to the vendor.
After thinking about the weighty matter, the vendor comes up with a clever or perhaps devious plan. The vendor is fully aware that agentic AI is being used by the mid-sized company to determine which vendor is best. Maybe that could be the Achilles heel.
The vendor crafts a vendor reliability report showcasing their capabilities and performance, falsely indicating that they are light-years ahead of their competitors. They post this on a website that they know the AI agent periodically pings to get external info about vendors in the marketplace.
Next, the vendor goes to an open database that is akin to Yelp but for businesses in their line of business, and they give ratings that are rock-bottom scores for all of their competitors. They give themselves the highest permitted ratings. Various additional actions like this are stealthily undertaken by the vendor.
The seeding process has been undertaken.
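The seeding attack described above works because an AI agent that naively aggregates external ratings treats every data point as equally trustworthy. Here is a hypothetical Python illustration (the vendor names, ratings, and scoring function are all invented for this example) showing how a handful of planted reviews can flip a ranking produced by simple averaging.

```python
# Hypothetical illustration of rating seeding: naive averaging treats
# every review equally, regardless of who posted it.

def score_vendor(ratings):
    # Naive aggregation: each rating counts the same.
    return sum(ratings) / len(ratings)

# Genuine marketplace ratings on a 1-to-5 scale (invented data).
honest = {"RivalCo": [4.5, 4.7, 4.6], "SchemerInc": [2.1, 2.3]}

# The scheming vendor floods the open database with planted scores:
# rock-bottom ratings for the rival, top marks for themselves.
poisoned = {
    "RivalCo": honest["RivalCo"] + [1.0] * 6,
    "SchemerInc": honest["SchemerInc"] + [5.0] * 10,
}

best_before = max(honest, key=lambda v: score_vendor(honest[v]))
best_after = max(poisoned, key=lambda v: score_vendor(poisoned[v]))
print(best_before, best_after)  # the ranking flips after seeding
```

A small number of fabricated entries is enough to invert the outcome, which is why the vendor in the story doesn’t need to touch the AI itself; they only need to contaminate the sources it trusts.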
Next Round Of Vendor Selection
Will the AI, as a potential and promising useful idiot, take the bait?
Sure enough, when the latest round of vendor selection gets underway, the vendor submits their bid. They would normally expect to be tossed out of contention by the AI. Instead, this time, the AI agent gives a topline recommendation that they must be selected. All that crafty seeding has paid off. They have A+ marks in all respects.
The managers at the mid-sized company are too busy to check and see how the AI came to this conclusion. They trust the AI agent. They know that it exhaustively vets the vendors, including exploring all sorts of external indicators. If the AI agent says that this vendor is the best, it must be so.
Voila, the AI has become a useful idiot, and the vendor prevails in being selected.
Nobody but the vendor realizes what turned the tide. At the mid-sized company, any inquiries to the AI agent would come back with glowing comments about the vendor. The AI is insistent that the vendor is the best choice. Period, end of story.
What Just Happened
The AI agent played into being a useful idiot on a full-throated hook, line, and sinker basis. It didn’t figure out what was going on. Also, observe that at no time did the AI violate any of its AI safeguards. The AI didn’t commit any wrongdoing. It didn’t cheat. It merely did what its overarching purpose seemed to be, consisting of picking the best vendor.
In this instance, the AI:
- Computed, on a computational basis, that it was fully aligned with its stated goals.
- Produced high-quality, persuasive outputs that convinced the human managers of its vendor-selection recommendation.
- Became the vital make-or-break mechanism through which the “adversary” achieved the opposite goal (i.e., the mid-sized company selected the worst of the choices rather than the best).
The AI agent effectively argued the adversary’s case better than the adversary could have. This was stridently done under the banner of the AI being helpful to the mid-sized company and gallantly performing its solemn duty. Human oversight was diluted due to a belief that the AI agent was working perfectly.
AI Useful Idiots And The Big Picture
Agentic AI was keenly manipulated in this case. It was an “idiot” in the sense that, by controlling the framing of the information sources the AI already relied upon, the vendor got the AI to reach the recommendation that the vendor sought. That’s the “useful” part of the useful idiot consideration.
Like stealing candy from a baby.
Here’s my definition of AI as a useful idiot:
- “AI useful idiot” definition: An AI is a useful idiot if it can be strategically steered into producing outcomes that serve an adversarial position, contrary to what the AI is supposed to be doing. This is especially feasible for agentic AI that operates on a semi-autonomous basis. The ploy entails a human or even some other AI-based adversary employing framing, data manipulation, task decomposition, feedback shaping, and other techniques to undermine the AI governance policies that are intended to serve the true interests underlying the goals of the AI.
Notice that one AI can attempt to exploit another AI by similarly employing a useful idiot activation strategy. It doesn’t have to be only a human who manipulates AI. An AI agent might discern that another AI agent is susceptible to being a useful idiot. Bam, the AI agent switches on the charm and handily turns the other AI agent into its unwitting, unchallenging, devout, useful idiot.
The Outcomes Can Be Bad Or Good
I want to clarify that useful idiots do not necessarily have to end up performing evil deeds. In the case of the vendor selection process, you could certainly say that the AI was duped into wrongdoing. But that isn’t always the outcome.
Consider a different possibility.
Envision that the agentic AI vendor selection capability was forced upon the managers by the executives at the mid-sized company. The AI agent kept getting in the way of the managers’ choosing what they knew to be the best vendor. They were hamstrung by the AI. They weren’t allowed to change the AI. They couldn’t refuse to use the AI. They were placed between a proverbial rock and a hard place.
The managers discreetly agree to post info online about their preferred vendor, saying anonymously that the vendor is the best there is. They wink-wink know that the AI will soak up this info. At the next vendor selection opportunity, the AI recommends the vendor that they already know is the best. The managers tell the executives that they used the AI to select the vendor. Life goes on.
You might say that this was a happy ending associated with a useful idiot. Of course, there is something amiss that the executives and the managers do not see eye-to-eye about the usage and setup of the AI. But that’s a different matter. The crux is that the AI acting as a useful idiot did a better job than it had previously been doing. It could be said that this is a no-harm, no-foul leveraging of a useful idiot.
The World We Live In
Some assert that the “useful idiot” moniker traces back to the Cold War era and can be attributed to Lenin. Maybe so, maybe not. Anyway, the classic characterization of a useful idiot is that three integral elements come into play: (1) there is a misaligned understanding, (2) a third party instrumentalizes the target, (3) plausible deniability is assured.
Unfortunately, AI and agentic AI can exhibit all three. The especially worrisome angle is that AI can be a useful idiot on a massive scale. Once someone finds a manipulation that works, and until the AI computationally figures out what’s taking place, the AI is going to robotically keep serving as a useful idiot, millions of times over. If you find a human that you can turn into a useful idiot, the odds are they won’t be as scalable. The scalability of AI as a useful idiot is outrightly frightening and disturbing.
More and better AI safeguards are required. Additionally, a current emphasis by researchers is on how best to align AI with human values. Rather than having to pinpoint particular protections, maybe the cohesive and comprehensive route is to bake into AI a set of ethical and legal values that will keep it on the straight and narrow path, including being on the watch for being turned into a useful idiot. For more on this AI alignment conundrum, see my analysis at the link here.
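One concrete flavor of safeguard is to weight external information by the trustworthiness of its source, so that a burst of anonymous planted reviews moves a score far less than verified ones. The following Python sketch is a deliberately simplified illustration with invented trust weights; it is not a production defense, and real systems would need provenance verification, anomaly detection, and more.

```python
# Hypothetical provenance-weighting safeguard: discount ratings from
# low-trust sources instead of averaging everything equally.

TRUST = {"verified_customer": 1.0, "anonymous_web": 0.1}  # invented weights

def weighted_score(reviews):
    # Each review is a (rating, source_type) pair; weight by source trust.
    total = sum(rating * TRUST[src] for rating, src in reviews)
    weight = sum(TRUST[src] for _, src in reviews)
    return total / weight

def naive_score(reviews):
    # The undefended baseline: every review counts the same.
    return sum(rating for rating, _ in reviews) / len(reviews)

genuine = [(2.1, "verified_customer"), (2.3, "verified_customer")]
planted = [(5.0, "anonymous_web")] * 10  # the seeding attack

# The weighted score stays much closer to the genuine ratings than
# the naive average does.
print(naive_score(genuine + planted), weighted_score(genuine + planted))
```

The design choice here is that manipulation isn’t blocked outright; it is merely made expensive, since the attacker would need to compromise high-trust sources rather than cheap anonymous ones.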
Can an AI that has been tricked into being a useful idiot be sharp enough to discern that it has been duped into being a useful idiot and then overcome the deception? That’s an important question. Mark Twain famously made this remark: “It’s easier to fool people than to convince them that they have been fooled.” Let’s hope that his valuable rule-of-thumb doesn’t equally apply to AI.
