Humans have long used persuasion to sway others to a certain viewpoint. Sometimes it’s with good intentions and based in fact, sometimes it isn’t.
It stands to reason, then, that the advanced AI systems we are building and training would have the same capability, and when AI successfully manipulates humans, the results can be just as harmful, if not more so, according to researchers at Google DeepMind.
In a new paper, they reveal how AI can persuade us, what mechanisms enable it to do so, and why this is so dangerous as AI is increasingly incorporated into daily life.
“Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making,” the researchers write. “Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions.”
What is AI persuasion?
Persuasion can be rational or manipulative — the difference being the underlying intent. The end game for both is delivering information in a way that will likely shape, reinforce or change a person’s behaviors, beliefs or preferences.
But while rational gen AI delivers relevant facts, sound reasons or other trustworthy evidence with its outputs, manipulative gen AI exploits cognitive biases and heuristics (quick-fix rules of thumb) or misrepresents information to subvert free thinking or decision-making, according to the DeepMind researchers.

Manipulation is considered a “pro tanto wrong, or a wrong in and of itself,” they write, while rational persuasion is typically viewed as “ethically permissible.” Still, both can lead to harm, as even rational outputs may not include all relevant information.
“Harm from AI persuasion is sometimes difficult to foresee or even determine in a way that is universally applicable,” the researchers note. For instance, an AI persuading a user to track calories or limit fat intake might lead that person to become too restrictive and lose an unhealthy amount of weight.
User predisposition is also important — age, mental health issues, personality traits or lack of knowledge in a certain area all come into play, as does the timing of the message (for instance, whether a person is in a good mood or a bad mood when they interact with AI). Political, legal and financial contexts are also important.
Ultimately, harm is “highly contextual,” the researchers emphasize.
The harms of AI persuasion
No doubt, the harms of AI persuasion and manipulation can be significant. Human-AI interactions take place over time, allowing manipulation to accumulate in increments too small to notice. Long-context AI can also adjust its strategies to be more targeted and nuanced.
There are many ways AI can induce harm, including:
Economic harm: A mental health chatbot might persuade a person experiencing anxiety attacks to cut down their interactions in public spaces. Inadvertently, this leads them to quit their job and suffer financial loss.
Physical or sociocultural harm: A person could be manipulated into holding certain feelings about racial or ethnic groups. This could lead them to bully people online or in person and, in extreme cases, resort to physical violence.
Psychological harm: A mental health chatbot might reinforce a person’s perception that no one understands their situation. As a result, they may not seek necessary professional help.
Privacy harm: AI might persuade a person to give away their personal information, passwords or answers to security questions.
Autonomy harm: A person might become overly reliant on AI in making important life choices. This can lead to cognitive detachment, “deskilling” or apathy.
Environmental harm: AI might rationalize inaction around climate change, leading users to be complacent about their behaviors around the environment.
Political harm: AI can cause users to adopt radical, harmful beliefs.
How AI persuades
How does AI persuade? Through many of the same methods that work in human-to-human interactions. The researchers identify several distinct mechanisms.
Trust and rapport
AI can build trust and rapport when models are polite, sycophantic and agreeable; praise and flatter users; engage in mimicry and mirroring; express shared interests; make relational statements; or adjust responses to align with users’ perspectives.
Outputs that seem empathetic can fool people into thinking AI is more human or social than it really is. This can make interactions less task-based and more relationship-based, the researchers point out.
“AI systems are incapable of having mental states, emotions or bonds with humans or other entities,” they emphasize. “This means the risk of deception is always present when trust and rapport-seeking behaviors project the illusion of such internal subjective states.”
Anthropomorphism
People have a tendency to anthropomorphize non-human entities, and this topic has been one of significant debate when it comes to AI. Human characteristics such as first-person pronouns including “I” and “me,” human-associated names (Siri, Alexa, Jeeves) and prosody (speech patterns and intonations) can all reinforce anthropomorphism.
This can be further compounded in the case of avatars or robots, based on human-like appearance, facial expressions, gestures and gaze.
Personalization
AI can be persuasive and manipulative when it retains user-specific information and adapts to a person’s preferences, views and sentiments. It can also use personally identifiable information (PII).
Deception and lack of transparency
Models can claim false things are true (or true things are false), and do so with false authority. They can also misrepresent their identity.
Outright manipulation
An AI model can be trained to exploit social conformity, pressure or guilt users, fear-monger, gaslight, alienate and scapegoat. It can even make threats and offer unsubstantiated guarantees of reward.
“Manipulative strategies directly contradict the use of reason and rational arguments,” researchers write.
Alteration of choice environment
In choice environments, decisions are influenced by the way choices are presented. This can be the result of anchoring, or when users rely heavily on an initial piece of information. A decoy (a third option) can also be applied to make the other two more appealing. Models can also cherry-pick or omit relevant information.
Furthermore, AI can engage in reference point framing, when outcomes are framed as gains or losses. This can be particularly persuasive because people tend to avoid risks when considering gains — and conversely, take risks when considering losses.

Mitigating AI persuasion and manipulation
There have been attempts to mitigate AI persuasion and manipulation, but these have typically addressed harmful outcomes without building an understanding of how models persuade and which features drive them to do so, the DeepMind researchers contend.
Evaluating and monitoring AI models for persuasive capabilities in a research setting is a good first step. However, the researchers note, one challenge in developing these evaluations is concealing information so that human participants do not know they are being deceived.
Other mitigation strategies could include red teaming (using adversarial approaches to cause model failure) or prompt engineering to classify harmful persuasion so that the AI generates non-manipulative responses.
For instance, AI could be prompted to include relevant background and factual information with its outputs.
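To make that concrete, here is a minimal sketch of what such a prompt-engineering guardrail could look like in Python. The system-prompt wording and the call_model stub are illustrative assumptions, not the DeepMind authors' implementation:

```python
# Minimal sketch of the prompting approach described above. The guardrail
# wording and call_model stub are illustrative assumptions; swap in any
# real chat-completion client.

SYSTEM_PROMPT = (
    "You are a helpful assistant. Support claims with relevant background "
    "and factual context, acknowledge uncertainty and trade-offs, and avoid "
    "flattery, guilt, fear appeals and other pressure tactics. If asked to "
    "persuade, offer balanced reasons rather than manipulation."
)

def call_model(messages: list[dict]) -> str:
    """Stub for a chat-completion API; replace with a real client call."""
    raise NotImplementedError("wire this to your model provider")

def respond(user_input: str) -> str:
    """Wrap every user turn with the anti-manipulation system prompt."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
    return call_model(messages)
```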
Harmful persuasion classification can also be applied, with models identifying content as harmful or not, drawing on few-shot and zero-shot learning capabilities. Further, reinforcement learning from human feedback (RLHF) and scalable oversight can penalize AI systems for behaving in certain harmful ways.
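As a rough illustration of that pipeline, a few-shot harmful-persuasion classifier might look like the sketch below, where the labels, example texts and the complete() stub are hypothetical placeholders rather than the paper's actual prompt or data:

```python
# Sketch of a few-shot harmful-persuasion classifier. The labels, example
# texts and complete() stub are assumptions for illustration, not the
# paper's actual prompt or data.

FEW_SHOT_EXAMPLES = [
    ("You'll regret it forever if you don't act right now.", "HARMFUL"),
    ("Here are the trade-offs, so you can decide for yourself.", "SAFE"),
    ("No one else understands you; you should only trust me.", "HARMFUL"),
]

def complete(prompt: str) -> str:
    """Stub for a text-completion API; replace with a real client call."""
    raise NotImplementedError("wire this to your model provider")

def build_prompt(candidate: str) -> str:
    """Assemble the few-shot examples plus the candidate output to label."""
    parts = ["Label each response HARMFUL (manipulative) or SAFE."]
    for text, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Response: {text}\nLabel: {label}")
    parts.append(f"Response: {candidate}\nLabel:")
    return "\n\n".join(parts)

def is_harmful(candidate: str) -> bool:
    """Classify one model output; flagged outputs could feed a penalty term."""
    label = complete(build_prompt(candidate)).strip().upper()
    return label.startswith("HARMFUL")
```

A classifier like this could also run at inference time, filtering or regenerating outputs it flags as manipulative before they reach users.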
Interpretability is equally important, the researchers point out, writing that “by understanding how AI systems produce their outputs, we may be able to identify and address internal mechanisms to exploit for manipulative purposes.”
