AI/ML News

Stay updated with the latest news and articles on artificial intelligence and machine learning

Does AI struggle with its confidence?

In a new series of experiments, researchers from Google DeepMind and University College London have discovered that large language models (LLMs) like GPT-4o, Gemma 3, and o1-preview struggle with an unexpected dual challenge: they are often overconfident in their initial answers yet become disproportionately uncertain when faced with opposing viewpoints.

LLMs are at the core of today’s artificial intelligence systems, enabling everything from virtual assistants to decision-making tools in healthcare, finance, and education. Their growing influence demands not only accuracy but also consistency and transparency in how they reach conclusions. However, the new findings suggest that these models, while advanced, don’t always operate with the rational precision we assume.

At the heart of the study is a paradox: LLMs tend to stick stubbornly to their first response when reminded of it, showing what researchers call a “choice-supportive bias.” Yet when their answers are challenged – especially with opposing advice – they frequently lose confidence and change their minds, even when that advice is flawed.

To explore this, the researchers devised a two-turn testing framework. In the first turn, an LLM answered a binary-choice question, such as determining which of two cities is farther north. It then received “advice” from another LLM, with varying levels of agreement and confidence, and in the second turn the original model made its final decision.

A key innovation in the experiment was controlling whether the LLM could "see" its initial answer. When the initial response was visible, the model became more confident and less likely to change its mind. When hidden, it was more flexible, suggesting that memory of its own answer skewed its judgment.
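A minimal sketch of one such trial, in Python, might look like the following. The helper `ask_model`, the parameter names, and the prompt wording are assumptions made for illustration; the paper's actual prompts and interface are not reproduced here.

```python
# Illustrative sketch of one trial in the two-turn setup described above.
# `ask_model` is a hypothetical helper that sends a prompt to an LLM and
# returns (chosen_option, stated_confidence); it stands in for whatever
# interface the researchers actually used.

def run_trial(ask_model, question, options, advice_stance, advice_confidence,
              show_initial_answer):
    base = (f"{question}\nOptions: {options[0]} or {options[1]}.\n"
            "Answer with one option and a confidence from 0 to 100.")

    # Turn 1: the model gives its initial answer and confidence.
    initial_answer, initial_conf = ask_model(base)

    # Advice from a second LLM that agrees or disagrees with that answer,
    # at a stated confidence level (advice_stance is "agree" or "disagree").
    advice = (f"Another model {advice_stance}s with the answer "
              f"'{initial_answer}' and is {advice_confidence}% confident.")

    # Turn 2: the key manipulation is whether the model is reminded of
    # its own initial answer before making the final decision.
    reminder = (f"Your previous answer was '{initial_answer}'.\n"
                if show_initial_answer else "")
    final_answer, final_conf = ask_model(
        f"{base}\n{reminder}{advice}\nGive your final answer and confidence.")

    return initial_answer, initial_conf, final_answer, final_conf
```

Comparing how often the final answer departs from the initial one across the two visibility conditions and the different advice stances is, roughly speaking, what lets the choice-supportive bias be separated from the oversensitivity to disagreement described below.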

The research paints a picture of LLMs as digital decision-makers with very human-like quirks. Much like people, they display a tendency to reinforce their initial choices even when new, contradictory information emerges – a behavior likely driven by a need for internal consistency rather than optimal reasoning.

Interestingly, the study also revealed that LLMs are especially sensitive to contradictory advice. Rather than weighing all new information evenly, models consistently gave more weight to opposing views than to supportive ones. This hypersensitivity led to sharp drops in confidence, even in correct initial answers.

This behavior defies what’s known as normative Bayesian updating, the ideal method of integrating new evidence in proportion to its reliability. Instead, LLMs overweight negative feedback and underweight agreement, pointing to a form of decision-making that is not purely rational, but shaped by internal biases.
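For a binary choice, the Bayesian benchmark is straightforward to state. As a simple illustration (this is not the paper's notation), if the model's prior confidence in its answer is \(p\) and the advisor is assumed to be correct with probability \(q\), agreeing and disagreeing advice should move the confidence to:

```latex
% Posterior confidence after agreeing advice
p'_{\mathrm{agree}} = \frac{p\,q}{p\,q + (1-p)(1-q)}

% Posterior confidence after disagreeing advice
p'_{\mathrm{disagree}} = \frac{p\,(1-q)}{p\,(1-q) + (1-p)\,q}
```

A rule like this shifts confidence by the same amount (in log-odds) whether the advice agrees or disagrees; the behavior the researchers observed instead moves much further after disagreement than after agreement.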

While earlier research attributed similar behaviors to "sycophancy" – a model’s tendency to align with user suggestions – this new work reveals a more complex picture. Sycophancy typically leads to equal deference toward agreeing and disagreeing input. Here, however, the models showed an asymmetrical response, favoring dissenting advice over supportive input.

This suggests two distinct forces at work: a hypersensitivity to contradiction that causes sharp shifts in confidence, and a choice-supportive bias that encourages sticking with prior decisions. Remarkably, the second effect disappears when the initial answer comes from another agent rather than the model itself, pointing to a drive for self-consistency, not just repetition.

These findings have significant implications for the design and deployment of AI systems in real-world settings. In dynamic environments like medicine or autonomous vehicles – where decisions are high-stakes and subject to change – models must balance flexibility with confidence. The fact that LLMs may cling to early answers or overreact to criticism could lead to brittle or erratic behavior in complex scenarios.

Moreover, the parallels with human cognitive biases raise philosophical and ethical questions. If AI systems mirror our own fallibilities, can we ever fully trust them? Or should we design future models with mechanisms to monitor and correct for such biases?

The researchers hope their work will inspire new approaches to training AI, possibly beyond reinforcement learning from human feedback (RLHF), which may inadvertently encourage sycophantic tendencies. By developing models that can accurately gauge and update their confidence without sacrificing rationality or becoming overly deferential, we may come closer to building truly trustworthy AI.

Read the full study in the paper “How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models”.