
Hidden bias in large language models
Large language models (LLMs) like GPT-4 and Claude have completely transformed AI with their ability to process and generate human-like text. But beneath their powerful capabilities lies a subtle and often overlooked problem: position bias. This refers to the tendency of these models to overemphasize information located at the beginning and end of a document while neglecting content in the middle. This bias can have significant real-world consequences, potentially leading to inaccurate or incomplete responses from AI systems.
A team of MIT researchers has now pinpointed the underlying cause of this flaw. Their study reveals that position bias stems not just from the training data used to teach LLMs, but from fundamental design choices in the model architecture itself – particularly the way transformer-based models handle attention and word positioning.
Transformers, the neural network architecture behind most LLMs, work by breaking text into tokens and learning how those tokens relate to one another. To make sense of long sequences of text, models rely on attention mechanisms, which allow each token to selectively “focus” on related tokens elsewhere in the sequence, helping the model understand context.
However, because allowing every token to attend to every other token is computationally expensive, developers often apply attention masks such as the causal mask, which restricts each token to attending only to the tokens that precede it in the sequence. On top of this, positional encodings are added to help the model keep track of word order.
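To make this concrete, here is a minimal PyTorch sketch of scaled dot-product attention with a causal mask. It is illustrative code, not code from the MIT study: the shapes, names, and toy inputs are assumptions, but the lower-triangular mask is exactly what prevents a token from attending to anything that comes after it.

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    # q, k, v: (seq_len, d) tensors for a single attention head
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5                             # pairwise attention scores
    mask = torch.tril(torch.ones(seq_len, seq_len))         # 1s on and below the diagonal
    scores = scores.masked_fill(mask == 0, float("-inf"))   # hide "future" tokens
    weights = F.softmax(scores, dim=-1)                     # each row sums to 1 over visible tokens
    return weights @ v                                      # weighted mix of value vectors

# Toy usage: 5 tokens with 8-dimensional embeddings
q = k = v = torch.randn(5, 8)
out = causal_attention(q, k, v)   # token i only "sees" tokens 0..i
```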
The MIT team developed a graph-based theoretical framework to study how these architectural choices affect the flow of attention within the models. Their analysis demonstrates that causal masking inherently biases models toward the beginning of the input, regardless of the content's importance. Furthermore, as more attention layers are added – a common strategy to boost model performance – this bias grows stronger.
This discovery aligns with real-world challenges faced by developers working on applied AI systems. Learn more about QuData’s experience building a smarter retrieval-augmented generation (RAG) system using graph databases. Our case study addresses some of the same architectural limitations and demonstrates how to preserve structured relationships and contextual relevance in practice.
According to Xinyi Wu, MIT PhD student and lead author of the study, their framework helped show that even if the data are neutral, the architecture itself can skew the model’s focus.
To test their theory, the team ran experiments in which the correct answer to a question was placed at different positions within a text. The results showed a clear U-shaped pattern: models performed best when the answer was at the beginning, somewhat worse when it was at the end, and worst when it was in the middle – a pattern known as the “lost-in-the-middle” phenomenon.
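An experiment of this kind can be approximated with a simple position sweep: insert the same key fact at different depths of a long context and record how often the model retrieves it. The sketch below is a rough illustration of the setup, not the researchers’ actual protocol; ask_model is a hypothetical stand-in for whatever LLM interface is being evaluated.

```python
def build_context(needle: str, fillers: list[str], position: int) -> str:
    # Place the key fact ("needle") among distractor passages at a chosen depth.
    docs = fillers[:position] + [needle] + fillers[position:]
    return "\n\n".join(docs)

def position_sweep(needle, question, expected, fillers, ask_model):
    # ask_model: hypothetical callable that sends a prompt to the LLM under test
    accuracy_by_position = {}
    for pos in range(len(fillers) + 1):            # 0 = very start, len(fillers) = very end
        prompt = build_context(needle, fillers, pos) + f"\n\nQuestion: {question}"
        answer = ask_model(prompt)
        accuracy_by_position[pos] = int(expected.lower() in answer.lower())
    return accuracy_by_position                    # tends to be U-shaped: best near the edges
```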
However, their work also uncovered potential ways to mitigate this bias. Positional encodings designed to link tokens more strongly to nearby words can significantly reduce position bias. Simplifying models by reducing the number of attention layers, or exploring alternative masking strategies, could also help. And while model architecture plays a major role, it's crucial to remember that biased training data can still reinforce the problem.
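One widely used technique in this spirit is a distance-dependent attention bias (popularized by ALiBi), which lowers attention scores between tokens the farther apart they are. The sketch below shows the idea for a single head; the slope value and shapes are assumptions for illustration, not a recommendation from the study.

```python
import torch
import torch.nn.functional as F

def attention_with_distance_bias(q, k, v, slope=0.5):
    # Penalize attention scores linearly with token distance so that
    # nearby tokens receive more weight (ALiBi-style bias).
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).abs()  # |i - j| for every pair
    weights = F.softmax(scores - slope * distance, dim=-1)      # farther tokens get lower weight
    return weights @ v
```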
This research provides valuable insight into the inner workings of AI systems that are increasingly used in high-stakes domains, from legal research to medical diagnostics to code generation.
As Ali Jadbabaie, a professor and head of MIT’s Civil and Environmental Engineering department, emphasized, these models are black boxes. Most users don’t realize that input order can affect output accuracy. If they want to trust AI in critical applications, users need to understand when and why it fails.