Written by Isabella Schick
Edited by Azhem Rudani,
Andrew Janssen
Illustrated by Aryan Hrishikesh Nair
Have you ever noticed that ChatGPT almost never argues with you? It usually agrees with the user, or at least seems to respond in a way that feels encouraging. It's like talking to a person who really wants you to like them: someone who nods along, supports your ideas, and rarely pushes back, even when you're wrong or off base.
At first, that artificial politeness feels reassuring. However, the way that large language models (LLMs) like ChatGPT are trained means that their kindness can be deceptive, and it might come at the cost of candor.
LLMs first learn from large amounts of text found online; they're then fine-tuned to respond to prompts in a manner that people find helpful, polite, and safe. This refinement, known as reinforcement learning from human feedback (RLHF), teaches the model to prioritize human satisfaction, often by saying what it thinks users want to hear. In practice, the model learns to align its responses with the user's standpoint.
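To make that training signal concrete, here is a deliberately toy sketch (not the code of any real system) of the preference-modeling step at the heart of RLHF: a reward model is fitted so that whichever of two responses human raters preferred gets the higher score, and the chat model is then steered toward high-scoring replies. The keyword-counting reward function and the sample responses below are invented purely for illustration.

```python
import math

def toy_reward(response: str) -> float:
    """Stand-in reward: counts agreeable phrases. Real reward models are neural
    networks trained on many human preference comparisons, not keyword counts."""
    agreeable = ["great idea", "you're right", "absolutely", "happy to help"]
    return float(sum(phrase in response.lower() for phrase in agreeable))

def preference_probability(preferred: str, rejected: str) -> float:
    """Bradley-Terry-style probability that 'preferred' beats 'rejected' given
    their reward scores; training pushes this probability toward 1."""
    diff = toy_reward(preferred) - toy_reward(rejected)
    return 1.0 / (1.0 + math.exp(-diff))

# If raters consistently favor the agreeable reply, the reward model learns to
# score agreement highly, and the chat model learns to produce more of it.
candid = "Your plan has a serious flaw in step two."
flattering = "Great idea, you're right: this plan should work."
print(preference_probability(flattering, candid))  # about 0.88, so flattery "wins"
```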
Luckily, researchers are beginning to notice this pattern. A study by Wolf and colleagues (2024) found that, while alignment makes models more predictable and user-friendly, it also introduces limits. Once a model has been trained to align with human expectations, it can struggle to express alternative views or to recognize when the user's assumptions are mistaken. We're training AI to be a good cheerleader and conversationalist, but not necessarily a critical thinker.
Not only do LLMs love to agree with us, they also love when we agree with each other. A recent study by Triantafyllopoulos and Kalles (2025) explored how LLMs tend to encourage consensus in human discussions. They found that these models often guide people with conflicting ideas toward agreement faster than those people would reach it on their own. This might sound efficient, but it also means that AI is subtly reinforcing harmony over disagreement. If we're not careful, this technology could start narrowing our perspectives and homogenizing our conversations.
Humans love to be agreed with: being understood and having our opinions echoed back in eloquent sentences feels reassuring, and ChatGPT mirrors our phrasing and tone because that tends to please us. Unlike a friend or colleague, ChatGPT isn't forming opinions that agree or disagree with yours; it's predicting what a "helpful" response looks like based on patterns in its training data. When it agrees with you, it's not affirming that you're right. Really, it's saying: "This is what someone nice would say right now."
Liu and colleagues (2025) approached the issue from a logical angle. They showed that when models are optimized for internal consistency (making sure their responses don't contradict themselves), they can sacrifice flexibility in reasoning. Once an LLM locks into a certain line of logic, it tends to stay rigid, even when an alternative viewpoint might make more sense. This inflexibility can look like agreement, but it's really a side effect of how we train AIs to value consistency over exploration.
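As a concrete (and entirely invented) illustration of what "internal consistency" can mean, one simple probe is transitivity: if a model says answer A is better than B, and B is better than C, does it also say A is better than C? The sketch below hard-codes hypothetical pairwise judgments rather than querying a real model, and it is not the evaluation used by Liu and colleagues.

```python
from itertools import permutations

# Hypothetical pairwise judgments: prefers[(x, y)] is True if the model said
# it preferred answer x over answer y when asked to compare the two.
prefers = {
    ("A", "B"): True,
    ("B", "C"): True,
    ("A", "C"): False,  # contradicts the two judgments above
}

def is_transitive(prefers: dict) -> bool:
    """Return False if some chain x > y > z is not matched by x > z."""
    for x, y, z in permutations(["A", "B", "C"], 3):
        if prefers.get((x, y)) and prefers.get((y, z)) and prefers.get((x, z)) is False:
            return False
    return True

print(is_transitive(prefers))  # False: the model's preferences contradict themselves
```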
This agreeable quality isn't all bad. It frames technology as approachable and encourages curiosity. But if we start relying on LLMs to explain the world, it's imperative to remember that they're reflections of us. Their politeness is learned from our data, and their cooperation is shaped by our feedback.
The solution isn’t to make AI rude or abrasive: it’s to make it honest. Developers are now exploring ways to balance friendliness and factual accuracy by training systems to gently challenge users when something doesn’t quite add up. As for the rest of us? We can do our part by asking smarter questions, ones that invite AI to think beyond a simple “yes” or “no”.
Next time ChatGPT says you're right, view it as an opportunity to prod further. Instead of asking for affirmation, prompt it with tasks. Ask it what might be missing from your idea, what would make the idea better, and what a colleague might think of it. Though the machine might not disagree with you unprompted, we can teach it to offer disagreement and constructive feedback when it matters most.
References
- Wolf Y, Wies N, Avnery O, Levine Y, Shashua A. Fundamental limitations of alignment in large language models. arXiv preprint. 2024 Jun 3 [accessed 2025 Oct 23]. https://doi.org/10.48550/arXiv.2304.11082
- Triantafyllopoulos L, Kalles D. From divergence to alignment: Evaluating the role of large language models in facilitating agreement through adaptive strategies. Future Internet. 2025 [accessed 2025 Oct 23];17(9):407–427. https://doi.org/10.3390/fi17090407
- Liu Y, Guo Z, Liang T, Shareghi E, Vulić I, Collier N. Aligning with logic: Measuring, evaluating and improving logical preference consistency in large language models. arXiv preprint. 2025 Feb 9 [accessed 2025 Oct 23]. https://doi.org/10.48550/arXiv.2410.02205