Give a group of Americans with high mathematical ability a statistics problem, one that requires valid causal inference from a data table. They perform well. Now give them the identical problem: same numbers, same table structure, same causal challenge. But reframe it: instead of a trial for a skin rash treatment, this is evidence about whether a city’s ban on carrying concealed handguns reduced crime. The results invert. The most numerate subjects now show the widest partisan gap, and their errors run in whichever direction confirms what they already believe about gun control.
This is Dan Kahan’s motivated numeracy experiment, published in Behavioural Public Policy in 2017. The finding is not that intelligent people are irrational. It’s something more uncomfortable: cognitive ability is not a corrective to motivated reasoning. It’s a resource for it. The more analytical power you have, the more effectively you can be wrong.
The protection that analytical ability provides against straightforward error does not extend to motivated error. It runs the other way.
Consider that you probably arrived at this article with a background assumption: whatever it finds, those findings describe other people’s failures. Motivated numeracy is for people who reason less carefully than you do. That assumption is worth noting. It is, itself, a data point.
So what, exactly, is the protection that education is supposed to provide?
The machine runs on itself
Before going any further, it’s worth being precise about what we mean by education. Not just credential accumulation — years of school, degrees, institutional stamps. What education does, and what it is designed to do, is develop and reward specific cognitive tools: analytical capacity, pattern recognition, narrative construction, domain confidence. These are not incidental byproducts of schooling. They are the product. The evidence that follows is about those tools in action, not about years of institutional attendance.
That precision matters, because the obvious defense — “I’m reasoning from evidence, not from identity” — is exactly what those tools make possible to believe while you’re doing the opposite.
Kahan’s study found that higher numeracy predicted correct answers on the skin cream problem. Switch to gun control, and the high-numeracy subjects didn’t stop performing better at inference — they redirected it. Liberals with high numeracy were more accurate when the data showed the ban had reduced crime; conservatives with high numeracy were more accurate when it showed the ban had increased crime. Each group’s analytical ability expressed itself in whichever direction its politics required. The tool was working. It just wasn’t pointing where they thought.
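To make the inference task concrete, here is a minimal sketch of the kind of two-by-two table problem described above. The counts are hypothetical and chosen only for illustration, and the function name is an assumption of this sketch. The point is that the tempting reading compares raw counts, while the valid reading compares rates within each group.

```python
from fractions import Fraction

def improvement_rates(treated_better, treated_worse, untreated_better, untreated_worse):
    """Return the share of cases that improved in each group of a 2x2 table."""
    treated = Fraction(treated_better, treated_better + treated_worse)
    untreated = Fraction(untreated_better, untreated_better + untreated_worse)
    return treated, untreated

# Hypothetical counts, for illustration only.
treated_rate, untreated_rate = improvement_rates(223, 75, 107, 21)

# Shortcut reading: the treated group has more improvements in absolute terms.
shortcut_says_treatment_helps = 223 > 107

# Valid reading: compare the improvement rate inside each group.
ratio_says_treatment_helps = treated_rate > untreated_rate

print(f"treated improved:   {float(treated_rate):.1%}")
print(f"untreated improved: {float(untreated_rate):.1%}")
print("shortcut says treatment helps:", shortcut_says_treatment_helps)
print("ratio says treatment helps:   ", ratio_says_treatment_helps)
```

With these counts the two readings disagree, which is what makes the problem a test of numeracy. Relabel the rows as cities with and without a ban and the arithmetic is unchanged.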
The directional flip is the key fact. Not that numeracy failed to help, but that it helped most in the politically convenient direction. The subjects weren’t choosing to deceive themselves — that’s not what this shows. What it shows is that intelligence redeployment is invisible from inside. When a high-numeracy conservative examined the gun control data and found reasons to discount it, they weren’t consciously cheating. They were doing what skilled analysts do: asking whether the sample was representative, whether confounding variables were controlled for, whether the causal mechanism was plausible. Legitimate questions. They became motivated when the same scrutiny wasn’t applied to data that confirmed what they already believed. The intelligence didn’t switch off. It switched targets.
Kahan found the same pattern on climate. A 2012 paper in Nature Climate Change examined whether higher science literacy and numeracy pushed people toward scientific consensus on climate risk. It did not. Among people with hierarchical and individualist cultural orientations, higher science literacy predicted greater dismissiveness of climate risk. Among those with egalitarian and communitarian orientations, it predicted greater alarm. The gap between the two groups widened as literacy increased. Knowledge wasn’t narrowing the divergence. It was amplifying it.
The mechanism was not ignorance — both groups had more knowledge and were doing more analysis. What differed was the direction it was applied, and that direction tracked political identity, not scientific evidence.
Chris Mooney synthesized this body of research in The Republican Brain (Wiley, 2012) and gave the phenomenon a name: “smart idiots.” The label is deliberately uncomfortable, and his observation matters: this is not hypocrisy, and it’s not stupidity. It’s mechanism. Motivated reasoning doesn’t diminish as cognitive ability increases. It becomes more fluent, more elaborated, and harder to detect from inside.
Here’s the bind that education specifically creates. It doesn’t only develop these cognitive tools — it invests identity in them. A person who has spent years building analytical expertise, and who has been rewarded by institutions, peers, and professional success for deploying it in particular directions, has more to lose from discovering the direction was wrong. The tools become harder to turn against the positions they were used to construct. That’s not a character flaw. It’s what happens when cognitive development and identity formation occur in the same institution, at the same time, rewarding the same conclusions. And it tightens with each success. The higher the citation count, the more established the reputation, the more prestigious the institution — the higher the cost of a framework-threatening discovery. The investment compounds.
What education produces — analytical inference, the habit of building coherent accounts from evidence, the confidence that comes from domain expertise — those capacities don’t come with instructions for when to stop trusting yourself. They are extraordinarily good at generating the subjective sensation of having reasoned correctly.
That sensation is the problem. It is not a signal of accuracy.
The expert’s curse
You might grant that political belief is tribal, while maintaining that it operates in a separate compartment from your professional judgment — that in your actual field, you reason correctly. This is the exit most educated readers reach for. Tetlock’s research closes it.
From 1984 to approximately 2003, Philip Tetlock recruited 284 experts — political scientists, economists, foreign policy analysts, current and former government officials — and had them make 82,361 predictions about political and economic events: who would win elections, whether economies would grow or contract, whether conflicts would escalate or resolve. Then he waited, and scored.
The result was embarrassing in the clinical sense: experts performed barely better than chance and no better than an informed non-expert reading newspapers. The quality of prediction correlated not with the depth of expertise but with its architecture. Experts who organised their worldview around a single explanatory framework — hedgehogs, in the terminology Tetlock borrowed from Isaiah Berlin’s 1953 essay — were the worst predictors of all. Foxes, who drew on multiple frameworks and updated when evidence arrived, did better.
Confidence and accuracy ran inverse to each other. The most committed, most certain experts were the least accurate. Not because they thought less rigorously — because their rigour was deployed in service of a framework they could not afford to have fail.
What distinguished the worst predictors wasn’t insufficient knowledge. It was the opposite: they knew their field well enough to have organised that knowledge into a coherent system, and the system had become the lens through which they perceived everything. A realist international relations theorist predicted conflicts through the logic of state power. A supply-side economist predicted downturns through the logic of incentive structures. Each framework had been built from years of evidence, which made it feel especially secure — and especially blind to what it was failing to capture. The more comprehensive the framework, the more each new event could be assimilated into it without challenge, and the less likely the framework was ever to crack.
A selection mechanism compounds this. Tetlock found that the qualities most valued by editors and television producers — decisive claims, willingness to commit, confident framing — are negatively correlated with predictive accuracy. The experts most sought after for commentary are, statistically, the most likely to be wrong. Hedgehogs are good television. Foxes equivocate in ways that make producers nervous. The institutional reward structure for public expertise therefore systematically elevates the worst performers — and pushes even the better ones toward confident commitment, making them worse over time.
The perversity is self-reinforcing. A forecaster who commits decisively and is sometimes right gets invited back; one who says “I’d put this at 63% with these caveats” gets replaced. The ecosystem doesn’t reward accuracy. It rewards the performance of authority.
In 2011, Tetlock and Barbara Mellers launched the Good Judgment Project under IARPA’s Aggregative Contingent Estimation program. A self-selecting group of volunteers — not domain experts — competed against professional intelligence analysts with access to classified information. The top performers, eventually called superforecasters, outperformed those analysts by roughly 30%.
The question that should bother you is not how that happened. It’s what those forecasters did that the experts didn’t.
What superforecasters actually did
The practices documented in Tetlock and Gardner's Superforecasting (Crown Publishers, 2015) are not glamorous. Superforecasters kept written records of their predictions before events unfolded, so they couldn't revise their memory of having been right. They tracked accuracy against a record they couldn't edit. When wrong, they updated rather than explaining the disconfirmation away as an anomaly or exceptional case. They expressed probability as calibrated estimates (specific percentages directly comparable to outcomes) rather than directional commitments insulated from accountability. They broke large questions into smaller components where individual variables could be assessed separately. None of this requires exceptional intelligence. All of it requires the sustained willingness to be demonstrably wrong on a record you're keeping yourself. Which turns out to be extremely difficult to maintain when your professional and social identity is built around being the person who understands these things.
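One way to see why a written record of percentages matters is to score it. The sketch below uses the Brier score in its common binary form (mean squared gap between stated probability and outcome, where 0 is perfect). The two records are hypothetical, and the names `hedgehog` and `fox` are labels borrowed from the text, not data from Tetlock's studies.

```python
def brier_score(record):
    """Mean squared gap between stated probability and outcome.

    record: list of (stated_probability, outcome) pairs, outcome is 1 or 0.
    Lower is better; 0.0 means every forecast was certain and correct.
    """
    return sum((p - o) ** 2 for p, o in record) / len(record)

# Hypothetical outcomes for four questions: three happened, one did not.
outcomes = [1, 0, 1, 1]

# A forecaster who commits fully every time.
hedgehog = list(zip([1.0, 1.0, 1.0, 1.0], outcomes))

# A forecaster who states hedged percentages.
fox = list(zip([0.7, 0.4, 0.7, 0.6], outcomes))

print("hedgehog:", round(brier_score(hedgehog), 3))
print("fox:     ", round(brier_score(fox), 3))
```

In this toy record the fully committed forecaster is right three times out of four and still scores worse, because the one confident miss is penalised heavily. The score can only be computed if the probabilities were written down before the outcomes were known.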
What the center got wrong
The standard defense of institutional expertise runs like this: yes, individual experts err — that’s what peer review, replication, and cumulative science are for. The system self-corrects. Individual overconfidence gets averaged out, outliers get challenged, bad findings don’t survive independent scrutiny.
Partially true. In some domains, sometimes, over long enough timeframes. But before accepting this, it’s worth asking: how long did the self-correction take, who paid the cost during the delay, and why did the field’s own internal processes fail to catch it first?
In 2015, Brian Nosek at the University of Virginia coordinated the Reproducibility Project. Two hundred and seventy researchers attempted to replicate 100 studies from three leading psychology journals — Journal of Personality and Social Psychology, Psychological Science, and Journal of Experimental Psychology: Learning, Memory, and Cognition. The results, published in Science, were stark: 36% replicated at the original significance threshold. Sixty-four percent failed. Social psychology fared worst, at approximately 25%. Effect sizes in the studies that did replicate averaged roughly half the original magnitude.
What drove this? Not fraud at the margins. Not a few bad actors. The project found the systematic product of normal operating logic: small samples treated as sufficient because results looked compelling; p-value thresholds gamed through repeated testing and selective reporting; a publication culture that rejected null results so they never entered the literature. Peer reviewers failed to catch this because they had been trained in the same methods, held the same priors, shared the same incentive structure. They were not an external check on the system. They were the system reviewing itself.
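The effect of repeated testing with selective reporting can be put in numbers. This is a simplified sketch, assuming there is no true effect, each test is independent, and only a result under the threshold gets reported; real analyses are rarely independent, so treat the figures as indicative rather than exact. The function name is an assumption of this sketch.

```python
def chance_of_false_positive(alpha, attempts):
    """Probability that at least one of `attempts` independent tests
    falls below the significance threshold `alpha` when no effect exists."""
    return 1 - (1 - alpha) ** attempts

for attempts in (1, 5, 20):
    rate = chance_of_false_positive(0.05, attempts)
    print(f"{attempts:>2} attempts -> {rate:.1%} chance of a 'significant' result")
```

Under these assumptions a nominal 5% error rate rises to about 23% after five attempts and about 64% after twenty, without anyone falsifying anything.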
The sophistication that built the field generated the errors. The sophistication running review made those errors invisible from inside.
Economic forecasting operated under different pressures but produced an equivalent blind spot through a different mechanism. In July 2007, the IMF’s World Economic Outlook Update projected global growth of 5.2% for 2007 and 2008, with risks described as “modestly tilted to the downside.” Fourteen months later, Lehman Brothers collapsed. The IMF’s Independent Evaluation Office later concluded the Fund had done a poor job of flagging the crisis or its severity.
This wasn’t a failure of intelligence or diligence. Heterodox economists and analysts had been warning of systemic fragility for years. The problem was that those warnings weren’t processable inside the dominant framework. The consensus model had been built to describe a stable system and had no category for what they were describing. The external warning existed. It arrived in a language the institution couldn’t parse.
These two failures look different on the surface — internal incentive corruption in psychology, conceptual capture in macroeconomics. But the structure is the same. In both cases, the failure was not at the fringe. The fields most confident in their frameworks were most blind to what those frameworks generated. The center went wrong, and the center’s sophistication is what kept it wrong without knowing it.
The fat that wasn't killing you
Ancel Keys was a University of Minnesota physiologist who spent the 1950s and 1960s building the case that dietary fat caused heart disease. His Seven Countries Study, a longitudinal investigation of diet and cardiovascular mortality across national populations, published as a monograph in 1980 (Keys et al., Seven Countries: A Multivariate Analysis of Death and Coronary Heart Disease, Harvard University Press), became foundational. In 1968, the American Heart Association formalized the consensus into numbers: no more than 300 milligrams of dietary cholesterol per day, no more than three eggs per week.
Critics noted at the time that Keys had been selective about which countries to include in the analysis, and that contradictory data from excluded nations had been dismissed rather than incorporated. Researchers who challenged the fat-heart hypothesis were marginalised, not through conspiracy, but through the normal operation of a consensus that had crystallised around a story supported by enough data to feel unassailable. The mechanism for excluding inconvenient evidence was not corruption. It was the settled confidence of a field that believed it had decided the question.
In 2015, the Dietary Guidelines Advisory Committee dropped the longstanding limit on dietary cholesterol entirely, concluding that available evidence showed no appreciable relationship between dietary cholesterol and blood cholesterol levels. The reversal took roughly fifty years. The self-correction happened. Calling it self-correction is accurate. The timeline is what makes the word uncomfortable.
The plausibility trap
These failures — politically motivated inference that amplifies rather than corrects error, expert confidence running inverse to accuracy, institutional review systems that generated and perpetuated false findings, a dominant economic model that couldn’t encode the instability it was embedded in — share a common mechanism. It is not stupidity or corruption.
Education develops pattern recognition and narrative construction. Those capabilities produce exactly the class of wrong belief most likely to survive.
Obviously false beliefs — young earth creationism, flat earth, vaccine-autism causation — do circulate among educated people, but educated milieus are less efficient at producing and sustaining them. Not because education provides categorical protection, but because such beliefs fail surface plausibility tests quickly against accumulated knowledge, and maintaining them requires sustained motivated effort that educated institutions don’t typically reward. The beliefs that propagate most effectively in educated milieus are those structurally plausible enough to pass the same tests that catch the obvious failures. Education is better at screening one category of error than the other — and actively worse at the category it produces most naturally.
Pattern recognition generates confident inference from insufficient evidence. Often correctly. But it doesn’t come with an accuracy meter. The sensation of having recognised a pattern is not a reliable indicator of whether a pattern is there. Narrative construction connects evidence into internally coherent accounts, and coherence is what makes accounts feel true. A story that hangs together, explains multiple things in a single frame, resolves ambiguity — that feels like understanding. It may be. The feeling doesn’t tell you which.
The coherence problem is specifically worse for educated thinkers. Building a coherent account from complex evidence requires exactly the skills education develops. The more sophisticated the thinker, the more elements they can weave into a single explanatory frame, and the more satisfying the resulting account becomes — to them and to the sophisticated readers they communicate with. An elegant theory is more persuasive than a clunky one, regardless of which is more accurate. Elegance is a criterion for good writing and for careful thinking, but it is not a truth-tracking property of theories. Education teaches people to produce elegant explanations. It does not teach them to distrust elegance.
The social psychology findings that collapsed in the Reproducibility Project were not arbitrary or implausible. They were specifically, exactly, what a sophisticated observer of human behaviour would expect to find — which is precisely why they got published, cited, taught, built into theory, and incorporated into training programmes before anyone seriously checked.
Power posing: Carney, Cuddy and Yap published a study in Psychological Science in 2010 reporting that two minutes in an expansive posture raised testosterone, lowered cortisol, and increased risk tolerance. The mechanism was compelling: body posture shaping hormonal state through feedback. Ranehill and colleagues ran a replication with a larger sample in 2015. No effect on hormones or risk tolerance.
Ego depletion: Baumeister and colleagues published a paper in 1998 reporting that exerting self-control on one task depleted a resource, which impaired performance on subsequent tasks. Willpower as a fuel tank. In 2016, Hagger and colleagues coordinated a preregistered replication across 23 laboratories. Effect size essentially zero.
Social priming: Bargh, Chen and Burrows reported in 1996 that participants who read words associated with elderly stereotypes subsequently walked more slowly down a corridor. Doyen and colleagues ran a double-blind replication with automated timing in 2012. No effect — and an additional experiment found the slowing appeared only when experimenters expected it, suggesting the original result was an artifact of experimenter expectancy, not unconscious priming.
These studies spread because they were exactly as credible as sophisticated thinkers expected genuine phenomena to be. Plausibility was the mechanism of propagation. The field’s sophistication wasn’t protection against this. It was what made the propagation possible.
And note what didn’t catch these failures: smart readers, rigorous peer review, high-prestige journals, expert commentary from precisely the people best positioned to spot problems. What eventually caught them was preregistration, adversarial collaboration, and procedures specifically designed to create barriers against the motivated selection that sophistication makes so fluent. When plausibility is indistinguishable from truth, the only reliable check is structural, not intellectual.
What virtue actually costs
The obvious response is: be more humble. Think more carefully. Hold beliefs more loosely. These injunctions are correct the way “eat less, exercise more” is correct about body weight — accurate in the abstract, useless without a specific account of what that looks like in practice and why sustaining it is so difficult.
The Good Judgment Project provides a concrete one. Superforecasters are not domain experts, and they’re not exceptional intellects. What distinguishes them is a set of practices that most people — including most educated people — find very difficult to maintain. They record predictions in writing before events unfold. They track accuracy against a record they can’t edit. When wrong, they update — they don’t explain the disconfirmation away. They express probability as calibrated estimates rather than directional commitments. They break large questions into smaller trackable components.
None of this requires high intelligence. All of it requires the sustained willingness to be demonstrably wrong on a record you’re keeping yourself. Which is very difficult when your professional and social identity is invested in being the person who understands these things.
Kahan’s science curiosity research cuts finer. A 2017 paper in Political Psychology found that the polarising effect of science literacy — knowledge amplifying political divergence — didn’t apply to science curiosity, the disposition toward discovery and novelty. More scientifically curious people showed less polarised responses to politically charged scientific questions, regardless of existing political identity. Curiosity, unlike accumulated knowledge, is oriented toward the unknown. It doesn’t have the same material to defend.
The protective factor isn’t more knowledge. It’s a particular relationship with uncertainty — and that relationship has to be actively maintained, against the grain of institutions and identities that reward confident commitment and punish equivocation. The science curiosity finding suggests why: for the genuinely curious, a result that upends a prior belief is, at some level, interesting. For someone whose identity is invested in the prior belief, it’s threatening. Curiosity doesn’t guarantee accuracy either, but it preserves the channel through which disconfirmation can arrive. Closing that channel is exactly what expertise tends to do.
Tetlock’s hedgehogs were intelligent. They were people who had built intellectual identities around explanatory frameworks — around being the person who understood geopolitics through power dynamics, or economies through incentive structures, or political behaviour through rational choice. The framework wasn’t incidental to who they were. It was constitutive of it. So intelligence was deployed in defence of something that couldn’t afford to fail. The smarter you are, the more elaborate and airtight the defence you can construct.
That’s the specific risk education creates. Not ignorance. Not irrationality. The construction of internally consistent, well-evidenced, sophisticated commitment to something that happens to be wrong — with all the cognitive resources required to make that commitment durable, persuasive, and nearly invisible from inside.
You have now been warned. That probably won’t help as much as you’d hope — not because you’re not paying attention, but because the mechanism doesn’t require inattention. It runs on exactly the kind of careful, engaged, reasoning-from-evidence that you’ve been doing for the last fifteen minutes.
You have just read an argument that presented empirical evidence, built a coherent case, followed a logical sequence, and arrived at specific conclusions. It probably felt compelling. The Kahan experiment has a version for articles like this one: the argument is internally consistent, the evidence is verified, the conclusions follow — and the feeling of having read something well-reasoned is exactly what the article has now documented as an unreliable signal of truth.
This isn’t an invitation to dismiss what you just read. It isn’t relativism dressed as epistemics. The Reproducibility Project findings are real. Tetlock’s 82,361 forecasts are real. The IMF’s 2007 numbers are real.
But the appropriate response to a compelling argument is not confidence in it. It’s a calibrated estimate of its probability of being substantially correct, held alongside the possibility that it’s wrong in ways that will only become visible later. Superforecasters build systems for exactly this — not because they’re smarter than domain experts, but because they’ve accepted that the feeling of having reasoned well is not the same as having reasoned correctly. That position is genuinely uncomfortable. It resists the urge to file the argument and move on.
What would it actually look like to hold this argument with that kind of tentativeness — not performed scepticism, which is just another way of not thinking, but continuing to take the evidence seriously while treating the feeling of comprehension as something other than arrival?
That’s not a skill education gives you. You’d have to build it yourself, against the current.
Gen AI Disclaimer
Some content on this page was generated and/or edited with the help of generative AI.
Media
Hartono Creative Studio – Pexels
Key Sources and References
Dan Kahan, Ellen Peters, Erica Cantrell Dawson & Paul Slovic, “Motivated numeracy and enlightened self-government,” Behavioural Public Policy, Vol. 1, Issue 1, pp. 54–86, 2017. https://doi.org/10.1017/bpp.2016.2
Dan Kahan, Ellen Peters, Maggie Wittlin, Paul Slovic, Lisa Larrimore Ouellette, Donald Braman & Gregory Mandel, “The polarizing impact of science literacy and numeracy on perceived climate change risks,” Nature Climate Change, Vol. 2, pp. 732–735, 2012. https://doi.org/10.1038/nclimate1547
Philip Tetlock, Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press, 2005.
Philip Tetlock & Dan Gardner, Superforecasting: The Art and Science of Prediction, Crown Publishers, 2015.
Open Science Collaboration, “Estimating the reproducibility of psychological science,” Science, Vol. 349, Issue 6251, 2015. https://doi.org/10.1126/science.aac4716
International Monetary Fund, World Economic Outlook Update, July 2007. https://www.imf.org/en/News/Articles/2015/09/28/04/53/sonew0725a
IMF Independent Evaluation Office, “IMF Performance in the Run-Up to the Financial and Economic Crisis: IMF Surveillance in 2004–07,” 2011.
Dana R. Carney, Amy J.C. Cuddy & Andy J. Yap, “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance,” Psychological Science, Vol. 21, No. 10, pp. 1363–1368, 2010.
Eva Ranehill, Anna Dreber, Magnus Johannesson, Susanne Leiberg, Sunhae Sul & Roberto A. Weber, “Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women,” Psychological Science, Vol. 26, No. 5, pp. 653–656, 2015.
Roy F. Baumeister, Ellen Bratslavsky, Mark Muraven & Dianne M. Tice, “Ego Depletion: Is the Active Self a Limited Resource?”, Journal of Personality and Social Psychology, Vol. 74, No. 5, pp. 1252–1265, 1998.
Martin S. Hagger, Nikos L.D. Chatzisarantis et al., “A Multilab Preregistered Replication of the Ego-Depletion Effect,” Perspectives on Psychological Science, Vol. 11, pp. 546–573, 2016. https://doi.org/10.1177/1745691616652873
John A. Bargh, Mark Chen & Lara Burrows, “Automaticity of Social Behavior: Direct Effects of Trait Construct and Stereotype Activation on Action,” Journal of Personality and Social Psychology, Vol. 71, No. 2, pp. 230–244, 1996.
Stéphane Doyen, Olivier Klein, Cora-Lise Pichon & Axel Cleeremans, “Behavioral Priming: It’s All in the Mind, but Whose Mind?”, PLOS ONE, 2012. https://doi.org/10.1371/journal.pone.0029081
Chris Mooney, The Republican Brain: The Science of Why They Deny Science — and Reality, Wiley, 2012.
Dan M. Kahan, Asheley Landrum, Katie Carpenter, Laura Helft & Kathleen Hall Jamieson, “Science Curiosity and Political Information Processing,” Political Psychology, Vol. 38, Supplement 1, pp. 179–199, 2017. https://doi.org/10.1111/pops.12396
Ancel Keys et al., Seven Countries: A Multivariate Analysis of Death and Coronary Heart Disease, Harvard University Press, 1980.
Isaiah Berlin, The Hedgehog and the Fox: An Essay on Tolstoy’s View of History, Weidenfeld and Nicolson, 1953.




