AI: bad at judgment,
great at reading.

For your research

You want to extract data from many documents — screen abstracts, code findings, assess bias. You want AI to help. But almost every task is judgment. Even simple extractions often hide expert judgment, and AI is bad at judgment.

Your task

To make systematic judgments, you need to decompose them into a sequence of reading comprehension tasks. Each reading task is one key step in the judgment. Decomposing well takes skill. A codebook is where you write the decomposition down, and it is the prompt you give to the reader.
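A minimal sketch of what "the codebook is the prompt" could look like in practice. The three questions are the reading questions used as examples in this document. Everything else is an assumption made for illustration: the `ReadingTask` class, the keys, the yes/no/unclear answer set, the prompt wording, and the placeholder document text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingTask:
    """One codebook entry: a question that can be answered by reading."""
    key: str        # hypothetical short identifier
    question: str   # the reading question shown to the reader
    allowed_answers: tuple = ("yes", "no", "unclear")  # assumed answer set


# Questions come from this document's examples; the keys are made up.
CODEBOOK = [
    ReadingTask(
        "power_calc",
        "Did the authors describe a power calculation for their planned sample size?",
    ),
    ReadingTask(
        "agreement_stat",
        "Did the authors report an inter-rater agreement statistic?",
    ),
    ReadingTask(
        "mcid",
        "Did the authors define a minimum clinically important difference?",
    ),
]


def build_prompt(task: ReadingTask, document_text: str) -> str:
    """Render one reading task as a prompt for the reader (human or AI)."""
    options = ", ".join(task.allowed_answers)
    return (
        f"Question: {task.question}\n"
        f"Answer with one of: {options}.\n"
        "Quote the sentence that supports your answer.\n\n"
        f"Document:\n{document_text}"
    )


# Placeholder text stands in for a real paper.
prompts = [build_prompt(task, "PLACEHOLDER DOCUMENT TEXT") for task in CODEBOOK]
```

Each prompt asks one reading question and requests a supporting quote, so every answer can be checked against the document.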

The good news

This is not new. It is not about AI. Complex judgments become more reliable when they are decomposed into simpler, observable parts.

JUDGING (AI collapses) → READING (AI thrives)

  • Is the sample size adequate? → Did the authors describe a power calculation for their planned sample size?
  • Is the coding reliable? → Did the authors report an inter-rater agreement statistic?
  • Is the effect clinically meaningful? → Did the authors define a minimum clinically important difference?

a pattern across disciplines, long before AI

  • Meehl (1954): clinical diagnosis, decomposed into weighted rules on observable traits.
  • Bueno de Mesquita (1981): conflict outcomes, decomposed into four estimated variables.
  • Altman (1994): medical research quality, reframed as checkable technical obligations.
  • Gawande (2009): surgical expertise, decomposed into pre-op checklist items.
  • Sterne et al. (2019): risk of bias (RoB 2), decomposed into five domains of signaling questions.
  • Kahneman et al. (2021): noisy judgment, decomposed via decision hygiene.

The codebook is the bridge from reading to judgment.
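One way to picture the bridge: once the reading answers are recorded, an explicit rule written in the codebook turns them into a judgment. The sketch below is only an illustration. The answer keys, the three-level scale, and the rule itself are assumptions, and it is not the algorithm of RoB 2 or of any other published tool.

```python
# Hypothetical recorded answers to three reading questions for one paper.
answers = {"q1": "yes", "q2": "no", "q3": "unclear"}


def judge(answers: dict) -> str:
    """Map reading answers to a coarse judgment using an explicit rule.

    Illustrative rule (an assumption, not a published algorithm):
    all "yes" gives low concern, any "no" gives high concern,
    and anything else gives some concerns.
    """
    values = list(answers.values())
    if not values:
        raise ValueError("no reading answers recorded")
    if all(v == "yes" for v in values):
        return "low concern"
    if any(v == "no" for v in values):
        return "high concern"
    return "some concerns"


print(judge(answers))  # prints "high concern"
```

Because the rule is written down, two coders (or two AI runs) that agree on the reading answers will reach the same judgment.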

cf. Descartes (1637) · Polya (1945) · Meehl (1954) · Simon (1962) · Alexander (1977) · Bueno de Mesquita (1981) · Altman (1994) · Richardson (1995) · Begg et al. (1996) · Kahneman & Frederick (2002) · Gawande (2009) · Tetlock (2015) · Sterne et al. (2019) · Kahneman et al. (2021)

@karlrohe