Every question has a bias and variance.
Each question is a measurement. Two ways a measurement can fail.
misalignment
readers agree on the wrong thing
→fix: rewrite definitions
disagreement
readers diverge paper-by-paper
→fix: add scaffolding; sharpen examples
A useful decomposition of error, not a literal MSE. Both quantities depend on the corpus you run it on.
A clinical-review field that breaks both ways: allocation_concealment_adequate
high biasCodebook says look for "double-blind." Readers agree — on the wrong construct. Blinding is not allocation concealment.
high varianceCodebook names the right construct. Papers describe it with scattered cues ("central randomization", "pharmacy-controlled sequence", appendix-only details). Readers diverge case-by-case.
cf.Cohen (1960) · Cronbach & Meehl (1955) · Krippendorff (2004) · Shrout & Fleiss (1979) · Cronbach et al. (1972) G-theory · Lundberg & Stewart (2021)
Inter-reader disagreement is variance, made visible.