Validation is trust, built in layers.
No single layer is sufficient. Seven we build on, at Data Mint.
A kappa score doesn't make you trust the data. The trust is in the codebook behind it — and in the readers you chose for this codebook, on documents drawn from a corpus you know.
-
01
The codebook.
Readable, inspectable. The decision rules, written down.
-
02
Evidence trails for every extracted value.
Quote, reasoning, assumptions — auditable per cell.
-
03
Multiple independent readers.
Agreement earned, not assumed. Disagreement exposes ambiguity.
-
04
Codebook refinement.
Run. See where readers diverge. Fix what broke. Repeat — disagreements thin out with each pass.
-
05
The mint grade.
A signal built from process traces throughout minting, calibrated against reader agreement — and, soon, against human-labelled ground truth.
-
06
Shared codebooks.
Replication through reuse. A layer you can't produce alone.
-
07
Accumulated experience.
You learn what right looks like. And wrong. Over time, you know when to trust.
No statistic validates a codebook. Use does.
Data Mint is where that trust gets built — layer by layer.