How to craft a research codebook

What we’ve learned about writing codebooks when AI does the reading.

A codebook defines what you want to extract from your documents. Not just what questions to ask—but what counts as an answer, how to handle edge cases, what to do when the document is ambiguous.

Human coders accumulate context over time—through training sessions, conversations about edge cases, shared understanding that never gets written down. AI readers see only the document and your codebook. Nothing else. No memory of past extractions, no side conversations, no institutional knowledge.*

That constraint is clarifying. If the codebook is the only thing guiding extraction, it has to carry your full intent. Building one that does is iterative. The codebook is never right on the first cut.

It’s also good for science. Think about what usually happens: each lab develops its own conventions for coding—what counts as “reported,” how to handle ambiguous phrasing, which edge cases to lump together. Those conventions live in training sessions, in oral tradition, in the habits of whoever trained the coders. They affect the resulting dataset in ways that are real but invisible. When the codebook has to carry your full intent—all of it, written down—those hidden decisions become visible. Testable. Sharable. Another researcher can run your codebook and get your results, or run a different codebook and show exactly where the two diverge. That’s reproducibility, and it’s a reason minting is valuable beyond convenience.

On datamint.ing, you have an AI apprentice at each step of this process—one that highlights issues as they appear, proposes improvements, and does the extraction labor. The craft is yours: deciding what your codebook needs to say and how to handle what resists easy answers.

* This is by design. datamint.ing uses only AI providers that explicitly commit to not training on your data. Your documents and extractions remain yours—they are not used to improve any model. See our data policy.
Step 1

Draft your codebook

A codebook entry is a question you want to answer about each document, plus enough context that a reader who’s never talked to you could answer it consistently. What counts? What doesn’t? What should they do when the document doesn’t say?
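One way to picture such an entry is as a small structured record. This is only a sketch, and the field names here are illustrative, not datamint.ing's actual schema:

```python
# A hypothetical codebook entry, sketched as a plain dict.
# Key names are illustrative -- not datamint.ing's actual format.
entry = {
    "name": "sample_size",
    "question": "How many participants were analyzed for the primary outcome?",
    "type": "integer",
    "counts": "The number in the primary analysis, not the number randomized.",
    "does_not_count": "Screening or enrollment figures.",
    "if_missing": "Record 'not reported' rather than inferring from percentages.",
}

def is_complete(entry):
    """A draft entry is complete once it says what counts, what
    doesn't, and what to do when the document is silent."""
    required = {"question", "counts", "does_not_count", "if_missing"}
    return required <= entry.keys()

print(is_complete(entry))  # True
```

The point of the sketch is the completeness check: a question alone is not an entry until the boundary cases are written down next to it.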

Your codebook has two audiences. Humans will read it as the foundational definitions of your variables. AI readers will do the extraction labor. Writing for both is a skill.

The more precisely you describe what you want, the more consistent your extractions will be. But you don’t need precision on the first draft—the next steps show you where the gaps are.

On datamint.ing, a codebook assistant builds your first draft with you. Bring a protocol, a previous form, or just your goals. Over four interactive steps, it asks clarifying questions, proposes a skeleton of fields with scaffolding where needed, and drafts your codebook. Your feedback shapes every step.

Step 2

Run it, with multiple independent readers

The test for a codebook: give it to readers who can’t talk to each other. If they extract different answers from the same document, the codebook allowed it.

When a reader misses something, the most productive response is usually not “the reader is wrong” but “what in my instructions could I emphasize differently?” Sometimes the fix is a single sentence—a clarifying edge case, a new example. Sometimes it means rethinking how you asked the question.

This is the principle behind double extraction in systematic reviews. With AI readers, you can run this test in minutes instead of weeks—and you can run it every time you change the codebook.

On datamint.ing, consensus minting runs multiple independent AI readers against each document. An arbiter compares their extractions and identifies where they diverge.
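The core of that comparison can be sketched in a few lines, assuming each reader returns a flat mapping of field to answer. The real arbiter does far more, but the divergence check at its heart looks like this:

```python
def divergences(extractions):
    """Given one extraction dict per independent reader,
    return the fields where the readers did not all agree."""
    fields = set().union(*(e.keys() for e in extractions))
    diverged = {}
    for field in fields:
        answers = [e.get(field) for e in extractions]
        if len(set(answers)) > 1:  # at least two readers disagree
            diverged[field] = answers
    return diverged

# Two hypothetical readers, same document, same codebook.
reader_a = {"sample_size": 120, "blinded": "yes"}
reader_b = {"sample_size": 118, "blinded": "yes"}
print(divergences([reader_a, reader_b]))  # {'sample_size': [120, 118]}
```

Each diverged field is a question for the codebook, not for the readers: what in its wording let one reader pick 120 and the other 118?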

Step 3

Inspect and resolve

Where do your readers disagree? What in your codebook allowed the disagreement?

Every extraction carries an evidence trail—the text each reader relied on, their reasoning, any assumptions they made. You can see exactly why two readers diverged.
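A single cell's trail might be pictured like this. The keys are illustrative, not datamint.ing's actual format:

```python
# A sketch of an evidence-trail record for one extracted cell.
cell = {
    "field": "sample_size",
    "answer": 118,
    "evidence": "Table 2: 'n = 118 analyzed for the primary endpoint'",
    "reasoning": "Used the analyzed count, not the 120 randomized.",
    "assumptions": ["Primary endpoint = the endpoint named in the abstract"],
}

def explain(cell):
    """Render why a reader gave this answer, for side-by-side review."""
    return (f"{cell['field']} = {cell['answer']!r}\n"
            f"  evidence: {cell['evidence']}\n"
            f"  reasoning: {cell['reasoning']}")

print(explain(cell))
```

Laying two such records side by side is what turns "the readers disagreed" into "here is the sentence each one leaned on."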

On datamint.ing, the arbiter works through each disagreement—what the readers each extracted, what passage in the document is relevant, and what in your codebook allowed two interpretations. It proposes ways to resolve it. The question is never “who was right?”—it’s “what should your codebook say?”

Step 4

Refine

That first round of questions surfaces your intentions. The next round turns them into specific edits to your codebook.

On datamint.ing, the refinement tool collects disagreements and your resolutions across your documents, then proposes specific improvements—a sharper description, a new example, a decision rule for an edge case. Each one comes with options and evidence from the extractions. You accept, modify, or skip.

Sometimes, though, the right fix isn’t a better description—it’s restructuring the question. You realize you’re asking the reader to do too much in a single field. Building scaffolding is the craft.

Say you want the estimated treatment effect for the primary outcome. That’s a compound task. The reader needs to identify the trial arms, the interventions each arm receives, something about the trial design—then find the right table, locate the estimate and its standard error, and decide which number to report. Good readers will work through these steps on their own. But when they do it implicitly, they sometimes take different paths—and you can’t see where they diverged.

Add scaffolding fields. Free-text fields before your main question that walk the reader through each step explicitly: identify the arms, describe the interventions, index the tables, flag which ones report the effect estimate. Each field references the previous ones, building a chain of reasoning that leads to your question of interest. The scaffolding may never appear in your final dataset, but it defines exactly what you mean by “estimated treatment effect”—and when readers disagree, you can see exactly where.
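The chain described above can be sketched as an ordered list of fields, each step allowed to reference only the steps before it. The names and questions are illustrative:

```python
# A hypothetical scaffolded codebook fragment: free-text steps
# that build up to the field you actually want.
scaffold = [
    {"name": "trial_arms",
     "question": "List the trial arms."},
    {"name": "interventions",
     "question": "For each arm in trial_arms, describe the intervention.",
     "uses": ["trial_arms"]},
    {"name": "effect_tables",
     "question": "Which tables report the primary-outcome effect estimate?",
     "uses": ["trial_arms", "interventions"]},
    {"name": "treatment_effect",
     "question": "Report the estimate and its standard error.",
     "uses": ["effect_tables"]},
]

def chain_is_ordered(scaffold):
    """Each field may only reference fields defined before it."""
    seen = set()
    for field in scaffold:
        if not set(field.get("uses", [])) <= seen:
            return False
        seen.add(field["name"])
    return True

print(chain_is_ordered(scaffold))  # True
```

The check is trivial, but it captures the design rule: the reasoning chain is explicit and inspectable, so when two readers diverge on the final field you can walk back up the chain to the first step where they split.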

Give the reader permission to revise earlier scaffolding answers as they go. A rigid pipeline breaks when the document surprises you; a flexible one lets the reader update an earlier field once they’ve read further and know more.

Step 5

Re-extract and repeat

Run the updated codebook. Are the previous disagreements resolved? Have new ones appeared?

Two to three rounds is typical for tightening the core of your codebook. But the long tail of possible ambiguities is sometimes too long. Reality resists the rectangular grid of a spreadsheet.

Part of the craft is deciding how to handle that resistance. Maybe a field should capture free text so readers can explain their reasoning rather than force a choice. Maybe the ambiguity itself is something you need to study rather than resolve. These are research decisions, not extraction failures.

You don’t have to resolve every ambiguity before you start. Take consensus minting into production—the reader disagreements flag exactly where you need a human in the loop. The codebook handles the clear cases; your attention goes where it’s needed.

When the codebook captures your intentions well enough for the questions that matter to your project—mint your full document set. Every extraction carries an evidence trail: the text it came from, the reasoning, and the codebook that produced it.

On datamint.ing, you can share your codebook and your data. Others can mint from your codebook with their own documents, inspect your evidence trails, download your data, or study your codebook as a starting point for their own. Sharing is always optional—you can turn it off any time you change your mind.

Background

Define a codebook. Mint your data. Inspect every cell.