The next batch was chosen to be hard.

We searched PubMed for types of studies that are easy to misclassify.

PubMed searches, one per category:

Mendelian randomization; secondary analysis; protocol paper; pilot/feasibility; stepped-wedge; pragmatic trial; cluster RCT; vignette study; target trial emulation; cost-effectiveness alongside; process evaluation; baseline characteristics; statistical analysis plan; retracted RCT; letter about a trial; animal RCT; IPD meta-analysis; survey experiment; commentary on trial; economic evaluation
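Pulling a capped batch per category can be done through NCBI's E-utilities: ESearch returns PubMed IDs for a query, and EFetch returns titles and abstracts for those IDs. The sketch below assumes the category labels themselves are used as search terms; the real queries are not given here, so `CATEGORIES` and the helper names are hypothetical. The cap of 75 matches the batch size per category.

```python
import json
import urllib.request
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical: category labels used verbatim as PubMed search terms.
CATEGORIES = ["mendelian randomization", "stepped-wedge", "target trial emulation"]


def esearch_url(term: str, retmax: int = 75) -> str:
    """Build an ESearch URL returning up to `retmax` PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str]) -> str:
    """Build an EFetch URL returning titles and abstracts as plain text."""
    params = {"db": "pubmed", "id": ",".join(pmids),
              "rettype": "abstract", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def pull_category(term: str, retmax: int = 75) -> str:
    """Network call: search one category, then fetch abstracts for the hits."""
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        ids = json.load(resp)["esearchresult"]["idlist"]
    if not ids:
        return ""
    with urllib.request.urlopen(efetch_url(ids)) as resp:
        return resp.read().decode("utf-8")
```

Looping `pull_category` over every category should be throttled: NCBI allows about three requests per second without an API key.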

We listed the kinds of paper that share vocabulary with the screening question but are not what we were looking for. We also asked three external LLMs to brainstorm such a list independently, each given the same prompt about which titles and abstracts would be hard to classify. The three lists overlapped heavily. For each category, we wrote a PubMed search and pulled about 75 titles and abstracts.
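One way to put a number on "overlapped heavily" is pairwise Jaccard similarity between the brainstormed lists, plus the set of categories all of them share. This is a minimal sketch with hypothetical names (`overlap_report`, `llm_a`, and so on); it only normalizes case and whitespace, so differently worded versions of the same category would count as distinct and the overlap would be understated.

```python
from itertools import combinations


def normalize(items: list[str]) -> set[str]:
    """Lowercase and strip each category label (no synonym matching)."""
    return {s.strip().lower() for s in items}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_report(lists: dict[str, list[str]]):
    """Return pairwise Jaccard scores and the categories every list shares."""
    sets = {name: normalize(v) for name, v in lists.items()}
    pairwise = {(x, y): jaccard(sets[x], sets[y]) for x, y in combinations(sets, 2)}
    shared = set.intersection(*sets.values())
    return pairwise, shared


# Hypothetical lists standing in for the three LLM brainstorms.
pairwise, shared = overlap_report({
    "llm_a": ["Protocol paper", "pilot study", "animal RCT"],
    "llm_b": ["protocol paper", "Pilot study"],
    "llm_c": ["protocol paper", "pilot study", "vignette study"],
})
```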

Running the codebook is cheap. Choosing the papers is slow. Curation is where the work went.

@karlrohe