Data Extraction Forms: What to Include and How to Design Them
Data extraction is where good systematic reviews are made or broken. A weak form produces inconsistent data; inconsistent data produces synthesis you cannot trust. But over-engineered forms — fifty fields no reviewer fills in consistently — fail in the opposite direction. This article covers what to actually include on an extraction form, how to pilot it, and how to run dual extraction without losing your mind.
What extraction is for
Extraction converts each included study into structured data. Done well, you end up with a dataset you can synthesize narratively, in tables, or statistically (via meta-analysis). See our data extraction process page for the full method.
The Cochrane Handbook (Chapter 5) and JBI methodology both specify extraction as a dual-reviewer process for systematic reviews. Extraction is not optional. If you did not extract, you did not review.
Core fields every form needs
Every extraction form — regardless of review type — should include:
- Study ID (first author + year + letter for multiple reports, e.g., Smith 2020a)
- Full citation
- Country and setting
- Study design
- Aim or research question
- Population (inclusion criteria, sample size, demographics)
- Intervention or exposure (for intervention or exposure reviews)
- Comparator (for comparative designs)
- Outcomes measured (with measurement tool and time point)
- Results (effect estimates with confidence intervals; for qualitative, key themes)
- Funding source and declared conflicts
- Risk of bias judgments (per the tool you are using)
- Reviewer notes (anything unusual, ambiguous, or requiring clarification)
Use our data extraction form template as a starting point.
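As a rough illustration, the core fields above could be held in a single record type. This is a minimal sketch, not a prescribed schema: the class name ExtractionRecord, the attribute names, and the blank_fields helper are all hypothetical, and free-text strings stand in for fields you would normally structure further.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List

NOT_REPORTED = "not reported"  # explicit marker, so a field is never left blank


@dataclass
class ExtractionRecord:
    """One extracted study. Field names are illustrative assumptions."""
    study_id: str                      # e.g. "Smith 2020a"
    citation: str
    country_setting: str
    study_design: str
    aim: str
    population: str                    # inclusion criteria, sample size, demographics
    intervention_or_exposure: str = NOT_REPORTED
    comparator: str = NOT_REPORTED
    outcomes: List[str] = field(default_factory=list)   # tool and time point per outcome
    results: str = NOT_REPORTED
    funding_and_conflicts: str = NOT_REPORTED
    risk_of_bias: Dict[str, str] = field(default_factory=dict)  # domain -> judgment
    reviewer_notes: str = ""           # the only field allowed to stay empty


def blank_fields(record: ExtractionRecord) -> List[str]:
    """Return the names of text fields that were left empty."""
    return [
        f.name
        for f in fields(record)
        if f.name != "reviewer_notes"
        and isinstance(getattr(record, f.name), str)
        and not getattr(record, f.name).strip()
    ]
```

Running blank_fields on each record before sign-off is one cheap way to catch fields that were skipped rather than marked "not reported".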
Add review-specific fields
Beyond the core, add fields that match your synthesis plan:
- Meta-analysis: effect size, variance, sample size per arm, outcome scale, time point, adjustment variables
- Qualitative synthesis: analytic approach, theoretical framework, raw themes, author interpretations
- Scoping review: concepts mapped, gaps noted, stakeholder involvement
- Intervention fidelity review: dose, duration, setting, provider training
Do not include fields "in case we need them." If you have no plan to use a field in synthesis, cut it.
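To see why the meta-analysis fields matter, consider what the synthesis step needs from them. The sketch below assumes a continuous outcome measured on the same scale in both arms and uses a normal approximation for the 95% confidence interval; the function name and argument names are illustrative only.

```python
import math


def mean_difference(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """Raw mean difference between two arms, with its standard error and 95% CI.

    Needs the per-arm mean, SD, and sample size. If any of these was not
    extracted, the effect estimate cannot be computed.
    """
    md = mean_tx - mean_ctrl
    se = math.sqrt(sd_tx ** 2 / n_tx + sd_ctrl ** 2 / n_ctrl)
    ci = (md - 1.96 * se, md + 1.96 * se)  # normal approximation
    return md, se, ci


md, se, ci = mean_difference(10.0, 2.0, 50, 8.0, 2.0, 50)
print(f"MD = {md:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Every argument of the function corresponds to a field on the form. A field that never feeds a calculation or a table like this one is a candidate for cutting.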
Form format: paper, Word, Excel, or software?
- Paper or Word: suitable only for piloting and single-study examples. Unusable at scale.
- Excel: fine for small reviews (< 40 studies), cheap, portable, no vendor lock-in.
- Covidence or EPPI-Reviewer: purpose-built, supports dual extraction with automatic conflict flagging, recommended for most systematic reviews.
- REDCap: excellent for large teams; requires institutional setup.
Choose based on review size and team size, not familiarity.
Pilot before you extract
Pilot the form on three to five included studies. Both reviewers extract the same studies independently. Then:
- Compare every field, every study
- Where you disagreed, diagnose: was the field ambiguous? The source data ambiguous? The operational definition loose?
- Revise the form: rename fields, add response options, tighten definitions
- Re-pilot if the first round produced major disagreement
A two-hour pilot saves two weeks of re-extraction.
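The comparison step of a pilot can be summarized as a per-field agreement rate, which points to the fields that need tighter definitions. This is a minimal sketch that assumes each reviewer's pilot extractions are stored as a dictionary keyed by study ID; the function names and the simple case-insensitive matching rule are assumptions, not a standard.

```python
def _normalize(value):
    """Treat values as equal if they differ only in case or surrounding spaces."""
    return str(value).strip().lower()


def field_agreement(reviewer_a, reviewer_b):
    """Return {field name: proportion of pilot studies where both reviewers agree}.

    reviewer_a and reviewer_b map study ID -> {field name: extracted value}.
    """
    counts = {}  # field name -> (agreements, comparisons)
    for study_id, record_a in reviewer_a.items():
        record_b = reviewer_b[study_id]
        for name, value_a in record_a.items():
            agree, total = counts.get(name, (0, 0))
            same = _normalize(value_a) == _normalize(record_b.get(name))
            counts[name] = (agree + int(same), total + 1)
    return {name: agree / total for name, (agree, total) in counts.items()}
```

Fields with low agreement are the ones to diagnose first. Exact matching will overstate disagreement on free-text fields, so those still need to be compared by eye.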
Dual extraction in practice
For systematic reviews, the Cochrane Handbook requires dual extraction: two independent reviewers extract each study, then compare and reconcile. In practice:
- Both reviewers extract independently into their own forms
- A third reviewer (or the original two together) compares each field
- Disagreements are resolved by discussion, consulting the full text, or consulting a third reviewer
- Final locked form is the reconciled version
Document every resolved disagreement. Reviewing that log partway through extraction can reveal systematic errors you did not notice, such as one field that both reviewers keep interpreting differently.
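If you are working in spreadsheets rather than purpose-built software, the compare-and-reconcile steps can be sketched as follows. The function names, the dictionary layout, and the structure of the log entries are all hypothetical; tools such as Covidence handle this internally in their own way.

```python
def compare_records(record_a, record_b):
    """Split two independent extractions of one study into agreed values and conflicts."""
    agreed, conflicts = {}, {}
    for name in sorted(set(record_a) | set(record_b)):
        value_a, value_b = record_a.get(name), record_b.get(name)
        if value_a == value_b:
            agreed[name] = value_a
        else:
            conflicts[name] = (value_a, value_b)
    return agreed, conflicts


def reconcile(agreed, conflicts, decisions, log):
    """Build the final record and append every resolved disagreement to the log.

    decisions maps field name -> (final value, reason for the decision).
    Raises ValueError if any conflict has no decision, so nothing is locked early.
    """
    final = dict(agreed)
    for name, (value_a, value_b) in conflicts.items():
        if name not in decisions:
            raise ValueError(f"unresolved conflict: {name}")
        value, reason = decisions[name]
        final[name] = value
        log.append({
            "field": name,
            "reviewer_a": value_a,
            "reviewer_b": value_b,
            "final": value,
            "reason": reason,
        })
    return final
```

Refusing to produce a final record while conflicts remain open keeps the locked form equal to the reconciled version, and the log doubles as the documentation of resolved disagreements.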
Extracting results: the hardest part
Results extraction is where most errors happen. Practical tips:
- Record what the paper reports, not what you wish it reported
- If the paper reports a median and IQR and your synthesis needs mean and SD, record the median and IQR as reported and flag the need to convert
- Extract outcomes by time point, not averaged across time points
- When both intention-to-treat and per-protocol analyses are reported, extract both and note which you will use
- Record any non-reported data as "not reported" — never leave blank
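For the median and IQR case, one widely used large-sample approximation estimates the mean as the average of the three quartiles and the SD as the IQR divided by 1.35. It assumes roughly normal data and is unreliable for skewed outcomes or small samples, so treat the sketch below as an illustration of keeping reported and derived values separate, not as the recommended conversion for your review. The function name and the output layout are assumptions.

```python
def convert_median_iqr(q1, median, q3):
    """Approximate mean and SD from quartiles, keeping the reported values intact.

    Assumes roughly normal data and a reasonably large sample:
        mean ~ (q1 + median + q3) / 3
        SD   ~ (q3 - q1) / 1.35
    """
    if not q1 <= median <= q3:
        raise ValueError("expected q1 <= median <= q3")
    return {
        "reported": {"q1": q1, "median": median, "q3": q3},  # what the paper says
        "derived": {
            "mean": (q1 + median + q3) / 3,
            "sd": (q3 - q1) / 1.35,
        },
        "converted": True,  # flag so derived values can be tested in sensitivity analysis
    }


row = convert_median_iqr(q1=10, median=12, q3=17)
print(row["derived"])
```

Storing the reported quartiles alongside the derived values means the conversion can be redone later with a different method without going back to the paper.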
Contacting authors
For missing data critical to synthesis, email the corresponding author. Keep the email short, specific, and polite:
Dear Dr. X, I am leading a systematic review of [topic]. Your 2022 paper reported [outcome] at 12 weeks. Could you share the standard deviation for the intervention arm? I will acknowledge the data in our review.
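Expect a 30–50% response rate. Log every attempt and its outcome, and report the process in your methods (PRISMA 2020 asks how data were obtained or confirmed from study investigators); the flow diagram itself tracks study selection, not author contact.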
Five design principles
- One question per field. "Population and setting" is two fields.
- Use closed response options where you can. "Design: [RCT / quasi-experiment / cohort / case-control / ...]" beats a free text box.
- Define every field operationally. Attach a one-line definition and an example.
- Order fields to match the paper's order. Usually: methods, sample, intervention, outcomes, results.
- Leave a comments field per section. Reviewers will have context that does not fit the structure.
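The first three principles can be made concrete by defining each field as a small specification: one name, one operational definition, one example, and a closed option list where possible. This is a hypothetical sketch; the FieldSpec class, the wording of the definition, and the "other" and "not reported" options are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """One form field: a single question with a definition and an example."""
    name: str
    definition: str
    example: str
    options: Optional[Tuple[str, ...]] = None  # None means free text

    def validate(self, value: str) -> bool:
        """Reject blanks, and reject values outside the closed option list."""
        if not value.strip():
            return False
        return self.options is None or value in self.options


# Illustrative field; the option list is an assumption, not a standard taxonomy.
STUDY_DESIGN = FieldSpec(
    name="study_design",
    definition="Design as judged by the reviewer from the methods section.",
    example="RCT",
    options=("RCT", "quasi-experiment", "cohort", "case-control",
             "other", "not reported"),
)

print(STUDY_DESIGN.validate("RCT"))               # accepted: in the option list
print(STUDY_DESIGN.validate("randomised trial"))  # rejected: not an allowed option
```

A closed list forces both reviewers onto the same vocabulary, which is what makes field-by-field comparison during dual extraction meaningful.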
A form that takes 15 minutes per study is achievable with piloting and practice. A form that takes 45 minutes per study usually has too many fields or definitions that are too loose, and it is probably also producing inconsistent data.