In short: Before you compare two cohorts, check what changed in how you measured them. Benchmark both on the same metrics, normalise for cohort size and program length, name every confound that could explain the gap, and grade the comparison green, amber, or red — so a raw delta is never mistaken for a real improvement.
1 · Set up over your data
Start where both cohorts live in one clean dataset. This walkthrough uses DEMO-03 · Workforce Cohort — Vista Workforce Collaborative, with a 2023 and a 2024 cohort and persistent IDs. Load your Decision Brief first so the comparison is framed by the decision it informs.
You are the Sopact Sense Assistant working over the DEMO-03 · Workforce Cohort dataset (clean data + persistent contact IDs). Load my Decision Brief (decision, audience, outcomes, indicators, evidence standard) first, then wait for my task.
2 · Write the comparison prompt
The prompt benchmarks the cohorts and forces the confounds into the open.
Benchmark cohorts [COHORTS] on [METRICS], normalising for size and program length; report deltas + confounds. Grade green/amber/red.
Five elements keep it honest: the dataset holding both cohorts, the normalise step for size and program length, the confounds that could explain a gap, the rule that a delta is not an improvement until confounds are ruled out, and a green/amber/red grade for how trustworthy the comparison is.
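The normalise step can be sketched as plain arithmetic. The example below is a minimal illustration, not how Sopact Sense computes anything: the `Cohort` fields, the `placements` outcome, and all numbers are made up for the sketch and are not taken from DEMO-03. Dividing by program weeks also assumes outcomes scale roughly linearly with program length, which is itself an assumption worth stating in your report.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    name: str
    participants: int   # cohort size
    program_weeks: int  # program length
    placements: int     # hypothetical outcome count


def outcome_rate(c: Cohort) -> float:
    """Outcome per participant: removes the effect of cohort size."""
    return c.placements / c.participants


def rate_per_week(c: Cohort) -> float:
    """Outcome per participant per program week: also adjusts for length."""
    return outcome_rate(c) / c.program_weeks


def compare(base: Cohort, new: Cohort) -> dict:
    """Report the raw delta next to the two normalised deltas."""
    return {
        "raw_delta": new.placements - base.placements,
        "rate_delta": outcome_rate(new) - outcome_rate(base),
        "per_week_delta": rate_per_week(new) - rate_per_week(base),
    }


# Made-up numbers for illustration only.
c2023 = Cohort("2023", participants=100, program_weeks=20, placements=50)
c2024 = Cohort("2024", participants=150, program_weeks=10, placements=60)

result = compare(c2023, c2024)
print(result)
```

With these invented figures the raw delta is positive, the per-participant delta is negative, and the per-week delta is positive again. The same pair of cohorts supports three different headlines depending on what was adjusted for, which is why the prompt asks for the normalisation before the delta.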
3 · What Sense produces
The output below comes from running the prompt on the DEMO-03 · Workforce Cohort dataset already loaded in Sopact Sense.
GRADE | Element | Finding
green | 2023 baseline | solid, well-measured
amber | 2024 window | shorter program length
red | Raw delta | confounded by cohort and window
The grade tells you how much the gap is worth. Green means a clean reference — the 2023 baseline is well measured and comparable. Amber means a normalisation issue — the 2024 cohort ran a shorter window, so its outcomes need adjusting before comparison. Red means the headline number is confounded — the raw delta mixes a real effect with cohort and window differences, so reporting it as improvement would mislead.
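One way to read the three grades is as a simple rule of thumb. The sketch below is an assumption built from the descriptions above, not the logic Sopact Sense uses internally; the `Element` class, its fields, and the `grade` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    needs_normalising: bool = False  # e.g. a shorter program window
    open_confounds: list[str] = field(default_factory=list)  # not yet ruled out


def grade(e: Element) -> str:
    """Red if confounds remain open, amber if only normalisation is pending."""
    if e.open_confounds:
        return "red"
    if e.needs_normalising:
        return "amber"
    return "green"


# The three elements from the walkthrough, encoded by hand.
elements = [
    Element("2023 baseline"),
    Element("2024 window", needs_normalising=True),
    Element(
        "Raw delta",
        needs_normalising=True,
        open_confounds=["cohort differences", "program window"],
    ),
]

grades = {e.name: grade(e) for e in elements}
print(grades)
```

Checking open confounds first means an element cannot be graded amber while a rival explanation is still unaddressed, which matches the idea that red is reserved for the headline number.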
4 · Turn a weak link green
Fix the confounded comparison with the smallest realistic change.
Take the lowest-graded element above and fix it using only what the program could realistically measure. Show the before → after grade and the single indicator/edit that moves it to green.
5 · Make the report and share
Produce a branded "missing & incomplete" report and a shareable link.
Create a 'missing & incomplete' report from this analysis in Sopact branding [or paste your website URL / brand guideline to apply your own]. List every element graded amber or red, what is missing, and the one input that fixes each. Lead with the decision this report informs.
Create a shareable link for this report and open it in a new tab.
Tricks, tips, and troubleshooting
Check measurement before you compare. If the survey, timing, or population changed between cohorts, the gap may be an artefact of how you measured, not what happened.
Normalise size and length first. A bigger or longer-running cohort will look different for reasons that have nothing to do with program quality. Adjust for both before reading the delta.
Name the confounds out loud. A confound you've written down is one a reader can weigh. A confound you've ignored turns a delta into a false claim without anyone noticing.
Report the delta with caveats, not as a verdict. Until confounds are ruled out, a delta is a question, not an improvement.
List every confound between these two cohorts and, for each, whether it inflates or deflates the raw delta.
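If you want to keep the answer to that prompt in a form you can check later, a small ledger is enough. The sketch below is a hypothetical structure, not a Sopact Sense export format: the confound names echo the examples used in this guide, and the directions assigned to them are placeholders you would replace with what your own data shows.

```python
from collections import Counter

ALLOWED_EFFECTS = {"inflates", "deflates", "unknown"}

# Placeholder entries; the directions here are invented for illustration.
confounds = [
    {"name": "shorter program window", "effect_on_delta": "deflates"},
    {"name": "different intake mix", "effect_on_delta": "unknown"},
    {"name": "changed survey question", "effect_on_delta": "inflates"},
]


def summarise(entries: list[dict]) -> Counter:
    """Count confounds by the direction they push the raw delta."""
    for entry in entries:
        if entry["effect_on_delta"] not in ALLOWED_EFFECTS:
            raise ValueError(f"unexpected effect for {entry['name']!r}")
    return Counter(entry["effect_on_delta"] for entry in entries)


def direction_is_settled(entries: list[dict]) -> bool:
    """The delta's bias can only be signed once no confound is 'unknown'."""
    return summarise(entries)["unknown"] == 0


summary = summarise(confounds)
print(dict(summary), "settled:", direction_is_settled(confounds))
```

Keeping an explicit "unknown" value is a deliberate choice: it stops an unexamined confound from being silently counted as harmless.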
Frequently asked questions
How do I compare outcomes between two cohorts?
Benchmark both cohorts on the same metrics, normalise for differences in size and program length, then report the delta alongside every confound that could explain it. The goal is to separate a genuine program effect from differences in who was in each cohort and how long they were measured — and to grade how confident you can be in the comparison.
What is a confound, and why does it matter here?
A confound is any factor that differs between the cohorts and could move the outcome on its own — a shorter program window, a different intake mix, a changed survey question. It matters because it offers a rival explanation for the gap, so an unacknowledged confound can turn a measurement artefact into a claimed improvement.
Why isn't a raw delta the same as an improvement?
A raw delta only tells you the two cohorts scored differently, not why. Until you've normalised for size and length and ruled out confounds, that difference could come from the program, the people, or the measurement — so reporting it as improvement overstates what the data supports.