
Impact Measurement Software: A New Architecture for the AI Age

What impact measurement is, how it works, and the frameworks behind it — from outputs to outcomes to evidence funders can verify. Practical guide, since 2014.

Updated June 16, 2026
Use Case · Impact Measurement

Impact measurement as a workflow — not a separate report.

Funders ask for outcomes, grantees and investees carry the cost, and almost no one funds the capacity to build it — so measurement collapses into a report written once a year to satisfy the ask. The fix is to stop treating impact measurement as a separate activity and let it ride the workflow you already run — application intake, training delivery, grant reporting, investee monitoring — with context captured as the work happens, and the data dictionary built along the way.

2014 · The day job since, before the category had a name

30,000+ · Practitioners on the underlying framework

5 / 95 · Survey signal vs. the context that lives elsewhere

1 record · Per person, every figure cites its source

Definition

Start with the question, not the dashboard.

What is impact measurement?

Impact measurement is the practice of determining whether a program moved the people it serves on the outcomes it promised. It joins numbers — survey scores, attendance, cost — with stories — case notes, transcripts, reflections — on one participant ID, so every claim traces to a source record instead of a slide.

What is impact measurement and management (IMM)?

Impact measurement and management (IMM) is impact measurement plus the decisions that follow from it. Measurement asks what changed; management uses that evidence to redesign the program, move funding, and report to boards and funders. Both run on the same connected record — you cannot manage what you have not measured to a source.

01 Intake → 02 Baseline → 03 In-program → 04 Outcome → 05 Evidence → 06 Funder & board

The distinction that decides everything

An output is what you did. An outcome is what changed.

A decade of tooling reported outputs as if they were outcomes. The board hears "1,500 served" and still cannot ask whether anyone improved.

Output

What the program delivered. Counts of activity — easy to log, easy to chart, and silent on whether the activity worked.

Sessions delivered · Students served · Attendance logged · Documents filed

Outcome

What changed for the participant. Movement on the thing the program promised — measured before and after, on the same person, with the story that explains it.

Confidence improved · Employment found · Well-being moved · Housing sustained

The 5/95 gap — where the evidence actually lives

5% · Survey signal, the part most dashboards report
95% · The context that explains it (case notes, transcripts, financials, stories), usually disconnected

Measurement as workflow, not a separate activity

Nobody funds a separate report. Everybody already runs a workflow.

Each workflow has its own stages — they are not interchangeable. The point is that the measurement falls out of the stages a team already runs, and every source binds to one record per stakeholder. Four workflows, four different stage sequences, one architecture underneath.

Accelerator & application

The base pattern

Application → Selection → Pre → Mid → Post

An AI follow-up loop captures context at pre, mid, and post — the same applicant ID carried from application through the program, no second system to stand up.

Application form · Reviewer scores · Follow-up surveys

Training & implementation

Different stages · adds sources

Enrollment → Delivery → Mentor check-ins → Pre/post assessment

Mentor feedback and LMS activity join the record — attendance and assessment become movement on the outcome, not just completion logs.

Mentor feedback · LMS data · Pre/post assessment

Grant management

Different stages · adds sources

Onboarding → Disbursement → Semi-annual report → Renewal

Grantee metrics and the narrative reports bind to one grantee record — the semi-annual report becomes a view of the record, not a rebuild from scratch each cycle.

Grantee metrics · Semi-annual reports · Site-visit notes

Impact investing

Different stages · builds the dictionary

Diligence → Investment → Quarterly monitoring → Portfolio roll-up

The data dictionary forms across the portfolio — every metric defined once, comparable across investees instead of re-keyed per fund report.

Investee metrics · Quarterly updates · Portfolio dictionary

Different stages, one architecture. The measurement is captured inside the work each team already does, bound to one record per stakeholder, with the data dictionary built along the way. There is no separate measurement project to fund, and nothing to reassemble at report time. The push and pull over who pays for outcomes ends when the measurement is the workflow.
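To make "every metric defined once" concrete, here is a minimal in-memory sketch in Python, assuming a single process and numeric metrics. `MetricDef`, `DataDictionary`, `StakeholderRecord`, and `portfolio_total` are hypothetical names for illustration, not Sopact Sense's API. The dictionary rejects a conflicting redefinition, and every value lands on one record per stakeholder, keyed by workflow stage.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricDef:
    """One data-dictionary entry, defined once and reused by every workflow."""
    key: str
    label: str
    unit: str
    scale: tuple[float, float] | None = None  # inclusive (min, max) if bounded


@dataclass
class StakeholderRecord:
    """One record per stakeholder; values are keyed by (stage, metric key)."""
    stakeholder_id: str
    values: dict[tuple[str, str], float] = field(default_factory=dict)


class DataDictionary:
    def __init__(self) -> None:
        self._defs: dict[str, MetricDef] = {}

    def define(self, metric: MetricDef) -> None:
        # An identical redefinition is a no-op; a conflicting one is rejected
        # so the metric means the same thing in every report.
        existing = self._defs.get(metric.key)
        if existing is not None and existing != metric:
            raise ValueError(f"conflicting definition for {metric.key!r}")
        self._defs[metric.key] = metric

    def record(self, rec: StakeholderRecord, stage: str, key: str, value: float) -> None:
        # Only dictionary metrics can be written; bounded metrics are range-checked.
        if key not in self._defs:
            raise KeyError(f"{key!r} is not in the data dictionary")
        scale = self._defs[key].scale
        if scale is not None and not scale[0] <= value <= scale[1]:
            raise ValueError(f"{key}={value} is outside {scale}")
        rec.values[(stage, key)] = value


def portfolio_total(records: list[StakeholderRecord], stage: str, key: str) -> tuple[float, int]:
    """Sum one metric across records for a stage, and count how many reported it."""
    reported = [r.values[(stage, key)] for r in records if (stage, key) in r.values]
    return sum(reported), len(reported)
```

Two investees reporting `jobs_created` in the same quarter roll up with `portfolio_total([a, b], "Quarterly monitoring", "jobs_created")`, and no single fund report can quietly introduce its own definition of the metric.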

The reframe

Impact measurement, in its traditional form, is dead.

It was always the funder's ask and the grantee's burden. The survey captured about 5% of the context; the caseworker's notes, the audio reflection, the financial document, and the parent's voice held the other 95%, and almost no platform brought them together under one record per person. Two models dominated the last decade. Both became liabilities.

Asset 1 · became a liability

Capacity-building consulting

The dominant model since 2014: a vendor with consultants would help the nonprofit "build measurement capacity," sold as a long-term partnership for outcomes maturity.

The consultants left. The capacity rarely transferred.
The report became the deliverable, not the evidence behind it.
By the next funder cycle, the work was rebuilt from scratch.

Asset 2 · became a liability

Activity tracking

Case management platforms — Apricot, ETO, SureImpact, Salesforce Nonprofit Cloud — sold "track every interaction." Click attended, click didn't, document the encounter.

The system counted attendance and documentation, not movement on outcomes.
Case notes that held the evidence sat in narrative fields, unsearchable.
"Did they improve?" got a slide and a story — never one connected record.

The deeper problem

Prompt your way to a cohort summary? Sure. Run it twice and get the same answer? Different problem.


Ask a foundation model the same question twice on the same data and you can get two different answers. That is how the models work, not a bug to fix. As the dataset grows, hallucination tends to climb: on Vectara's 2026 enterprise-document benchmark, every major reasoning model (GPT-5, Claude Sonnet 4.6, Grok-4, Gemini-3 Pro) fabricated information in more than 10% of summaries. The fix is not a better prompt. It is structural, and you cannot bolt it onto Apricot, ETO, SureImpact, Blackbaud Outcomes, or Salesforce NPSP.

Pillar 01

Longitudinal — one ID for years

The student enrolled at age 7 is the same record at age 17. The ID does not break across program redesigns, staff turnover, schema changes, or vendor migrations.

Pillar 02

Numbers and stories on one record

Before-and-after scores sit on the same record as case notes, audio reflections, exit transcripts, and program cost. The reasoning layer reads the qualitative material as evidence, not decoration.

Pillar 03

Every figure cites its source

Every figure in a funder report and every theme in a board view cites the specific case note, transcript, response, or ledger entry it came from. Verify in one click, not one quarter.
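One structural way to get the same answer on every run, sketched here under stated assumptions, is to compute each figure deterministically from the source records and carry their citation IDs with it, so a reviewer can recompute the figure from exactly those records. The record shapes, citation IDs, and function names below are hypothetical; this illustrates the pillar and is not Sopact's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class SourceRecord:
    cite: str            # citation ID of the source record (hypothetical format)
    participant_id: str
    value: float


@dataclass(frozen=True)
class CitedFigure:
    label: str
    value: float
    cites: tuple[str, ...]  # every source record the figure was computed from


def cited_mean(label: str, records: list[SourceRecord]) -> CitedFigure:
    """Average the records and keep the citation trail with the result."""
    if not records:
        raise ValueError("a figure needs at least one source record")
    # Sort by citation ID so the same inputs always produce the same figure,
    # regardless of the order the records were fetched in.
    ordered = sorted(records, key=lambda r: r.cite)
    value = round(fmean(r.value for r in ordered), 2)
    return CitedFigure(label, value, tuple(r.cite for r in ordered))


def verify(figure: CitedFigure, store: dict[str, SourceRecord]) -> bool:
    """Recompute a figure from only the records it cites."""
    try:
        recs = [store[c] for c in figure.cites]
    except KeyError:
        return False  # a citation that points nowhere fails verification
    return cited_mean(figure.label, recs) == figure
```

Sorting before averaging matters because floating-point addition is not associative; without it, two runs that fetch the same records in a different order could differ in the last digit.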

This has been Sopact's day job since 2014 — before the generative AI category had a name.

Best practices

The shape of an impact measurement cycle.

The generic stages every program shares — distinct from the four concrete workflows above. One stable participant ID survives every join, from intake through the funder report.

01 · Intake

Stable participant ID

Application or enrollment form. The CRM contact lands with one ID that survives every later join.

02 · Baseline

Pre-program signal

Before-program scores on the questions your program already uses, intake transcripts, caseworker observations — all bound to the participant.

03 · In-program

Ongoing evidence

Case notes, attendance, mid-program survey, audio reflections, mentor feedback. The 95% the dashboard usually misses.

04 · Outcome

Post-program signal

Post survey, exit interview, employment or wage data, 6- and 12-month pulse, all joined on the same participant ID.

05 · Evidence

Roll-up with citations

Cohort movement, qualitative themes from case notes, cost-per-outcome from accounting. Every figure cites a source record.

06 · Funder & board

Narrative + funder-ready report

Funder report, board view, custom roll-ups — produced from one connected record, not rebuilt from spreadsheets each cycle.
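The intake ID is what lets every later stage join. A small Python sketch, assuming each source emits `(participant_id, stage)` events using the hypothetical stage keys below, counts how many enrolled participants have evidence at each stage and surfaces IDs that never appeared at intake, which usually point to a broken join rather than a new participant.

```python
from __future__ import annotations

# Generic cycle stages that carry participant-level evidence (hypothetical keys).
STAGES = ("intake", "baseline", "in_program", "outcome")


def stage_funnel(events: list[tuple[str, str]]) -> tuple[dict[str, int], set[str]]:
    """Count participants with evidence at each stage, joined on the intake ID.

    events: (participant_id, stage) pairs from any source system.
    Returns per-stage counts (enrolled participants only) and the IDs
    that appear at later stages but never at intake.
    """
    seen: dict[str, set[str]] = {stage: set() for stage in STAGES}
    for pid, stage in events:
        if stage not in seen:
            raise ValueError(f"unknown stage {stage!r}")
        seen[stage].add(pid)
    enrolled = seen["intake"]
    counts = {stage: len(seen[stage] & enrolled) for stage in STAGES}
    orphans = set().union(*(seen[s] for s in STAGES[1:])) - enrolled
    return counts, orphans
```

Dividing `counts["outcome"]` by `counts["intake"]` gives the share of enrolled participants who reached the post stage.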

Frame · how Sopact fits the existing stack

Sopact connects. It does not replace your CRM, case management, or accounting.

Most teams already run a CRM, a case management system, an intake tool, an accounting platform, and a reporting layer. Sopact Sense sits in the middle and holds one connected record per person — contacts flow in, evidence flows out.

Data in · identity & intake

Contacts & forms flow in

  • HubSpot · Salesforce
  • Apricot · ETO · NPSP
  • Google Forms · JotForm · Typeform
  • SurveyMonkey · Qualtrics · Kobo

One record per person

Sopact Sense

  • Stable participant ID across years
  • Numbers + stories on one record
  • Frameworks your funders ask for
  • Citation trail behind every figure

Data out · reports & ledgers

Evidence flows out

  • QuickBooks · Xero · Sage Intacct
  • Looker Studio · Tableau · Power BI
  • Funder PDF · custom outcome report
  • Board view · manager dashboard

Frame · the Tuesday question, not the year-end dashboard

The real job is the Tuesday question — answered two ways.

The legacy stack was tuned for the year-end report. The impact manager's real job is the question a program officer asks on a Tuesday. Five of them, shown both ways.

"Did our cohort improve on the outcomes we promised, or did we only track who showed up?"

Sopact · one query

47 enrolled, 39 reached post (83%), confidence +2.3 pts (n=37), 12 themes correlated with movement, 6 outliers worth a call — citations on every figure.

Legacy · four weeks

Pull attendance from Apricot, surveys from SurveyMonkey, export to Excel, hire an M&E consultant in Q4. Repeat next cycle.

"What do the case notes tell us that the survey misses?"

Sopact · read the 95%

11 case-note records contain "classroom anxiety" language; 7 show movement, 4 do not — and name a blocker the survey never measured.

Legacy · read the 5%

The survey gave a Likert score. Case notes sit in a free-text field, unsearchable. The board hears one anecdote; the pattern is invisible.

"Of the 47 who started, how many still move at 6 months — and what do dropouts share?"

Sopact · longitudinal on one ID

31 of 47 reachable, 24 still moving, 7 plateaued; the 16 lost share two case-note themes the program design missed.

Legacy · rebuild the cohort

Survey IDs do not match the case-management IDs. Re-key the cohort, email participants, 12 reply. "Longitudinal tracking proved difficult."
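The legacy failure above is an identity problem. A minimal sketch of an ID crosswalk, assuming each source system keeps its own local IDs, maps every `(system, local_id)` pair to one stable participant ID and then sorts 6-month responses into reached, lost, and unmatched. `IdCrosswalk` and `follow_up` are hypothetical names for illustration.

```python
from __future__ import annotations


class IdCrosswalk:
    """Maps (source system, local ID) pairs to one stable participant ID."""

    def __init__(self) -> None:
        self._links: dict[tuple[str, str], str] = {}

    def link(self, system: str, local_id: str, participant_id: str) -> None:
        key = (system, local_id)
        existing = self._links.get(key)
        # A local ID may point to only one participant; re-linking it elsewhere is an error.
        if existing is not None and existing != participant_id:
            raise ValueError(f"{key} already maps to {existing}")
        self._links[key] = participant_id

    def resolve(self, system: str, local_id: str) -> str | None:
        return self._links.get((system, local_id))


def follow_up(cohort: set[str], responses: list[tuple[str, str]], xwalk: IdCrosswalk):
    """Resolve follow-up responses to cohort IDs: (reached, lost, unmatched)."""
    reached: set[str] = set()
    unmatched: list[tuple[str, str]] = []
    for system, local_id in responses:
        pid = xwalk.resolve(system, local_id)
        if pid in cohort:
            reached.add(pid)
        else:
            unmatched.append((system, local_id))
    return reached, cohort - reached, unmatched
```

The `unmatched` list is the useful failure mode: it names the responses that need a person to link them, instead of silently dropping them from the cohort.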

"Can we cut this funder report from six weeks to six hours, without rebuilding the data?"

Sopact · one connected record

The framework is referenced, not pasted; cost-per-outcome pulls from QuickBooks; citations sit under every number. The writer interprets, not assembles.

Legacy · reassemble each cycle

Survey export, case-management export, manual reconciliation by name, separate accounting pull, copy-paste into Word. Six weeks.

"What evidence-based story do we tell the board besides 'we served X people'?"

Sopact · outputs become evidence

How many entered, how many moved, on which outcomes, at what cost-per-outcome — with citations to the notes that explain movement and dropouts.

Legacy · a slide on a number

"We served 1,500 students," a photo, one pulled quote, an attendance bar chart. No way to ask the question behind the number.

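The page does not define cost-per-outcome. One common reading, assumed here, divides total program cost by the number of participants who moved on the outcome and sets it beside cost per participant served, which is an output figure. A sketch with hypothetical inputs; `unit_costs` is an illustrative name.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal

CENTS = Decimal("0.01")


def unit_costs(total_cost: Decimal, entered: int, moved: int) -> dict[str, Decimal]:
    """Cost per participant served (output) vs. cost per participant who moved (outcome).

    Assumes "moved" counts participants with measured movement on the outcome;
    that definition is an assumption of this sketch.
    """
    if entered <= 0 or moved <= 0:
        raise ValueError("entered and moved must be positive")
    if moved > entered:
        raise ValueError("moved cannot exceed entered")
    return {
        "cost_per_participant": (total_cost / entered).quantize(CENTS, rounding=ROUND_HALF_UP),
        "cost_per_outcome": (total_cost / moved).quantize(CENTS, rounding=ROUND_HALF_UP),
    }
```

With a total cost of 36,000, 47 participants entered and 24 moved, this returns 765.96 per participant and 1500.00 per outcome. Keeping money in `Decimal` lets the figure tie back to the ledger to the cent.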
The anatomy of a roll-up

From one answer to one cohort to the whole portfolio.

Every figure on a funder report descends from a single response written by a single participant. Four layers, so the question — and the citation trail — never gets lost.

Layer 01 · Cell

The single response

One participant, one question or case note or ledger entry. The atomic unit, coded and cited as the source of any later claim.

P-0418 · well-being Q1 · score 3 · "I feel down most days this week."

Layer 02 · Row

The participant view

One participant across all sources and stages — application, baseline, case notes, pulse, reflection, post survey, outcome.

P-0418 · year-2 to year-13 · 47 records across 6 sources.

Layer 03 · Column

The cohort outcome

One outcome across all participants in a cohort — pre→post movement, theme distribution, retention curve.

2024 cohort · n=47 · confidence +2.3 pts · 12 themes · 83% retention.

Layer 04 · Grid

The portfolio view

All cohorts, all programs, all outcomes — for the board, funder, regulator. Every figure descends from a Cell and cites its source.

6 programs · 14 cohorts · 1,500 participants · $48 cost-per-outcome.
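The four layers map naturally onto filters and group-bys over one table of cells. A minimal Python sketch, assuming each cell carries its program, cohort, outcome, stage (`"pre"` or `"post"`), and a citation ID (all hypothetical field names): the column computes mean pre-to-post change over participants who have both ends, and the grid keeps every cited cell attached to the figure it feeds.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Cell:
    """Layer 01: one response, tagged with where it sits and where it came from."""
    participant_id: str
    program: str
    cohort: str
    outcome: str
    stage: str  # "pre" or "post"
    value: float
    cite: str


def row(cells: list[Cell], participant_id: str) -> list[Cell]:
    """Layer 02: everything on one participant."""
    return [c for c in cells if c.participant_id == participant_id]


def column(cells: list[Cell], cohort: str, outcome: str) -> dict:
    """Layer 03: pre-to-post movement on one outcome for one cohort, with citations."""
    pre = {c.participant_id: c for c in cells
           if (c.cohort, c.outcome, c.stage) == (cohort, outcome, "pre")}
    post = {c.participant_id: c for c in cells
            if (c.cohort, c.outcome, c.stage) == (cohort, outcome, "post")}
    matched = sorted(pre.keys() & post.keys())  # only participants with both ends
    deltas = [post[p].value - pre[p].value for p in matched]
    return {
        "cohort": cohort,
        "outcome": outcome,
        "n": len(matched),
        "mean_change": round(fmean(deltas), 2) if deltas else None,
        "cites": [cite for p in matched for cite in (pre[p].cite, post[p].cite)],
    }


def grid(cells: list[Cell], outcome: str) -> list[dict]:
    """Layer 04: one column per (program, cohort); every figure keeps its cites."""
    keys = sorted({(c.program, c.cohort) for c in cells})
    return [
        {"program": prog, **column([c for c in cells if c.program == prog], coh, outcome)}
        for prog, coh in keys
    ]
```

Because `grid` only regroups columns, a portfolio figure can always be traced back down to the individual cells it came from.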

From raw artifact to evidence

What Sense does to four kinds of input.

Survey responses, case-note PDFs, audio reflections, and quarterly pulse data arrive in different shapes from different sources. Each one becomes queryable and citable on the participant ID.

Intake survey response

Raw
Q2 confidence (1–5): 2
Q1 free text: "i dont know it depends sometimes its ok sometimes not"
Shaped on record
participant_id: P-0418
confidence_baseline: 2
signal: ambivalence
flag_caseworker: true
cite: srv_8821

Case-note PDF

Raw
"…withdrawn, mentioned not sleeping. Read three pages out loud — improved from last term."
Shaped on record
note_id: cn_4471
themes: sleep_disturbance·new, literacy_progress·+
drift: flag for follow-up
cite: cn_4471 ¶2–3

Audio reflection

Raw
"I still got nervous but I had the breathing thing… I put my hand up twice this term."
Shaped on record
reflection_id: ar_0214
test_anxiety: reduced
coping_skill: breathing
baseline 2/5 → 3.5/5
cite: ar_0214 02:14–04:08

Quarterly cohort pulse

Raw
n=47 · 39 surveys, 41 case notes, 22 reflections, attendance, QuickBooks export. "Who is at risk?"
Shaped on record
cohort: ys-2024
confidence Δ +0.8 · retention 87%
6 at-risk flags by reason
every_figure_cites_source: true
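The "Shaped on record" side can be sketched as plain structured fields plus a citation that points back into the raw artifact: a response ID, a paragraph range, or an audio time range. The threshold rule for `flag_caseworker` and all names below are hypothetical; free-text interpretation (for example `signal: ambivalence`) is left to the reasoning layer and is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_id: str
    locator: str = ""  # optional pointer inside the source: paragraph or time range

    def __str__(self) -> str:
        return f"{self.source_id} {self.locator}".strip()


def mmss(seconds: int) -> str:
    """Format a second offset as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def audio_citation(source_id: str, start_s: int, end_s: int) -> Citation:
    """Cite a time range inside an audio reflection."""
    if not 0 <= start_s < end_s:
        raise ValueError("invalid time range")
    return Citation(source_id, f"{mmss(start_s)}–{mmss(end_s)}")


FLAG_THRESHOLD = 2  # hypothetical rule: baseline confidence at or below this flags a caseworker


def shape_survey(response_id: str, participant_id: str,
                 answers: dict[str, str | int]) -> dict[str, object]:
    """Turn one raw intake response into structured, cited fields."""
    score = int(answers["Q2"])
    if not 1 <= score <= 5:
        raise ValueError("Q2 must be on the 1-5 scale")
    return {
        "participant_id": participant_id,
        "confidence_baseline": score,
        "free_text": str(answers.get("Q1", "")),  # interpreted later by the reasoning layer
        "flag_caseworker": score <= FLAG_THRESHOLD,
        "cite": str(Citation(response_id)),
    }
```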

Frame · the architecture underneath

Three layers redo the work the legacy stack skipped.

The legacy stack turned impact measurement into storage: a place where surveys went to die and case notes never reached the dashboard. The reasoning layer runs on top of a structured record, not in place of it.

Layer 01 · Reasoning

Claude — the reasoning layer

Reads the case-note PDF, the audio transcript, the survey response, and the ledger entry as evidence. Extracts themes, flags drift, writes the citation trail. It reasons over the data on every query — it does not store it.

Layer 02 · Primary record

Sopact Sense — one record per person

One stable participant ID across application, case note, survey, reflection, story, and ledger entry. Pre and post questions sit on the same row as the case notes and the cost data.

Layer 02A / 02B · Operational

Finance & frameworks

QuickBooks, Xero, Sage Intacct supply cost-per-outcome at the moment of use. Your theory of change, logic model, IRIS+, or the funder's question set each bind to the same connected record — no re-keying.
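Binding several frameworks to the same record can be as simple as a mapping from each framework's labels to shared metric keys, so each figure is computed once and only relabeled per report. The labels and function name below are hypothetical placeholders, not real IRIS+ codes or any funder's actual questions.

```python
from __future__ import annotations

# Hypothetical labels: each framework names the same underlying metric keys,
# so a figure is computed once and only its label changes per report.
FRAMEWORK_MAP: dict[str, dict[str, str]] = {
    "logic_model": {"Outcome 1: participant confidence": "confidence"},
    "funder_questions": {
        "Q3. Change in participant confidence": "confidence",
        "Q4. Cost per outcome": "cost_per_outcome",
    },
}


def framework_view(framework: str,
                   figures: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Label shared figures for one framework; list any labels with no figure yet."""
    view: dict[str, float] = {}
    missing: list[str] = []
    for label, key in FRAMEWORK_MAP[framework].items():
        if key in figures:
            view[label] = figures[key]
        else:
            missing.append(label)
    return view, missing
```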

Program officer query, Tuesday 14:32 — "Show 6-month outcomes for the 2024 youth-services cohort, broken out by gender, with citations to source records."

Step 01 · Identity

Resolves the cohort to 47 participant IDs — the same IDs used in the case-management, survey, and accounting systems.

Step 02 · Join

6-month surveys (41), case notes (41), reflections (22), attendance, cost joined on the ID. Cut: 23 F, 22 M, 2 non-binary.

Step 03 · Framework

Confidence, literacy, well-being map to your outcome questions — whatever the funder report asks for, no manual rekey.

Step 04 · Cited

Confidence +0.8, literacy +0.4, retention 87%, six at-risk flags — every figure one click from its source record.
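Steps 01 and 02 can be sketched as a join of `(participant_id, cite)` pairs from each source onto the resolved cohort IDs, plus a breakdown by one attribute. The source names and function names are hypothetical; records that do not resolve to a cohort ID are left out rather than guessed at.

```python
from __future__ import annotations

from collections import Counter


def join_on_id(cohort_ids: list[str], sources: dict[str, list[tuple[str, str]]]):
    """Join (participant_id, cite) pairs from each source onto the cohort IDs.

    Returns one row per participant (source name -> list of cites) and,
    per source, how many cohort participants it covers.
    """
    cohort = set(cohort_ids)
    rows = {pid: {name: [] for name in sources} for pid in sorted(cohort)}
    for name, items in sources.items():
        for pid, cite in items:
            if pid in cohort:  # records outside the cohort are ignored, not guessed at
                rows[pid][name].append(cite)
    coverage = {name: sum(1 for r in rows.values() if r[name]) for name in sources}
    return rows, coverage


def breakdown(cohort_ids: list[str], attribute: dict[str, str]) -> Counter:
    """Count the cohort by one attribute (e.g. gender), keeping unrecorded values visible."""
    return Counter(attribute.get(pid, "not recorded") for pid in cohort_ids)
```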

Who this is for

The fit shows up where spreadsheets stop and a Salesforce architect is out of budget.

Mid-tier nonprofits, foundations, and impact investors who have outgrown a survey tool plus a shared drive but cannot afford a six-month data architecture engagement before producing a report.

School-based & youth services
Multi-region nonprofit with caseworkers in schools, longitudinal tracking from primary through secondary, mixed surveys + case notes + audio, funders asking about outcomes, not attendance.
500–5,000 participants · 10–80 caseworkers · 3–20 sites
Fit: ★ Excellent

Workforce & training
Cohort-based program with pre-mid-post structure, employment or wage outcomes at 6 and 12 months, structured surveys alongside narrative interviews, custom or funder-specific reporting.
50–2,000 per cohort · 4–20 cohorts/year
Fit: ★ Excellent

Foundation grantee outcomes
Foundation tracking outcomes across a portfolio of grantee partners, mixing self-reported survey responses with site-visit notes and program documents, producing portfolio roll-ups and board reports.
5–500 grantees · $5M–$500M grantmaking
Fit: ★ Excellent

Impact investor & CSR
Portfolio of investees or community programs reporting against a custom theory of change or portfolio-wide framework. The same connected record per investee feeds quarterly monitoring and annual reporting.
10–200 investees · quarterly + annual
Fit: Strong

M&E consultant placing the tool
Consultancy serving multiple mid-tier nonprofits, looking for a tool the client can run after implementation, without a six-month engagement before the first report. White-label at the practice level.
3–30 client orgs · multi-region practice
Fit: Strong

Below 50 participants
For organizations under 50 participants total, a survey tool plus a shared drive is often still the right answer. One connected record earns its keep where the qual + quant join is unmanageable by hand.
< 50 participants
Fit: Not yet

The framework underneath

Actionable Impact Measurement — the full framework, rewritten for the AI-native era.

Six layers, four data dictionary types, the Cell → Row → Column → Grid roll-up, and how to design a longitudinal program around evidence, not effort.

30,000+ · Practitioners using the framework

Melbourne · Co-developed with Melbourne Business School


Frequently asked

Impact measurement, plainly.

01 What is impact measurement?

Impact measurement is the practice of determining whether a program moved the people it serves on the outcomes it promised. It joins numbers — survey scores, attendance, cost — with stories — case notes, transcripts, reflections — on one participant ID, so a question like "did this cohort improve, or did we only track who showed up?" can be answered with citations to source records rather than a slide.

02 What is the difference between impact measurement and impact management?

Measurement asks what changed; management uses that evidence to act. Impact measurement captures whether outcomes moved. Impact measurement and management (IMM) adds the decisions that follow — redesigning the program, reallocating funding, reporting to boards and funders. Both run on the same connected record, because you cannot manage what you have not measured to a source.

03 What does IMM stand for?

IMM stands for Impact Measurement and Management. It is the discipline of measuring the social or environmental change a program or investment creates, then using that evidence to manage toward more of it. The term is most common in impact investing and philanthropy, where the same participant or investee record carries both the measurement and the decisions made from it.

04 What is the difference between an output and an outcome?

An output is what the program did; an outcome is what changed for the participant. Sessions delivered, students served, and attendance logged are outputs. Confidence improved, employment found, and housing sustained are outcomes. A decade of tooling reported outputs as if they were outcomes — which is why a board can hear "1,500 served" and still not know whether anyone improved.

05 How do you measure the impact of a program?

You measure impact by capturing a baseline, tracking the same participants over time, and comparing the change against what the program promised. In practice that means a stable participant ID at intake, before-and-after scores on the outcomes that matter, the qualitative context that explains movement, and a roll-up where every cohort figure cites the source response, case note, or ledger entry behind it.

06 What is an impact measurement framework?

An impact measurement framework is the named structure that connects what a program does to what changes for the people it serves. Common frameworks include Theory of Change, Logic Model, IRIS+, the Five Dimensions of Impact, and SROI. The framework is only useful when it binds to real data — most organizations have one on paper but never connect it to the records it is supposed to describe.

07 Who pays for impact measurement, and why is it so hard?

The funder asks for it, the grantee or investee carries the cost, and almost no one funds the capacity to build it. That structural push and pull is why measurement so often collapses into a report written once a year to satisfy the ask, rather than evidence the program can learn from. The way out is to stop funding measurement as a separate activity and let it ride the workflow already running.

08 Can impact measurement be part of an existing workflow instead of a separate project?

Yes — and that is the most durable way to do it. An accelerator captures pre, mid, and post context through its application and follow-up; a training program adds mentor feedback and LMS data; a grant manager binds grantee metrics and semi-annual reports to one record; an impact investor builds a portfolio data dictionary from quarterly monitoring. Each workflow has its own stages, and the measurement falls out of the work the team already does.

09 What software is used for impact measurement?

Impact measurement software collects program data, scores it against a framework, and rolls outcomes up across cohorts, sites, or grantees. Platforms that foundations and nonprofits commonly evaluate include Sopact Sense, UpMetrics, Bonterra Impact Management, Amp Impact, SureImpact, and ActivityInfo. For a criteria-by-criteria comparison, see "Impact measurement software compared."

A different starting point

Bring a real cohort. Leave with the citation trail behind every number.

Sixty minutes with Unmesh Sheth. No deck. We work a question you already need answered — a cohort review, a funder report, a Tuesday question from your board — against your real data shape. You leave with a path that does not require rebuilding the data each cycle.