
Nonprofit Impact Measurement: Methods, Frameworks, Examples

How nonprofits measure impact without revenue. Six methods, a five-step results chain, and a worked example from a literacy program.

Updated May 4, 2026

A guide for program teams and funders

A nonprofit's books show donations received.

A nonprofit's impact shows what changed for the people served.

Reporting the first does not prove the second.

This page covers nonprofit impact measurement in plain language: the five-step results chain, six methods funders ask about, the design choices that decide whether your data answers what they want to know, and a worked example from a community-foundation-funded literacy program. No prior background needed.

What this page covers

  01 The five-step results chain
  02 Definitions and methods
  03 Six design principles
  04 The methods choice matrix
  05 A literacy program walkthrough
  06 Funder-facing FAQ

The results chain

From dollars in to lives changed: the five-step chain

Every nonprofit program runs the same five steps under the surface, whether the program is literacy tutoring, workforce training, or food distribution. The chain is the spine of every impact measurement framework. The question is which step you collect data on, and which step the funder is actually asking about.

The causal pathway, with the evidence a funder can read at each step:

01 Inputs. What goes in: funding, staff, materials, partnerships, the program design itself. Evidence: budget reports, staffing rosters, partnership agreements.

02 Activities. What the program does: tutoring sessions, workshops, case management, services delivered. Evidence: attendance logs, session records, case notes.

03 Outputs. What got delivered: people served, sessions held, materials distributed. Countable activity. Evidence: enrollment counts, sessions delivered, completions. Most grant reports stop here.

04 Outcomes. What changed for participants: skill gain, behavior change, status improvement during or shortly after. Evidence: pre-post scores, behavior change, status improvement. Most funder questions land here.

05 Impact. What lasts: sustained change in lives, communities, or systems that the program plausibly caused. Evidence: six- and twelve-month follow-up, comparison cohort, system change.

The gap between output and outcome data is where nonprofit impact measurement lives. Outputs report what the program did. Outcomes report whether it worked. Funders increasingly want both, in one document, attributable to the same participants over time.

The chain is sometimes called a logic model when drawn as a planning grid, and a theory of change when drawn with named assumptions between steps. The architecture underneath is the same. Sources: Kellogg Foundation Logic Model Development Guide, ActKnowledge Theory of Change, Impact Management Project Five Dimensions.

Definitions

Nonprofit impact measurement, defined

The terms in this section show up in every grant report, foundation RFP, and board deck. They mean different things and the differences matter. Each answer reflects how the field uses the term today, not how it was defined two decades ago.

What is nonprofit impact measurement?

Nonprofit impact measurement is the practice of evidencing program-level change in participant lives, separate from financial reporting. Where a for-profit proves value through revenue, a nonprofit proves value through outcomes: skill gains, behavior changes, status improvements that the program plausibly caused.

The discipline covers what to measure (which participant outcome should move if the program works), how to collect it (baseline at entry, follow-up at exit and later), and how to report it back to funders, boards, and the people the program serves. The data has to answer the question the funder is actually asking, which is rarely how many people walked through the door.

How do nonprofits measure impact without revenue?

Revenue is irrelevant to nonprofit impact because the work is funded by donations and grants, not by what participants pay. There is no transaction to count. So the proof of value comes from somewhere else: outcomes the program claims to produce.

Literacy programs report reading-level gains. Workforce training programs report employment rates at six and twelve months. Homelessness services report housing stability. Each program names a participant outcome that should move if the program works, collects a baseline measure at entry, collects a follow-up at exit, and compares the same people. The metric travels with the program, not with the funding model.

What are the most common nonprofit impact measurement methods?

Six methods cover most nonprofit work. Logic Model maps inputs, activities, outputs, and outcomes in a four-column grid. Theory of Change adds a causal pathway with named assumptions between steps. Pre and Post Surveys compare the same people at program start and end. SROI (Social Return on Investment) converts outcomes to a monetary ratio against program cost. IRIS+ provides a standardized indicator library from the Global Impact Investing Network. The Five Dimensions of Impact framework asks five structured questions about every program: what, who, how much, contribution, and risk.

Method choice depends on funding scale and the question the data has to answer. A foundation grant of $50,000 does not justify the same instrumentation as a federal grant of $5 million. The methods matrix later on this page maps the choices.
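To make the SROI arithmetic concrete, here is a minimal sketch of the core ratio; the program cost, proxy values, and outcome counts are all invented for illustration, and a real SROI analysis also handles discounting, deadweight, and attribution adjustments that this omits.

```python
# Minimal SROI sketch with invented numbers.
# Core ratio: monetized value of outcomes divided by program cost.
program_cost = 50_000  # hypothetical grant-funded program cost

# Hypothetical monetized outcomes: a proxy dollar value per outcome,
# multiplied by the number of participants who achieved it.
monetized_outcomes = {
    "moved to stable employment": 40 * 3_000,
    "reduced reliance on emergency services": 25 * 1_200,
}

total_value = sum(monetized_outcomes.values())
sroi_ratio = total_value / program_cost
print(f"SROI: {sroi_ratio:.2f} : 1")  # prints "SROI: 3.00 : 1" for these numbers
```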

What is the difference between outputs, outcomes, and impact?

Outputs are what the program delivered, counted as activity: people served, sessions held, materials distributed. They answer "what did you do." Outcomes are changes in participants during or shortly after the program: skill gain, behavior change, status improvement. They answer "what changed." Impact is the longer-term consequence the program plausibly caused: sustained change, system-level shift, ripple effect. It answers "did the change last and was the program responsible."

Most grant reports stop at outputs because outputs are the easiest to count. Most funder questions land at outcomes because outcomes are what the grant was actually for. The distance between those two is the operating reality of nonprofit impact measurement.

Impact meaning: what does "impact" actually mean in nonprofit reporting?

In nonprofit reporting, impact has a specific meaning that overlaps with but differs from everyday usage. The everyday meaning of impact is "any noticeable effect." The technical meaning in nonprofit measurement is narrower: a sustained change in participant lives or social conditions that the program plausibly caused.

Three things have to be true for a result to count as impact. The change has to be measurable. The change has to last past the program's end. And the program has to be a plausible cause, not coincidence. The Impact Management Project's Five Dimensions framework formalizes these as: what changed, for whom, how much, the program's contribution, and the risk that it did not actually happen.

Adjacent terms that get confused

Output vs Outcome

An output is what the program delivered (200 sessions). An outcome is what changed for the participant (reading level rose by one grade). Outputs are the program's verb. Outcomes are the participant's noun.

Outcome vs Impact

An outcome is the change measured at or near program end. Impact is the change that lasts and that the program plausibly caused. Outcomes happen during. Impact often shows up later.

Impact measurement vs Program evaluation

Impact measurement is the ongoing practice; program evaluation is the periodic study. You measure impact every cycle. You evaluate the program every few years. Evaluation answers harder causal questions; measurement keeps the lights on for funders.

Logic Model vs Theory of Change

A logic model is a planning grid showing inputs, activities, outputs, and outcomes. A theory of change is a causal pathway with named assumptions. The logic model is for funders. The theory of change is for learning.

Six design principles

Principles that decide whether the data answers the funder's question

Six choices made before the first survey goes out. Each one alone is a small decision. Together they decide whether the program's impact measurement is reportable to a foundation, a federal funder, or a board with rising scrutiny on outcomes versus activity.

01 · Definition

Define impact before you measure it

An explicit causal claim, not a slogan.

Write one sentence: "If the program works, [participant outcome] will move from [baseline] to [target] for [population] within [time frame]." Every measurement decision flows from this sentence.


Why it matters. Programs without a written causal claim default to counting activity. Activity is cheap to count and silent on whether the program worked.

02 · Method match

Match the method to the funding question

A $5M federal grant earns instrumentation a $50K foundation grant does not.

Pre-post surveys cover most foundation-scale grants. Quasi-experimental designs cover state and federal grants. Randomized controlled trials are reserved for research-grade evidence and large multi-site studies. Pick the lightest method that answers the funder's question credibly.


Why it matters. Over-instrumented programs spend the grant on data collection. Under-instrumented programs cannot answer the question the grant was for.

03 · Before and after

Outcomes need a baseline, not a snapshot

One cross-section cannot show change.

Collect the same measure at program entry, at exit, and at six or twelve months later. The baseline is not optional; without it, the exit score is not a change measure, it is a level measure. Levels do not show whether the program moved the needle.


Why it matters. A program that admits already-strong participants will report strong exit scores even if it changed nothing. Baseline data is what protects against this.
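A minimal sketch of what the baseline buys, using pandas and invented scores; the column names and values are illustrative, not tied to any specific assessment.

```python
import pandas as pd

# Invented entry and exit scores for three participants.
df = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003"],
    "baseline_score": [42, 68, 55],
    "exit_score": [58, 70, 71],
})

# Change against each participant's own baseline, not a level at exit.
df["change"] = df["exit_score"] - df["baseline_score"]
print(df[["participant_id", "change"]])
print("Mean change from baseline:", round(df["change"].mean(), 1))
```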

04 · Identity

A persistent participant ID makes longitudinal reporting possible

Match participants by ID at intake, not by email at follow-up.

Each participant gets one ID at first contact and carries it across every survey, case note, and follow-up. Names change, emails change, phone numbers change. The ID does not. Without it, longitudinal reporting either does not happen or happens by hand.


Why it matters. A program that wants to report twelve-month outcomes but matched participants by name and email at follow-up will lose 30 to 50 percent of records to mismatches.
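A sketch of why the ID matters at merge time, assuming each survey wave is exported as its own table carrying the same participant_id issued at intake; the names, scores, and field names are invented.

```python
import pandas as pd

# Intake and follow-up waves, both keyed on the persistent ID issued at intake.
intake = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003"],
    "name": ["Sarah Johnson", "Luis Ortega", "Amina Diallo"],
    "baseline_score": [42, 68, 55],
})
followup = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003"],
    "name": ["S. Johnson", "Luis Ortega-Ruiz", "Amina Diallo"],  # names drifted
    "followup_score": [61, 74, 66],
})

# Matching on name loses the drifted records; matching on ID keeps all three.
by_name = intake.merge(followup, on="name", how="inner")
by_id = intake.merge(followup, on="participant_id", how="inner")
print(len(by_name), "records matched by name,", len(by_id), "matched by ID")
```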

05 · Attribution

Credit what the program plausibly caused

Contribution claims, not attribution overreach.

A program rarely causes an outcome alone. Other services, life events, and individual effort all contribute. Strong reports name the program's contribution to the outcome and acknowledge what else is in the picture, rather than claiming the full effect.


Why it matters. Funders increasingly read overclaims as a credibility signal in reverse. A measured contribution claim travels further than an inflated attribution claim.

06 · Time horizon

Outcomes happen during, impact often shows up later

Do not claim impact before it is plausible.

Skills learned in a workforce program can be measured at exit. Sustained employment can only be measured at six or twelve months. Programs that report twelve-month employment at the end of a three-month cohort are reporting something else: a forecast, not a measurement.


Why it matters. Reports that conflate exit outcomes with longer-term impact erode trust with funders who track participants on their own. Honesty about time horizon is durable.

The methods choice matrix

Six choices that decide whether your data answers the question

Each row is a decision the program team makes before any data collection starts. The broken column describes the workflow most teams fall into, not a strawman. The working column describes the alternative the strongest programs follow. Read the rows top to bottom, in order; the choices compound.

The choice

Broken way

Working way

What this decides

How to define program success

The first decision; it controls every other one.

Broken

Count participants served. Report the number to the funder. Ship the report when the cycle ends.

Working

Name the participant outcome that has to move if the program works. Write it down before recruiting starts. Build the data system around that outcome.

Decides whether the rest of the system collects activity data or change data. Activity is silent on whether the program worked.

When to collect post data

The follow-up window decides what counts as outcome vs impact.

Broken

End-of-program survey only. Participants leave. The records sit untouched. Nobody hears from them again.

Working

End-of-program survey, plus a six-month follow-up, plus a twelve-month follow-up where the program horizon supports it. The participant ID makes the contact possible.

Decides whether you can report sustained change versus immediate change. Funders increasingly want both numbers.

How to track participants over time

The single most expensive decision when it goes wrong.

Broken

Match by name and email at follow-up. Sarah Johnson became S. Johnson. Her email changed. The match fails. Someone reconciles two hundred records by hand.

Working

Persistent participant ID assigned at first contact. Same ID on every survey, case note, and follow-up. Records match cleanly across waves regardless of what changed about the person.

Decides whether longitudinal reporting is even possible. Without persistent ID, the data exists but cannot be assembled.

Which metrics to report to funders

Outputs are cheap to count. Outcomes are what the grant was for.

Broken

Outputs only: people served, sessions delivered, completion rate. Sometimes a quote from a participant, pulled from a comment box.

Working

Outputs plus at least one outcome metric per named goal: skill gain, behavior change, status improvement. Segmented by participant subgroup so equity gaps stay visible.

Decides whether the funder sees what the grant was actually for. Output reports read as activity logs, not impact reports.

How to handle qualitative responses

Where the strongest evidence and the strongest selection bias both live.

Broken

Pull three or four quotes for the funder report. The strongest stories get told. Dissenting responses disappear into the spreadsheet nobody opens.

Working

Code every open-ended response by theme. Report frequencies and exemplar quotes side by side. Flag the responses that contradict the headline number for closer reading.

Decides whether qualitative data complements or contradicts the numbers. Coded text is evidence; pulled quotes are marketing.

How to compare participant subgroups

Aggregation hides the gaps a funder is increasingly trying to surface.

Broken

One average across all participants. Report a single percentage. Subgroup variation gets averaged into the headline number.

Working

Segment by entry baseline, demographic group, and program track. Report change for each segment. Note where the gains were uneven and what the program plans to learn from it.

Decides whether equity gaps are visible or hidden. Funders increasingly fund programs that surface them honestly.

The compounding effect

The first row controls every other row. Define success as outputs and the rest of the system collects activity data. Define it as outcomes and the rest of the system has to collect change data, requires persistent IDs, requires follow-up windows, and surfaces subgroup variation. The decisions are not independent; they compound.

A worked example

An after-school literacy program reports to a community foundation

Three sites. Two hundred and forty K-3 students enrolled across a nine-month cohort. The grant was made on the basis of a literacy outcome, not a session count. This section walks through what the program actually did with its data and what the funder report eventually showed.

"We had attendance numbers from day one. Three thousand eight hundred sessions delivered, two hundred and forty kids enrolled, completion rate above eighty percent. Clean numbers, every funder loved them. The problem was the foundation kept asking whether reading scores actually moved. We had pre-tests in a Google Form, post-tests in a different Google Form, parent narratives in a Word document, and teacher notes in a shared drive. Nothing connected back to the same student."

After-school literacy program lead, end of cohort one, before redesigning the data system

The two axes the report had to bind

Quantitative axis

DIBELS reading scores at entry, mid, and exit

A standardized literacy assessment scored at three points in the program cycle. Each score belongs to one student, identified by a persistent participant ID assigned at first contact. The same ID carries through to the six-month follow-up.

Bound at collection, not at report time

Qualitative axis

Parent and teacher narratives about reading at home and engagement in class

Open-ended responses from parents and teachers each cycle. Coded by theme: home reading frequency, library use, classroom participation, parent confidence. Linked to the same participant ID so quotes attach to scores.

What Sopact Sense produced

Matched longitudinal records

Each student's pre, mid, post, and six-month scores under one ID. The funder report opened with a class-of-2026 chart showing change for each cohort segment.

Reading-level gain by entry quartile

Students entering below grade level showed larger gains than already-proficient peers. The report named both numbers; the foundation funded a follow-on cohort because of the equity surfacing.

Qualitative themes linked to score gains

Parents reporting "reads aloud at home now" appeared disproportionately in the largest-gain quartile. The exemplar quotes attached to actual student records, not a generic anonymous story.

A funder-ready dashboard

One link, refreshed in real time, replaced the quarterly spreadsheet email. The foundation program officer built her board memo from the same source.

Why the previous toolset fell short

CSV exports across two systems

Pre-test in one Google Form, post-test in another. Reconciliation by hand. The two hundred and forty rows turned into one hundred and seventy-three matched pairs after deduplication; the rest were lost to typos.

Qualitative trapped in document files

Parent narratives lived in a shared drive, teacher notes in another. Nobody coded them. The funder report carried two pulled quotes selected for emotional impact, not for representativeness.

No link between teacher narratives and student gains

The qualitative axis and the quantitative axis lived in different files. The report could not say which themes co-occurred with the largest reading gains because the data was never joined.

Manual reconciliation as a recurring tax

Each reporting cycle started with a week of someone matching records by hand. The cost was paid every cycle. By cohort three, the program officer had stopped asking for the qualitative side; the cost of producing it had become its own deterrent.

The integration is structural in Sopact Sense, not procedural. The participant ID issued at first contact is the same ID on the post-test, on the parent narrative, on the teacher note, and on the six-month follow-up. The literacy program did not assemble the report by joining files. The report assembled itself because the records were always already linked. The program lead spent the saved week reading the qualitative findings instead of reconciling them.
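A sketch of the entry-quartile segmentation that surfaced the equity finding; the scores below are invented stand-ins, not DIBELS data, and the quartile cut is one of several reasonable banding choices.

```python
import pandas as pd

# Invented entry and exit scores for a twelve-student slice, keyed on ID.
scores = pd.DataFrame({
    "participant_id": [f"S{i:03d}" for i in range(1, 13)],
    "entry": [18, 25, 31, 36, 40, 44, 49, 53, 58, 62, 67, 71],
    "exit":  [34, 38, 45, 47, 50, 52, 55, 60, 62, 65, 70, 73],
})
scores["gain"] = scores["exit"] - scores["entry"]

# Segment by entry quartile so uneven gains stay visible instead of
# averaging into one headline number.
scores["entry_quartile"] = pd.qcut(scores["entry"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
print(scores.groupby("entry_quartile", observed=True)["gain"].mean())
```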

Where this lands

Three nonprofit shapes, three different measurement architectures

Nonprofit impact measurement looks the same on a slide deck and different in operation. The architecture that works for a workforce program with an eight-month cohort breaks for a foundation aggregating outcomes across forty grantees. Three program shapes, what each one collects, and where each one breaks.

01 · Workforce

Job training nonprofit

Long cycles. Employment-rate outcomes. Employer touchpoints.

A workforce training nonprofit runs three- to six-month cohorts. The participant outcome is employment, not a skill score; the funder is asking whether graduates got and kept jobs. The measurement architecture has to span participant entry, mid-program assessment, exit, three-month employment check-in, six-month employment check-in, and employer feedback once the participant is placed.

What breaks: keeping touch with participants after exit. Phone numbers change, employer relationships are uneven, response rates drop with each follow-up wave. Programs that reported eighty percent employment at exit and silently lost track of forty percent of participants by month six were reporting a number that did not survive scrutiny.

What works: persistent participant ID at intake, employment status as the named outcome, and three follow-up waves built into the data plan from the start. The program's own definition of success is sustained employment at six months, not exit completion.

A specific shape

The strongest workforce reports show completion rate, three-month employment rate, six-month employment rate, and twelve-month wage growth for the same matched cohort. Drop-off between the columns is the report's most useful number, not the largest column.
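A sketch of that matched-cohort shape with invented counts; the wave labels and numbers are illustrative only.

```python
# Invented counts for one matched workforce cohort.
cohort_size = 120
waves = {
    "completed program": 98,
    "employed at 3 months": 81,
    "employed at 6 months": 72,
    "employed at 12 months, wage data collected": 60,
}

# Report every wave against the same cohort; the drop-off between
# columns is often the report's most useful number.
previous = cohort_size
for wave, count in waves.items():
    share = count / cohort_size
    print(f"{wave}: {count} ({share:.0%} of cohort, drop-off {previous - count})")
    previous = count
```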

02 · Community foundation

Grantee portfolio aggregation

Forty grantees. Common indicators. Aggregate-level reporting.

A community foundation funding forty youth-services grantees has to show its board what the portfolio produced, not what each grant produced. The architecture rolls up forty individual grantee reports into a portfolio narrative. Common indicators across grantees become the foundation's outcome metrics; grantee-specific metrics live in the underlying reports.

What breaks: each grantee using its own definition of "youth served," its own intake instrument, its own outcome metric. The foundation receives forty PDFs in different formats and has to extract a common number that does not exist cleanly in any of them. Aggregation by hand becomes a quarterly tax on the program officer.

What works: a shared indicator set at the portfolio level, optional grantee-specific extensions, and a common data structure that lets the foundation roll up numbers without re-extracting from each PDF. Grantees keep their autonomy on local metrics; the foundation gets a portfolio number it can defend to the board.

A specific shape

Strong foundation portfolio reports show three to five common indicators across all grantees, a portfolio-level outcome trajectory, and grantee-specific case studies for context. The board memo opens with the portfolio number, not with the forty grantee logos.
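A sketch of the rollup under the assumption that every grantee reports the same shared indicators in a common structure; the grantee names and numbers are invented.

```python
import pandas as pd

# Invented grantee submissions against a shared three-indicator set.
submissions = pd.DataFrame({
    "grantee": ["Youth Arts Co", "Eastside Tutors", "Bridge Futures"],
    "youth_served": [140, 220, 95],
    "completed_program": [112, 180, 70],
    "outcome_achieved": [84, 121, 52],
})

# Portfolio-level totals the board memo can open with,
# with grantee-level detail kept underneath for context.
portfolio = submissions[["youth_served", "completed_program", "outcome_achieved"]].sum()
outcome_rate = portfolio["outcome_achieved"] / portfolio["completed_program"]
print(portfolio)
print(f"Portfolio outcome rate among completers: {outcome_rate:.0%}")
```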

03 · Direct service

Human services nonprofit

Continuous enrollment. No fixed cohort. Service intensity varies.

A direct-service nonprofit (housing services, food security, behavioral health) does not run cohorts. Participants enter on different days, receive services for different lengths of time, and exit on their own schedules. The cohort framing that works for cohorted programs collapses here; there is no shared start date to anchor "before" and "after."

What breaks: trying to force a cohort frame onto continuous enrollment. The annual report ends up averaging participants who got two services with participants who got fifty, masking what intensity-of-service actually does for outcomes. Reports read as flat averages that hide the program's working theory.

What works: each participant carries their own pre-post measurement window from their personal entry date, segmented by service intensity, with the report aggregating change-from-baseline rather than fixed-date snapshots. The participant ID makes this possible; without it, intensity segmentation cannot be calculated.

A specific shape

Strong direct-service reports show change-from-baseline by service intensity tier (light touch, moderate, intensive) rather than a single average across all participants. The intensity gradient is usually the report's most defensible finding.
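A sketch of change-from-baseline by intensity tier, assuming each participant record carries a personal baseline, a latest measure, and a count of services received; the tier cut-offs and values are invented.

```python
import pandas as pd

# Invented records from a continuously enrolling program.
df = pd.DataFrame({
    "participant_id": ["H01", "H02", "H03", "H04", "H05", "H06"],
    "services_received": [2, 4, 9, 14, 27, 41],
    "baseline": [30, 28, 35, 32, 26, 29],
    "latest":   [31, 33, 42, 44, 43, 49],
})
df["change"] = df["latest"] - df["baseline"]

# Tier by service intensity instead of averaging everyone together.
df["tier"] = pd.cut(df["services_received"],
                    bins=[0, 5, 20, float("inf")],
                    labels=["light touch", "moderate", "intensive"])
print(df.groupby("tier", observed=True)["change"].agg(["count", "mean"]))
```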

A note on tooling

Common collection tools all do collection well. SurveyMonkey and Google Forms accept responses, KoboToolbox handles offline collection in field settings, and Salesforce NPSP and Apricot store relationships and case data. The architectural gap most nonprofits hit is not collection; it is what happens after. The pre-test sits in one tool, the post-test in another, the qualitative narrative in a third, and matching the same participant across all three is a manual reconciliation cost that scales linearly with cohort size.

Sopact Sense addresses that gap by issuing a persistent participant ID at first contact and carrying it through every subsequent survey, follow-up, and qualitative response, then linking quantitative and qualitative data on the same record automatically. The result is that longitudinal reporting and qualitative coding stop being separate manual workflows. For a side-by-side comparison of impact measurement platforms (Sopact Sense, UpMetrics, Impact Cloud, and others) along with their tradeoffs, see the dedicated software comparison page at use-case/impact-measurement-software.

FAQ

Nonprofit impact measurement questions, answered

Q.01

What is nonprofit impact measurement?

Nonprofit impact measurement is the practice of evidencing program-level change in participant lives, separate from financial reporting. Where a for-profit proves value through revenue, a nonprofit proves value through outcomes: skill gains, behavior changes, status improvements that the program plausibly caused. The discipline covers what to measure, how to collect it, and how to report it back to funders, boards, and the people the program serves.

Q.02

How do nonprofits measure impact?

Nonprofits measure impact by defining a participant outcome that should move if the program works, collecting baseline data at entry, collecting follow-up data at exit and often at six or twelve months later, and comparing the same people across waves. Methods include logic models, theory of change, pre and post surveys, SROI, and IRIS+ indicators. The choice depends on funding scale and the question the data has to answer.

Q.03

How do nonprofits measure impact without revenue?

Revenue is irrelevant to nonprofit impact because the work is funded by donations and grants, not by what participants pay. Instead, nonprofits report outcomes that their programs claim to produce: literacy gains for an after-school program, employment for a workforce training program, housing stability for a homelessness program. The metric travels with the program, not with the funding model.

Q.04

What are the most common nonprofit impact measurement methods?

Six methods cover most nonprofit work. Logic Model maps inputs, activities, outputs, and outcomes in a four-column grid. Theory of Change adds a causal pathway with named assumptions. Pre and post surveys compare the same people across two points. SROI converts outcomes to a monetary ratio. IRIS+ provides a standard indicator library. The Five Dimensions of Impact framework asks five structured questions about every program. Pick by funding scale and the question the data has to answer.

Q.05

What is the difference between outputs, outcomes, and impact?

Outputs are what the program delivered: people served, sessions held, materials distributed. They answer "what did you do." Outcomes are what changed for participants during or shortly after the program: skill gain, behavior change, status improvement. They answer "what changed." Impact is the longer-term consequence the program plausibly caused: sustained change, system-level shift, ripple effect. It answers "did the change last and was the program responsible." Most grant reports stop at outputs. Most funder questions land at outcomes.

Q.06

How do nonprofits measure program impact for grant reporting?

Grant reports usually require three things: a count of people served, a description of what the program did, and evidence that something changed for participants. The first two are output reporting and are straightforward. The third is outcome reporting and is where most reports thin out. Strong reports collect a baseline measure at participant entry, a follow-up at exit, and a sustained-change measure six or twelve months later, then segment by participant subgroup so equity gaps stay visible.

Q.07

What is a logic model in nonprofit impact measurement?

A logic model is a four-column grid: Inputs, Activities, Outputs, Outcomes. Each column lists the program elements at that stage. Inputs name resources used; activities name what the program does; outputs name countable deliverables; outcomes name participant changes. Some logic models add a fifth Impact column. The grid is a planning tool first and a reporting tool second; it does not by itself test whether the program works.

Q.08

How does theory of change differ from a logic model?

A logic model is a planning grid; a theory of change is a causal pathway. The logic model lists program elements column by column. The theory of change names the steps from problem to impact and the assumptions that have to be true for each step to lead to the next. The theory of change is the learning tool; it tells you what the data has to test. Most teams use both: a logic model for funder reports and a theory of change to guide what gets measured.

Q.09

What metrics should nonprofits report to funders?

Report at least one outcome metric for each named program goal, segmented by participant subgroup. Outputs without outcomes read as activity reports, not impact reports. If the program promises literacy gains, report a literacy measure. If it promises employment, report an employment rate at exit and at six months. Report change against the participant's own baseline where possible, not against a population average. Sourced numbers, not round percentages.

Q.10

How long should nonprofits track participants after a program ends?

The minimum that matters is six months post-exit; twelve months is stronger; twenty-four months is research-grade. The principle is that outcomes happen during the program, but impact often shows up later. Workforce training shows job retention at twelve months, not at completion. Literacy programs show grade-level proficiency at the next assessment cycle, not the day class ends. The horizon should match what the program plausibly causes.

Q.11

What is the best way to handle qualitative survey data?

Code every response, do not pull a few quotes for the report. Coding means assigning each response to one or more themes, counting frequencies, and reporting the patterns alongside exemplar quotes. Pulled quotes alone create selection bias; the strongest stories get told and the dissenting ones disappear. Coded qualitative data complements the quantitative numbers and often explains why an outcome moved or did not move for a particular subgroup.
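A minimal sketch of coded qualitative reporting, assuming responses have already been assigned themes by a reviewer; the responses and theme labels are invented.

```python
import pandas as pd

# Invented open-ended responses, each already coded with one or more themes.
responses = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003", "P004", "P005"],
    "text": [
        "Reads aloud at home now, asks to go to the library.",
        "Still struggles to finish a chapter without help.",
        "More confident in class, volunteers to read.",
        "Reads aloud at home, but only when a parent sits with them.",
        "No change we have noticed yet.",
    ],
    "themes": [
        ["home reading", "library use"],
        ["still struggling"],
        ["classroom confidence"],
        ["home reading"],
        ["no change"],
    ],
})

# Report theme frequencies across every response, not a handful of pulled quotes.
print(responses.explode("themes")["themes"].value_counts())

# Keep the responses that contradict the headline number visible.
dissent = responses[responses["themes"].apply(
    lambda t: "still struggling" in t or "no change" in t)]
print(dissent[["participant_id", "text"]])
```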

Q.12

How do small nonprofits measure impact without a research team?

Pick one outcome that has to move if the program works. Collect baseline at entry and follow-up at exit using identical questions. Use a persistent participant ID so the same person can be matched across waves. Report change against each participant's own baseline. Add a six-month follow-up if the program promises sustained change. Most small nonprofits do not need an RCT; they need a clean before-after comparison and a way to track participants over time.

Q.13

Can I use Google Forms or SurveyMonkey for nonprofit impact measurement?

Yes for collection. The forms accept responses, the data exports as a spreadsheet. The gap is what comes after collection. Neither tool issues a persistent participant ID, neither links a baseline survey to a follow-up survey for the same person, and neither codes qualitative responses. So the survey runs but the longitudinal report does not assemble itself; someone matches records by hand. For one-cycle programs that is acceptable; for repeated waves, the matching cost dominates.

Q.14

What software do nonprofits use to measure impact?

Common categories include survey collection tools (SurveyMonkey, Google Forms, KoboToolbox), CRM and case management (Salesforce NPSP, Apricot), and dedicated impact measurement platforms (Sopact Sense, UpMetrics, Impact Cloud). Categories address different parts of the workflow. Survey tools collect, CRMs store relationships, impact platforms link the two and produce outcome reports. The software comparison page covers the tradeoffs at /use-case/impact-measurement-software.

Working session

Bring your impact framework. See what changes.

A 60-minute working session. We open your current logic model or theory of change, walk it against the five-step results chain, and mark where outputs and outcomes are getting confused. You leave with a tested method, not a tool demo.

Format: 60 minutes, video call. One program lead and one data person work best.
What to bring: A current logic model, results framework, or grant-report template. Rough is fine.
What you leave with: A marked-up chain showing where your data ends and where outcome evidence would need to start.