Social impact metrics: outputs, outcomes, indicators
A plain-language guide to social impact metrics: the difference between outputs and outcomes, six properties of a working metric, and a worked example.
An output counts what was delivered. An outcome measures what changed. Most impact reports show the wrong one.
This guide explains the difference in plain terms, names the six properties
every working metric has, and shows what a complete metric set looks like
for a program. The worked example comes from a community lending program.
No prior background needed.
THE PATHWAY
Five tiers from resources to societal change.
Every social program runs along the same chain. Resources go in. Activities
happen. Things get delivered. People change. The world shifts a little.
The line between what got delivered and whether anyone changed is where
most reports break down.
Causal pathway, left to right
01
Inputs
Money, staff, materials, time invested in the program.
→
02
Activities
What the program does. Workshops run, loans processed, services provided.
→
03
Outputs
Counts of what got delivered. 200 loans. 50 graduates. 4,000 meals.
→
04
Outcomes
Measurable change in the people served. Income gained. Skill acquired. Business survived.
→
05
Impact
Long-term, broader change in the community or system the program touches.
What each tier answers
How much was invested?
What did the program do?
How much got delivered?
Did participants change?
Did the world change?
The boundary between tier 03 and tier 04 is the boundary between
activity reporting and impact measurement.
Output metrics are easy to count and prove accountability for spending.
Outcome metrics are harder to count and answer whether the spending
bought any change.
Tier names are conventional across IRIS+, the Logic Model, and most
funder reporting frameworks. The labels matter less than the discipline
of naming which tier each metric measures, so a published report does
not mix categories.
DEFINITIONS
Five terms, defined the way the data has to support them.
The vocabulary around social impact metrics overlaps. Indicators, KPIs,
scores, measures: most teams use them interchangeably. The difference
shows up later, when a report has to be written and the words have not
been defined the same way across the team. These five definitions are
the ones the rest of this page uses.
What are social impact metrics?
A social impact metric is a specific, repeatable measure of a change
a program is meant to produce. A working metric names four things:
what is being counted, who is being counted, when it is measured,
and what counts as a meaningful difference.
Metrics that skip any of those four become noise. The most common
failure is reporting outputs (what the program delivered) and calling
them outcomes (what changed for the people served). The two are not
interchangeable, and a funder who knows the difference will spot the
mix in the first paragraph.
What is the difference between an impact metric and an impact indicator?
An indicator is a signal. A metric is a definition. The indicator is
the data point you collect (a survey score, a job-placement count, a
revenue number). The metric is the rule that says how to collect it,
from whom, and when, so the data point is comparable across cohorts.
In daily practice the words are used interchangeably. The discipline
is the same either way: write the metric definition first, then collect
the indicator. A program that defines indicators in the report-writing
phase rather than the design phase ends up with data that does not fit
the question.
How do you measure social impact?
Measuring social impact means asking the same people the same
questions before the program, after the program, and again later,
then comparing the answers. Pair quantitative items (rating scales,
counts, dates) with two or three open-ended prompts that capture
what changed and why.
Use a persistent ID so a participant's pre, post, and follow-up
answers can be linked. Compare the change inside the program to a
comparison group when feasible. Without those four pieces (same
people, same questions, linked records, comparison) what you have
is anecdote, not measurement.
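As a concrete illustration, here is a minimal sketch of the linkage step in Python with pandas, assuming pre and post responses export to CSV files keyed by the persistent participant ID (the file and column names are hypothetical):

import pandas as pd

# Pre and post responses, each keyed by the persistent participant ID
# assigned at first contact (file and column names are illustrative).
pre = pd.read_csv("pre_survey.csv")    # columns: participant_id, confidence
post = pd.read_csv("post_survey.csv")  # columns: participant_id, confidence

# Inner join on the persistent ID: only participants with a linked
# pre AND post record count toward the change metric.
linked = pre.merge(post, on="participant_id", suffixes=("_pre", "_post"))

# Direction and magnitude of change, per linked participant.
linked["change"] = linked["confidence_post"] - linked["confidence_pre"]
print(f"Linked pairs: {len(linked)}")
print(f"Mean change: {linked['change'].mean():+.1f} points")

The same join extends to the follow-up wave. Without the shared ID, the join does not exist, and the two files stay two snapshots.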
What is a social impact KPI?
A social impact KPI is a small set of outcome metrics chosen because
they signal whether the program is on course. Three to seven KPIs is
the working range. More than that and no one watches them.
A good KPI set includes at least one metric per stage of the program
theory: an early signal (engagement or completion), a primary outcome
(the change the program is meant to produce), and a downstream outcome
(whether the change persisted at six or twelve months). The KPI list
is the report cover. The full metric set is the body of the report.
What is a social impact score?
A social impact score is a single composite number rolling several
metrics together. Scores are useful for cross-portfolio comparison
and for external communication where one number is easier to carry
than seven. They are not useful for program improvement, because the
score hides which input metric moved and which did not.
Most working programs report the score AND the underlying metric set.
The score gives a headline. The metric set gives the explanation. A
score without its metric set is a marketing number.
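For illustration only, one common construction of such a score is a weighted average of min-max normalized metrics. The bounds and weights in this Python sketch are program-defined assumptions, not a standard formula:

# A composite score as a weighted average of metrics normalized to 0-1.
# Bounds and weights are program-defined assumptions, not a standard.
def composite_score(metrics, bounds, weights):
    total = 0.0
    for name, value in metrics.items():
        lo, hi = bounds[name]
        total += weights[name] * (value - lo) / (hi - lo)  # min-max normalize
    return 100 * total / sum(weights.values())             # scale to 0..100

score = composite_score(
    metrics={"revenue_growth_pct": 22.0, "survival_rate": 0.84, "jobs_added": 1.6},
    bounds={"revenue_growth_pct": (0, 50), "survival_rate": (0, 1), "jobs_added": (0, 5)},
    weights={"revenue_growth_pct": 2, "survival_rate": 2, "jobs_added": 1},
)
print(f"Composite score: {score:.0f}/100")

Note that the single number at the end says nothing about which of the three inputs moved. That is exactly why the underlying metric set has to travel with it.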
RELATED, BUT NOT THE SAME
Four neighbors of "metric" that get confused for it.
Output
A count of what the program delivered. Workshops held, loans
disbursed, meals served. Measures effort, not effect.
Outcome
A measurable change in the people the program is meant to serve.
Income gained, skill acquired, business survived.
Indicator
The data point itself. A score, a percentage, a yes-or-no answer.
Indicators sit inside metrics.
Impact
Long-term, broader change at the community or system level.
Outcomes are about participants. Impact is about the world.
PROPERTIES OF A WORKING METRIC
Six properties every metric needs to do its job.
Most metrics fail one or two of these and still get reported. The result
reads as evidence and behaves as noise. The six properties below are the
filter every metric on the dashboard should pass before it goes in front
of a board, a funder, or a program team trying to decide what to change.
01 · LEVEL
Output or outcome, named.
A metric is one or the other. Never both.
Workshops held is an output. Skills retained six months later is an
outcome. Mixing them in the same column produces a report that
looks comprehensive and answers no question.
Why it matters:
Funders read for outcomes. Output-heavy reports get filed; outcome-led reports get funded.
02 · UNIT
Numerator and denominator.
Every count needs a context.
"Served 500 participants" answers nothing. "Served 500 of 800 eligible
households in the catchment area, 62 percent" answers reach. The
denominator is what makes the numerator legible across years and cohorts.
Why it matters:
A raw count grows with budget. A ratio shows whether reach changed.
03 · CHANGE
Direction and magnitude.
Did it improve, and by how much?
"Improved confidence" tells you nothing. "Confidence rose from 5.4
to 7.2 on a 10-point scale, on average, six months in" tells you the
direction (up), the size (1.8 points), and that someone has a baseline.
Why it matters:
Magnitude separates a real shift from survey-week mood.
04 · LINKAGE
Same people over time.
Cross-sectional comparisons hide change.
Comparing the average of cohort A's pre survey to cohort B's post
survey tells you about two different groups, not about one group
changing. A persistent participant ID is the only structure that
links a person's pre to their own post.
Why it matters:
Without linkage, the report is two snapshots, not measurement.
05 · EXPLANATION
Numbers paired with words.
Quantitative anchors. Qualitative reasons.
A metric that moved is more useful when paired with two or three
short open-ended responses from the same participant explaining what
happened. The number tells you what changed. The words tell you why,
and what the program did that worked.
Why it matters:
Open-ended responses surface which program element drove the change.
06 · CAUSE
Comparison or counterfactual.
Did the program cause it, or was it happening anyway?
Pre-post change inside a program tells you something happened. A
comparison group, a regional benchmark, or a waitlisted cohort tells
you whether the program caused it. The honest version of impact
measurement always carries some attempt at counterfactual.
Why it matters:
Without a comparison, "the economy improved" is a competing explanation.
METRIC CHOICE MATRIX
Seven decisions that decide whether the metric works.
Most teams design a metric set in a single afternoon and live with the
consequences for years. Each row of this social impact matrix is one
decision the team is making, knowingly or not. The broken-way column
is the workflow most teams fall into. The working-way column is what
the page argues for.
The choice
The broken way
The working way
What this decides
Choosing what to count
Output or outcome
BROKEN
Counting what is easy to count: workshops held, participants
enrolled, services delivered. Calling those numbers the impact.
WORKING
Counting what changed for the people served. The output stays in
the report as context, not as the headline.
Whether the report measures effort or effect. Funders read
the headline first.
Tracking participants
Anonymous or linked
BROKEN
One-shot survey at the end. Or two surveys collected anonymously,
with no way to link a person's pre to their own post.
WORKING
Persistent participant ID assigned at first contact. Pre, post, and
follow-up all link to the same record automatically.
Whether the team can attribute change to the program at the
individual level, not the cohort average.
Setting the metric scale
Vague or bounded
BROKEN
Aspirational language: "improved wellbeing", "stronger community",
"increased confidence". No scale. No threshold. Nothing to compare.
WORKING
Bounded scale named in the metric definition: a 1-to-10 score, a
percent change, a yes-or-no threshold. Measured at named moments.
Whether the metric is repeatable across cohorts and years,
or only describes one report.
Numerator and denominator
Raw or contextualized
BROKEN
Reporting the count alone: "served 500 people". Reach is unknowable.
Year-over-year comparison reads as growth when budget grew.
WORKING
Reporting numerator over denominator: "served 500 of 800 eligible,
62 percent". Reach is a ratio. Cohorts compare cleanly.
Whether the metric scales with program size or simply inflates with budget.
Combining numbers and words
Separated or linked
BROKEN
A quantitative survey on one platform. Open-ended interview notes
in a separate document. The two never line up to the same person.
WORKING
Quantitative scale and open-ended prompt collected in the same
instrument, against the same participant ID. Number plus reason,
stored together.
Whether the report can explain why a metric moved, not
only whether it moved.
Choosing a comparison
Inside-only or external
BROKEN
Pre-to-post change reported as the impact, with no benchmark. "The
economy got better" is a perfectly good rival explanation.
WORKING
Pre-to-post change paired with a comparison: a waitlist cohort, a
regional benchmark, a public dataset. The honest version names
what the comparison is and what it cannot rule out.
Whether the program can credibly claim cause, not mere correlation.
Reporting cadence
Annual or rolling
BROKEN
One annual impact report. Data assembled from disconnected sources
in the six weeks before the deadline. The team cannot correct
course mid-year.
WORKING
Quarterly cohort reviews against the same metric set. Annual rollup
is a summary, not a build. Drift gets caught while it can still be
addressed.
How fast the program can correct course when a metric
drifts the wrong way.
The first decision controls all the others. A team that
chooses outputs over outcomes does not need persistent IDs, does not
need bounded scales, does not need comparison groups. The decision to
measure outcomes is the decision to invest in the rest of the matrix.
A WORKED EXAMPLE
Small-business lending: from output count to outcome metric.
A community development financial institution (CDFI) lending to micro
and small businesses in low-income neighborhoods. The team has reported
"loans disbursed" and "dollars deployed" for years. A new funder is
asking what the loans actually produced.
We have always been able to say how many loans we made and how much we
moved out the door. The new funder wants to know whether the businesses
we lent to are still operating, whether monthly revenue grew, and how
many people they employ now. We have the data, sort of. Some is in our
loan-management system, some is in survey responses we ran twice and
never linked, and some is in the program officer's head. None of it
rolls up.
Lending program director, mid-portfolio review
THE METRIC SET, IN TWO AXES
Quantitative
Numbers, scales, counts
Loan amount and term
Repayment status, months since disbursement
Monthly business revenue at month 0, 6, 12
Employees on payroll at month 0, 6, 12
Business operating status at 12 months (yes / no / pivoted)
Every data point linked to one borrower record.
Qualitative
Open-ended responses
What did this loan let you do that you could not have done otherwise?
What changed about how you run the business in the last year?
What was the hardest month and what got you through it?
What would you tell the program team to do differently?
What the working setup produces
A linked borrower file
Every survey response, loan record, and program note tied to one
persistent borrower ID. No spreadsheet matching at report time.
Pre-and-post comparable revenue
Monthly revenue collected at month 0, 6, and 12 with the same
question wording. The change is a real change, not a reframed
question.
Reasons attached to numbers
When a borrower's revenue jumps or drops, the open-ended response
from the same survey explains why, in the borrower's own words.
A reportable outcome metric
"Sixty-five percent of borrowers grew monthly revenue by twenty
percent or more, twelve months after disbursement." Funder-grade,
and computable straight from the linked records (sketched below).
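A minimal sketch of that computation in Python with pandas, assuming the linked borrower file exports as one row per borrower per survey moment (the file and column names are hypothetical):

import pandas as pd

# One row per borrower per survey moment, keyed by the persistent
# borrower ID (file and column names are illustrative).
df = pd.read_csv("borrower_surveys.csv")  # borrower_id, month, monthly_revenue

# Pivot so month-0 and month-12 revenue sit on the same borrower row.
wide = df.pivot(index="borrower_id", columns="month", values="monthly_revenue")
linked = wide.dropna(subset=[0, 12])  # only borrowers with BOTH moments linked
linked = linked[linked[0] > 0]        # guard against a zero baseline

# The outcome metric: share of borrowers whose revenue grew >= 20%.
growth = (linked[12] - linked[0]) / linked[0]
pct = (growth >= 0.20).mean() * 100
print(f"{pct:.0f}% of {len(linked)} linked borrowers grew monthly revenue "
      f"by 20% or more at twelve months")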
Why traditional tools fail here
Two unlinked surveys
Pre survey is one Google Form. Post survey is another. Names
shorten, emails change, and matching is hand work that breaks at scale.
Quant and qual stored apart
Numbers in one spreadsheet, interview notes in a Word doc. The
"why" never lines up to the "what" without manual cross-walking.
No business operating status
When a borrower stops responding, the team cannot tell if the
business closed, pivoted, or simply went silent. Survival rate
is unknowable.
Output-only reporting
The published report says "deployed $4.2M across 200 loans".
True, useful for accountability, silent on whether the loans
produced any business growth.
The integration here is structural, not procedural. The borrower record,
the loan ledger, the survey responses, and the program officer's notes
are not separate systems with a stitching layer on top. They are the
same record, captured at different moments, with the same persistent
ID running through every entry. That is what lets a funder's outcome
question be answered in a query, not a quarter.
PROGRAM CONTEXTS
Three program shapes. The same metric architecture works in each.
The principles do not change between sectors. What changes is which
metric goes in which slot, who the participants are, and how often a
measurement moment is feasible. Three shapes below cover the most common
patterns.
01
Direct-service nonprofits
Food access, housing, case management. High volume, ongoing relationships.
Typical shape. Walk-in or referral intake. Service
delivered repeatedly to the same household over months or years.
The challenge for measurement is that participants do not graduate;
they cycle in and out, and their needs evolve.
What breaks. Most direct-service organizations report
on outputs (households served, units of food distributed, case-management
hours) and stop there. The outcome question (did the household stabilize?
did housing get secured? did food insecurity improve?) is hard to
answer because the household never gets a "post" survey. They simply
stop coming, for reasons that may be good or bad.
What works. A persistent household ID assigned at
first contact. A short check-in survey at every visit, including
two open-ended prompts. A six-month follow-up survey to households
who have not visited in 90 days, asking what changed. Outcome metrics
built from those touchpoints, not from event attendance.
A SPECIFIC SHAPE
A food security program reports household-level food insecurity
score (USDA 6-item scale) at intake, every 90 days, and at exit.
Six-month follow-up after last visit. Outcome metric: percentage
of households scoring "low or very low food security" at intake
who improved by one category by exit.
02
Education and youth programs
Cohort enrollment on a school calendar. Academic and behavioral outcomes.
Typical shape. Cohorts enroll on a school calendar.
Programming runs for a semester or a full year. Outcomes are
academic (grades, attendance, proficiency), behavioral (engagement,
self-reported confidence), or both. Some outcomes show up later
than the program window.
What breaks. The two common failures are measuring
attendance and calling it engagement, and asking only the kids who
stayed enrolled. Survivor bias makes the program look strong because
the participants who struggled most are not in the post survey.
What works. Track every enrolled participant, including
the ones who left. Compare program participants to a similar group
of non-participants (a waitlist, students at a partner school, a
district-wide benchmark). Pair attendance with two open-ended
prompts at midpoint and exit, then code those responses against
the same engagement rubric every cohort.
A SPECIFIC SHAPE
An after-school reading program tracks reading level (DRA score)
at fall, mid-year, and spring, against a comparison group of
non-enrolled students at the same school. Outcome metric:
percentage of below-grade-level participants who reached grade
level by spring, compared to the same percentage in the comparison
group.
03
Foundation portfolios
Multi-grantee tracking. Aggregating outcomes across programs.
Typical shape. A foundation funding 15 to 50
grantees across one or several program areas. Each grantee runs a
different program with a different theory of change. The foundation
wants a portfolio-level outcome story without forcing every grantee
into the same metric.
What breaks. Two failure modes. First: every grantee
reports against a shared metric set that does not fit any of them
(compliance, not measurement). Second: every grantee picks their
own metrics and the portfolio rolls up to nothing. Both are common,
and both produce reports that no one trusts.
What works. A two-tier metric structure. Tier one is
three to five outcome categories the foundation cares about, named
in plain language (e.g., "income stability", "educational progress",
"housing security"). Tier two is the metric each grantee uses to
show progress in that category, defined by the grantee, validated
by the foundation. The portfolio rolls up by category.
A SPECIFIC SHAPE
A workforce-focused foundation defines two portfolio outcomes
("employment in living-wage roles", "earnings growth"). 22
grantees report against those categories using their own metrics
(job-placement rate, six-month retention, median wage at twelve
months). The annual report aggregates by category, not by metric,
and includes one short narrative per grantee.
A NOTE ON TOOLS
Survey tools collect well. The gap is connecting answers over time.
Google Forms · SurveyMonkey · Typeform · Qualtrics · Sopact Sense
Most survey tools collect responses well. They handle skip logic, mobile
rendering, and basic exports. The architectural gap shows up the second
time the program surveys the same person. Without a persistent participant
ID built into the data model, a pre survey and a post survey from one
participant are two unconnected rows in two unconnected sheets.
Reconnecting them by name or email is hand work that breaks the moment a
name shortens or an email address changes.
Sopact Sense is built around the persistent ID. The same record carries
every metric the participant produces, across every survey, with the
qualitative responses stored next to the quantitative scores. A metric
definition lives next to its data, so editing the metric does not break
historical reports. That is the architectural choice the rest of this
page argues for.
FAQ
Social impact metrics questions, answered.
Q.01
What are social impact metrics?
A social impact metric is a specific, repeatable measure of a change
a program is meant to produce. A working metric names four things:
what is being counted, who is being counted, when it is measured,
and what counts as a meaningful difference. Metrics that skip any of
those become noise. The most common failure is reporting outputs
(what the program delivered) and calling them outcomes (what changed
for the people served).
Q.02
What is the difference between an impact metric and an impact indicator?
An indicator is a signal. A metric is a definition. The indicator is
the data point you collect (a survey score, a job-placement count, a
revenue number). The metric is the rule that says how to collect it,
from whom, and when, so the data point is comparable across cohorts.
In practice the words are used interchangeably, but the discipline
is the same: write the metric definition first, then collect the
indicator.
Q.03
How do you measure social impact?
Measure social impact by asking the same people the same questions
before the program, after the program, and again later. Pair
quantitative items (rating scales, counts, dates) with two or three
open-ended prompts that capture what changed and why. Use a
persistent ID so a participant's pre, post, and follow-up answers
can be linked. Compare the change inside the program to a comparison
group when feasible. Without those four pieces (same people, same
questions, linked records, comparison), what you have is anecdote,
not measurement.
Q.04
What are some social impact metrics examples?
Workforce program: percentage of graduates employed at six months
in jobs paying above a living-wage threshold, with median wage.
Lending program: percentage of borrowers whose monthly business
revenue increased by twenty percent or more, twelve months after
loan disbursement. Education program: percentage of students reaching
grade-level reading proficiency by year end, with a comparison to
non-participating students in the same district. Each of these
names what is counted, who, when, and what counts as a meaningful
change.
Q.05
What is the difference between an output and an outcome?
An output is a count of what the program delivered. Number of
workshops, number of loans, number of participants enrolled. An
outcome is a measurable change in the people the program serves.
Skills gained, income changed, business survived. Outputs answer
how much the program did. Outcomes answer whether the program
worked. Most published impact reports lead with outputs because
outputs are easier to count. Funders increasingly want outcomes,
because outcomes show whether the spending bought any change.
Q.06
What is a social impact KPI?
A social impact KPI is a small set of outcome metrics chosen because
they signal whether the program is on course. Three to seven KPIs
is the working range; more than that and no one watches them. A
good KPI set includes at least one metric per stage of the program
theory: an early signal (engagement or completion), a primary
outcome (the change the program is meant to produce), and a
downstream outcome (whether the change persisted).
Q.07
How do you calculate social impact?
There is no single formula. The structure is consistent across
methods: define the change in advance, measure the same people
before and after, count how many changed and by how much, and
account for what would have changed without the program. Some
methods translate the result into a dollar value (Social Return on
Investment), some report it as a percentage of participants who
reached a threshold, some report a distribution of change. The
calculation depends on what the funder, board, or program team
needs the number to do.
Q.08
What is impact measurement?
Impact measurement is the practice of collecting data that tests
whether a program is producing the change it set out to produce.
It includes designing the metrics, collecting the data from
participants over time, comparing what happened to what would have
happened anyway, and reporting the result honestly. Impact
measurement is not the same as monitoring (tracking activity counts)
or evaluation (an external study at one moment in time). Measurement
runs continuously and feeds program decisions rather than only annual reports.
Q.09
What is a social impact score?
A social impact score is a single composite number rolling several
metrics together. Scores are useful for cross-portfolio comparison
and external communication. They are not useful for program
improvement, because the score hides which input metric moved and
which did not. Most working programs report the score AND the
underlying metric set, so the score gives a headline and the
metric set gives the explanation.
Q.10
How do you measure community impact?
Community impact is measured at two levels: the change in individual
participants who came through the program (counted with the same
metric set described above) and the change in the wider community
(typically counted with public data such as census, school district,
or health department records). Pairing the two levels matters.
Strong individual outcomes with no community-level shift can mean
the program is reaching the wrong people, or reaching too few of
them to register at scale.
Q.11
What does "impact metrics" mean, and what does the term cover?
The phrase asks two related questions: what does the term mean (a
structured measure of program-attributable change) and what kinds
of metrics fall under it. The kinds break into outputs (what was
delivered), outcomes (what changed for participants), and impact
(long-term societal change). Most teams use all three terms loosely.
The discipline is naming which level a given metric measures, so
the report is not mixing categories.
Q.12
What are social impact measurement examples?
A youth mentoring program measuring quarterly attendance plus a
yearly survey of school engagement and academic confidence,
comparing matched students to a waitlist group. A small-business
lending program measuring loan repayment plus six-month and
twelve-month surveys of business revenue and employment, with a
one-page narrative from each borrower at twelve months. A foundation
measuring portfolio outcomes by aggregating each grantee's primary
outcome metric, weighted by population served. Common thread: same
people, same metrics, repeated measurement, linked records.
Q.13
How does Sopact handle metric tracking over time?
Sopact Sense assigns each participant a persistent ID at first
contact. Every survey afterward (pre, mid-cycle, post, follow-up)
links to that ID, so the participant's metrics line up across
moments without spreadsheet matching. Quantitative answers and
open-ended responses are stored together, so a metric and the
explanation behind it are never separated. The platform is built
so the metric definition lives next to the data it produces, which
means metric edits propagate to every record without re-coding.
Q.14
Can I use Google Forms or SurveyMonkey to track impact metrics?
Both tools collect responses well. The architectural gap shows up
the second time you survey the same person. Without a persistent
ID, a pre survey and a post survey from the same participant are
two unconnected rows in two unconnected sheets. Reconnecting them
by name or email is hand work that breaks when a name shortens or
an email address changes. For one-shot feedback the tools are fine.
For tracking change over time, the structure has to do that work
for you.
Bring your metric set. See what your data could show.
Sixty minutes with someone who builds these for a living. We review
the metrics you currently report on, name where outputs are standing
in for outcomes, and sketch the pre, outcome, and follow-up report
shape that would let you measure the change you actually claim.
Sixty minutes by video. One person from your team, one from ours.
Working session, not a sales pitch.
What to bring
Your most recent impact report or grant report. The metric
definitions you currently use. One question your funder keeps
asking that you cannot fully answer.
What you leave with
A short read on which of your current metrics are outputs,
which are outcomes, and which are unit-less numbers that
should be either properly defined or retired.
Impact Metric Wizard
Design metrics that survive board scrutiny
Gate weak ideas fast → lock strong ones with parameters, baselines, and cadence.
This interactive guide walks you through creating both your Impact Statement and complete Data Strategy—with AI-driven recommendations tailored to your program.
Use the Impact Statement Builder to craft measurable statements using the proven formula: [specific outcome] for [stakeholder group] through [intervention] measured by [metrics + feedback]
Design your Data Strategy with the 12-question wizard that maps Contact objects, forms, Intelligent Cell configurations, and workflow automation—exportable as an Excel blueprint
See real examples from workforce training, maternal health, and sustainability programs showing how statements translate into clean data collection
Learn the framework approach that reverses traditional strategy design: start with clean data collection, then let your impact framework evolve dynamically
Understand continuous feedback loops where Girls Code discovered test scores didn't predict confidence—reshaping their strategy in real time
What You'll Get: A complete Impact Statement using Sopact's proven formula, a downloadable Excel Data Strategy Blueprint covering Contact structures, form configurations, Intelligent Suite recommendations (Cell, Row, Column, Grid), and workflow automation—ready to implement independently or fast-track with Sopact Sense.
Key terms, best practices, and concrete examples
Activity Metrics
Definition: Counts of what you did. They prove delivery capacity, not effect.
Use when: You need operational control or inputs for funnels.
Example (workforce training):
Metric: “Number of coaching sessions delivered per learner per month.”
Parameters: Integer ≥0; disaggregate by site and coach; suppress n<10.
Why it’s useful: Predicts throughput and identifies resource constraints.
Pitfall: Treating “hours trained” as success. Without outcomes, this is vanity.
Output Metrics
Definition: Immediate products/participation—who completed, who received.
Use when: You’re testing pipeline health and equity by segment.
Example (scholarship):
Metric: “Share of accepted applicants who submit verification on time.”
Parameters: Percentage 0–100; window = 14 days post-award; by gender/language.
Why it’s useful: Indicates operational friction that blocks outcomes.
Pitfall: Reporting high completion without checking who is missing.
Outcome Metrics
Definition: Changes experienced by people—knowledge, behavior, status.
Use when: You want proof of improvement and drivers of that change.
Example (coding bootcamp):
Metric: “% of learners improving ≥1 level in self-reported coding confidence (PRE→POST).”
Parameters: Likert 1–5; improvement = POST – PRE ≥ 1; exclude missing PRE; report n and suppression rules; pair with coded themes from open-text (“practice time”, “peer help”).
Why it’s useful: Ties numbers to narratives; credible and explainable. A computation sketch follows.
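A minimal Python sketch of that metric as parameterized, with the missing-PRE exclusion and an n<10 suppression rule applied (the file and column names are hypothetical):

import pandas as pd

# One row per learner with linked PRE and POST confidence scores
# (file and column names are illustrative).
df = pd.read_csv("bootcamp_confidence.csv")  # learner_id, pre, post (Likert 1-5)

df = df.dropna(subset=["pre", "post"])       # exclude missing PRE (and unmatched POST)
improved = (df["post"] - df["pre"]) >= 1     # improvement = POST - PRE >= 1

n = len(df)
if n < 10:                                   # suppression rule: low n is not reported
    print(f"Suppressed: n={n} is below the reporting threshold")
else:
    print(f"{improved.mean() * 100:.0f}% of {n} learners improved by 1+ levels")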
What is a good metric?
Mission-anchored: Direct line to your outcome pathway (not just a convenient count).
Operationalized: Clear where data comes from, how to compute it, and who owns it.
Parameterized: Ranges, units, suppression, and disaggregation defined.
What a weak metric looks like:
“Improve confidence.” → Vague; no scale, threshold, or baseline.
“Job placement rate” with no denominator definition → Ambiguous; who’s eligible? timeframe?
“100% satisfaction” from 9 respondents → Statistically weak; low-n and bias not handled.
“Sentiment score from social media” → Unreliable unless your beneficiaries are actually represented there and consented.
Use-case walk-throughs (plug these into the wizard)
Scholarship program (Outcome)
Draft definition: “% of recipients who report reduced financial stress after first term.”
Parameters: 5-point stress scale; change ≥1 point; measured PRE (award) and POST (end of term); suppress n<10; disaggregate by campus and first-gen status.
Usage guideline: Join unique_id across application and term survey; compute POST–PRE; code open-text for ‘work hours’ and ‘food insecurity’; attach 2–3 quotes.
Cadence: Termly; audience = Board + donors.
Baseline: Fall 2025 pilot.
Workforce upskilling (Output → Outcome ladder)
Output: “% of enrolled who complete 4+ practice labs weekly.” (predictor)
Outcome: “% who pass external certification within 60 days of course end.”
Best practice: Report both, plus a simple correlation view (completion vs. pass rate) and 2–3 qualitative drivers from post-exam interviews.
CSR supplier training (Activity → Output)
Activity: “# of supplier sites trained on safety module.”
Output: “% of trained sites implementing 3 of 5 required safety practices within 90 days.”
Outcome (longer horizon): “Rate of recordable incidents per 200k hours, year-over-year.” (Sketched below.)
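The longer-horizon rate uses the standard OSHA normalization: recordable incidents per 200,000 hours worked, roughly 100 full-time workers for a year. A minimal sketch with made-up numbers:

# Recordable-incident rate per 200,000 hours worked (OSHA's standard
# normalization). The figures below are invented for illustration.
def incident_rate(recordable_incidents: int, hours_worked: float) -> float:
    return recordable_incidents * 200_000 / hours_worked

before = incident_rate(recordable_incidents=14, hours_worked=1_400_000)
after = incident_rate(recordable_incidents=9, hours_worked=1_450_000)
print(f"Rate before training: {before:.2f}, after: {after:.2f}")

Year-over-year movement in this rate is the outcome claim; the training is only one candidate cause, so pair it with a comparison, as argued above.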
Devil’s-advocate checks before you ship
If the owner can’t compute it alone from the instructions, it will rot.
If your baseline is soft (or missing), your “lift” number is a guess.
If you can’t name the decision this will change next quarter, it’s theater.
If a metric harms (e.g., incentivizes short-term gaming or penalizes vulnerable groups), redesign it with safeguards and qualitative context.
IMPACT & ESG METRICS STANDARDS CATALOG
Comprehensive directory of metrics terminology, standards and frameworks