
Social impact metrics: outputs, outcomes, indicators

A plain-language guide to social impact metrics: the difference between outputs and outcomes, six properties of a working metric, and a worked example.

Updated May 4, 2026
SOCIAL IMPACT METRICS
An output counts what was delivered. An outcome measures what changed. Most impact reports show the wrong one.

This guide explains the difference in plain terms, names the six properties every working metric has, and shows what a complete metric set looks like for a program. The worked example comes from a community lending program. No prior background is needed.

THE PATHWAY

Five tiers from resources to societal change.

Every social program runs along the same chain. Resources go in. Activities happen. Things get delivered. People change. The world shifts a little. The line between what got delivered and whether anyone changed is where most reports break down.

Causal pathway, left to right. Each tier is paired with the question it answers.

01 · Inputs
How much was invested? Money, staff, materials, time invested in the program.

02 · Activities
What did the program do? Workshops run, loans processed, services provided.

03 · Outputs
How much got delivered? Counts of what got delivered: 200 loans. 50 graduates. 4,000 meals.

04 · Outcomes
Did participants change? Measurable change in the people served: income gained, skill acquired, business survived.

05 · Impact
Did the world change? Long-term, broader change in the community or system the program touches.

The boundary between tier 03 and tier 04 is the boundary between activity reporting and impact measurement. Output metrics are easy to count and prove accountability for spending. Outcome metrics are harder to count and answer whether the spending bought any change.

Tier names are conventional across IRIS+, the Logic Model, and most funder reporting frameworks. The labels matter less than the discipline of naming which tier each metric measures, so a published report does not mix categories.
DEFINITIONS

Five terms, defined the way the data has to support them.

The vocabulary around social impact metrics overlaps. Indicators, KPIs, scores, measures: most teams use them interchangeably. The difference shows up later, when a report has to be written and the words have not been defined the same way across the team. These five definitions are the ones the rest of this page uses.

What are social impact metrics?

A social impact metric is a specific, repeatable measure of a change a program is meant to produce. A working metric names four things: what is being counted, who is being counted, when it is measured, and what counts as a meaningful difference.

Metrics that skip any of those four become noise. The most common failure is reporting outputs (what the program delivered) and calling them outcomes (what changed for the people served). The two are not interchangeable, and a funder who knows the difference will spot the mix in the first paragraph.
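Those four parts translate directly into a data structure. A minimal sketch in Python; the field names are illustrative, not drawn from any reporting standard:

    from dataclasses import dataclass

    @dataclass
    class MetricDefinition:
        what: str                 # what is being counted, e.g. "monthly business revenue"
        who: str                  # who is being counted, e.g. "borrowers with a disbursed loan"
        when: list                # when it is measured, e.g. ["month 0", "month 6", "month 12"]
        meaningful_change: str    # what counts as a meaningful difference
        level: str = "outcome"    # output or outcome, named

    revenue_growth = MetricDefinition(
        what="monthly business revenue",
        who="borrowers with a disbursed loan",
        when=["month 0", "month 6", "month 12"],
        meaningful_change=">= 20% increase over month 0",
    )

If any field is hard to fill in, the metric is not defined yet.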

What is the difference between an impact metric and an impact indicator?

An indicator is a signal. A metric is a definition. The indicator is the data point you collect (a survey score, a job-placement count, a revenue number). The metric is the rule that says how to collect it, from whom, and when, so the data point is comparable across cohorts.

In daily practice the words are used interchangeably. The discipline is the same either way: write the metric definition first, then collect the indicator. A program that defines indicators in the report-writing phase rather than the design phase ends up with data that does not fit the question.

How do you measure social impact?

Measuring social impact means asking the same people the same questions before the program, after the program, and again later, then comparing the answers. Pair quantitative items (rating scales, counts, dates) with two or three open-ended prompts that capture what changed and why.

Use a persistent ID so a participant's pre, post, and follow-up answers can be linked. Compare the change inside the program to a comparison group when feasible. Without those four pieces (same people, same questions, linked records, comparison) what you have is anecdote, not measurement.
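The linkage step is where most spreadsheets break. A minimal sketch, assuming pandas and a persistent participant ID present in both survey exports; the column names are hypothetical:

    import pandas as pd

    # Two survey waves, same question, same 1-10 scale, same persistent ID.
    pre = pd.DataFrame({"participant_id": [1, 2, 3], "confidence": [5, 6, 4]})
    post = pd.DataFrame({"participant_id": [1, 2, 3], "confidence": [7, 6, 8]})

    # The ID makes this a join, not name-matching by hand.
    linked = pre.merge(post, on="participant_id", suffixes=("_pre", "_post"))
    linked["change"] = linked["confidence_post"] - linked["confidence_pre"]
    print(linked[["participant_id", "change"]])

The same merge extends to the follow-up wave. Without the ID, every row is an orphan.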

What is a social impact KPI?

A social impact KPI is a small set of outcome metrics chosen because they signal whether the program is on course. Three to seven KPIs is the working range. More than that and no one watches them.

A good KPI set includes at least one metric per stage of the program theory: an early signal (engagement or completion), a primary outcome (the change the program is meant to produce), and a downstream outcome (whether the change persisted at six or twelve months). The KPI list is the report cover. The full metric set is the body of the report.

What is a social impact score?

A social impact score is a single composite number rolling several metrics together. Scores are useful for cross-portfolio comparison and for external communication where one number is easier to carry than seven. They are not useful for program improvement, because the score hides which input metric moved and which did not.

Most working programs report the score AND the underlying metric set. The score gives a headline. The metric set gives the explanation. A score without its metric set is a marketing number.
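The hiding effect is easy to demonstrate. A sketch with invented weights and values:

    # A hypothetical three-metric set, each normalized to 0-100, rolled into one score.
    weights = {"employment": 0.5, "earnings": 0.3, "retention": 0.2}
    year_1 = {"employment": 70, "earnings": 50, "retention": 60}
    year_2 = {"employment": 80, "earnings": 35, "retention": 60}

    def score(values):
        return sum(weights[k] * values[k] for k in weights)

    # 62.0 -> 62.5: the score reads as flat while employment rose and earnings fell.
    print(score(year_1), score(year_2))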

RELATED, BUT NOT THE SAME

Four neighbors of "metric" that get confused for it.

Output

A count of what the program delivered. Workshops held, loans disbursed, meals served. Measures effort, not effect.

Outcome

A measurable change in the people the program is meant to serve. Income gained, skill acquired, business survived.

Indicator

The data point itself. A score, a percentage, a yes-or-no answer. Indicators sit inside metrics.

Impact

Long-term, broader change at the community or system level. Outcomes are about participants. Impact is about the world.

PROPERTIES OF A WORKING METRIC

Six properties every metric needs to do its job.

Most metrics fail one or two of these and still get reported. The result reads as evidence and behaves as noise. The six properties below are the filter every metric on the dashboard should pass before it goes in front of a board, a funder, or a program team trying to decide what to change.

01 · LEVEL

Output or outcome, named.

A metric is one or the other. Never both.

Workshops held is an output. Skills retained six months later is an outcome. Mixing them in the same column produces a report that looks comprehensive and answers no question.


Why it matters: Funders read for outcomes. Output-heavy reports get filed; outcome-led reports get funded.

02 · UNIT

Numerator and denominator.

Every count needs a context.

"Served 500 participants" answers nothing. "Served 500 of 800 eligible households in the catchment area, 62 percent" answers reach. The denominator is what makes the numerator legible across years and cohorts.


Why it matters: A raw count grows with budget. A ratio shows whether reach changed.

03 · CHANGE

Direction and magnitude.

Did it improve, and by how much?

"Improved confidence" tells you nothing. "Confidence rose from 5.4 to 7.2 on a 10-point scale, on average, six months in" tells you the direction (up), the size (1.8 points), and that someone has a baseline.


Why it matters: Magnitude separates a real shift from survey-week mood.

04 · LINKAGE

Same people over time.

Cross-sectional comparisons hide change.

Comparing the average of cohort A's pre survey to cohort B's post survey tells you about two different groups, not about one group changing. A persistent participant ID is the only structure that links a person's pre to their own post.


Why it matters: Without linkage, the report is two snapshots, not measurement.

05 · EXPLANATION

Numbers paired with words.

Quantitative anchors. Qualitative reasons.

A metric that moved is more useful when paired with two or three short open-ended responses from the same participant explaining what happened. The number tells you what changed. The words tell you why, and what the program did that worked.


Why it matters: Open-ended responses surface which program element drove the change.

06 · CAUSE

Comparison or counterfactual.

Did the program cause it, or was it happening anyway?

Pre-post change inside a program tells you something happened. A comparison group, a regional benchmark, or a waitlisted cohort tells you whether the program caused it. The honest version of impact measurement always carries some attempt at a counterfactual.


Why it matters: Without a comparison, "the economy improved" is a competing explanation.
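The arithmetic of the counterfactual is one subtraction. A sketch with invented numbers, in the shape of a simple difference-in-differences:

    # Average confidence, 10-point scale. All values invented for illustration.
    program_pre, program_post = 5.4, 7.2
    comparison_pre, comparison_post = 5.5, 6.1

    program_change = program_post - program_pre            # 1.8
    comparison_change = comparison_post - comparison_pre   # 0.6

    # What is left after subtracting the change that was happening anyway.
    attributable = program_change - comparison_change
    print(round(attributable, 1))                          # 1.2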

METRIC CHOICE MATRIX

Seven decisions that decide whether the metric works.

Most teams design a metric set in a single afternoon and live with the consequences for years. Each row of this social impact matrix is one decision the team is making, knowingly or not. The broken-way column is the workflow most teams fall into. The working-way column is what the page argues for.

Each row below names the choice, the broken way, the working way, and what the choice decides.

Choosing what to count
Output or outcome
BROKEN

Counting what is easy to count: workshops held, participants enrolled, services delivered. Calling those numbers the impact.

WORKING

Counting what changed for the people served. The output stays in the report as context, not as the headline.

Whether the report measures effort or effect. Funders read the headline first.

Tracking participants
Anonymous or linked
BROKEN

One-shot survey at the end. Or two surveys collected anonymously, with no way to link a person's pre to their own post.

WORKING

Persistent participant ID assigned at first contact. Pre, post, and follow-up all link to the same record automatically.

Whether the team can attribute change to the program at the individual level, not the cohort average.

Setting the metric scale
Vague or bounded
BROKEN

Aspirational language: "improved wellbeing", "stronger community", "increased confidence". No scale. No threshold. Nothing to compare.

WORKING

Bounded scale named in the metric definition: a 1-to-10 score, a percent change, a yes-or-no threshold. Measured at named moments.

Whether the metric is repeatable across cohorts and years, or only describes one report.

Numerator and denominator
Raw or contextualized
BROKEN

Reporting the count alone: "served 500 people". Reach is unknowable. Year-over-year comparison reads as growth when budget grew.

WORKING

Reporting numerator over denominator: "served 500 of 800 eligible, 62 percent". Reach is a ratio. Cohorts compare cleanly.

Whether the metric scales with program size or simply inflates with budget.

Combining numbers and words
Separated or linked
BROKEN

A quantitative survey on one platform. Open-ended interview notes in a separate document. The two never line up to the same person.

WORKING

Quantitative scale and open-ended prompt collected in the same instrument, against the same participant ID. Number plus reason, stored together.

Whether the report can explain why a metric moved, not only whether it moved.

Choosing a comparison
Inside-only or external
BROKEN

Pre-to-post change reported as the impact, with no benchmark. "The economy got better" is a perfectly good rival explanation.

WORKING

Pre-to-post change paired with a comparison: a waitlist cohort, a regional benchmark, a public dataset. The honest version names what the comparison is and what it cannot rule out.

Whether the program can credibly claim cause, not mere correlation.

Reporting cadence
Annual or rolling
BROKEN

One annual impact report. Data assembled from disconnected sources in the six weeks before the deadline. The team cannot correct course mid-year.

WORKING

Quarterly cohort reviews against the same metric set. Annual rollup is a summary, not a build. Drift gets caught while it can still be addressed.

How fast the program can correct course when a metric drifts the wrong way.

The first decision controls all the others. A team that chooses outputs over outcomes does not need persistent IDs, does not need bounded scales, does not need comparison groups. The decision to measure outcomes is the decision to invest in the rest of the matrix.

A WORKED EXAMPLE

Small-business lending: from output count to outcome metric.

A community development financial institution (CDFI) lends to micro and small businesses in low-income neighborhoods. The team has reported "loans disbursed" and "dollars deployed" for years. A new funder is asking what the loans actually produced.

We have always been able to say how many loans we made and how much we moved out the door. The new funder wants to know whether the businesses we lent to are still operating, whether monthly revenue grew, and how many people they employ now. We have the data, sort of. Some is in our loan-management system, some is in survey responses we ran twice and never linked, and some is in the program officer's head. None of it rolls up.

Lending program director, mid-portfolio review
THE METRIC SET, IN TWO AXES
Quantitative
Numbers, scales, counts
  • Loan amount and term
  • Repayment status, months since disbursement
  • Monthly business revenue at month 0, 6, 12
  • Employees on payroll at month 0, 6, 12
  • Business operating status at 12 months (yes / no / pivoted)
Qualitative
Open-ended responses
  • What did this loan let you do that you could not have done otherwise?
  • What changed about how you run the business in the last year?
  • What was the hardest month and what got you through it?
  • What would you tell the program team to do differently?
What the working setup produces
A linked borrower file

Every survey response, loan record, and program note tied to one persistent borrower ID. No spreadsheet matching at report time.

Pre-and-post comparable revenue

Monthly revenue collected at month 0, 6, and 12 with the same question wording. The change is a real change, not a reframed question.

Reasons attached to numbers

When a borrower's revenue jumps or drops, the open-ended response from the same survey explains why, in the borrower's own words.

A reportable outcome metric

"Sixty-five percent of borrowers grew monthly revenue by twenty percent or more, twelve months after disbursement." Funder-grade.

Why traditional tools fail here
Two unlinked surveys

Pre survey is one Google Form. Post survey is another. Names shorten, emails change, and matching is hand work that breaks at scale.

Quant and qual stored apart

Numbers in one spreadsheet, interview notes in a Word doc. The "why" never lines up to the "what" without manual cross-walking.

No business operating status

When a borrower stops responding, the team cannot tell if the business closed, pivoted, or simply went silent. Survival rate is unknowable.

Output-only reporting

The published report says "deployed $4.2M across 200 loans". True, useful for accountability, silent on whether the loans produced any business growth.

The integration here is structural, not procedural. The borrower record, the loan ledger, the survey responses, and the program officer's notes are not separate systems with a stitching layer on top. They are the same record, captured at different moments, with the same persistent ID running through every entry. That is what lets a funder's outcome question be answered in a query, not a quarter.

PROGRAM CONTEXTS

Three program shapes. The same metric architecture works in each.

The principles do not change between sectors. What changes is which metric goes in which slot, who the participants are, and how often a measurement moment is feasible. Three shapes below cover the most common patterns.

01

Direct-service nonprofits

Food access, housing, case management. High volume, ongoing relationships.

Typical shape. Walk-in or referral intake. Service delivered repeatedly to the same household over months or years. The challenge for measurement is that participants do not graduate; they cycle in and out, and their needs evolve.

What breaks. Most direct-service organizations report on outputs (households served, units of food distributed, case-management hours) and stop there. The outcome question (did the household stabilize? did housing get secured? did food insecurity improve?) is hard to answer because the household never gets a "post" survey. They simply stop coming, for reasons that may be good or bad.

What works. A persistent household ID assigned at first contact. A short check-in survey at every visit, including two open-ended prompts. A six-month follow-up survey to households who have not visited in 90 days, asking what changed. Outcome metrics built from those touchpoints, not from event attendance.

A SPECIFIC SHAPE

A food security program reports household-level food insecurity score (USDA 6-item scale) at intake, every 90 days, and at exit. Six-month follow-up after last visit. Outcome metric: percentage of households scoring "low or very low food security" at intake who improved by one category by exit.

02

Education and youth programs

After-school, mentoring, training. Cohort-shaped, multi-year.

Typical shape. Cohorts enroll on a school calendar. Programming runs for a semester or a full year. Outcomes are academic (grades, attendance, proficiency), behavioral (engagement, self-reported confidence), or both. Some outcomes show up later than the program window.

What breaks. The two common failures are measuring attendance and calling it engagement, and asking only the kids who stayed enrolled. Survivor bias makes the program look strong because the participants who struggled most are not in the post survey.

What works. Track every enrolled participant, including the ones who left. Compare program participants to a similar group of non-participants (a waitlist, students at a partner school, a district-wide benchmark). Pair attendance with two open-ended prompts at midpoint and exit, then code those responses against the same engagement rubric every cohort.

A SPECIFIC SHAPE

An after-school reading program tracks reading level (DRA score) at fall, mid-year, and spring, against a comparison group of non-enrolled students at the same school. Outcome metric: percentage of below-grade-level participants who reached grade level by spring, compared to the same percentage in the comparison group.

03

Foundation portfolios

Multi-grantee tracking. Aggregating outcomes across programs.

Typical shape. A foundation funding 15 to 50 grantees across one or several program areas. Each grantee runs a different program with a different theory of change. The foundation wants a portfolio-level outcome story without forcing every grantee into the same metric.

What breaks. Two failure modes. First: every grantee reports against a shared metric set that does not fit any of them (compliance, not measurement). Second: every grantee picks their own metrics and the portfolio rolls up to nothing. Both are common, and both produce reports that no one trusts.

What works. A two-tier metric structure. Tier one is three to five outcome categories the foundation cares about, named in plain language (e.g., "income stability", "educational progress", "housing security"). Tier two is the metric each grantee uses to show progress in that category, defined by the grantee, validated by the foundation. The portfolio rolls up by category.

A SPECIFIC SHAPE

A workforce-focused foundation defines two portfolio outcomes ("employment in living-wage roles", "earnings growth"). 22 grantees report against those categories using their own metrics (job-placement rate, six-month retention, median wage at twelve months). The annual report aggregates by category, not by metric, and includes one short narrative per grantee.

A NOTE ON TOOLS

Survey tools collect well. The gap is connecting answers over time.

Google Forms · SurveyMonkey · Typeform · Qualtrics · Sopact Sense

Most survey tools collect responses well. They handle skip logic, mobile rendering, and basic exports. The architectural gap shows up the second time the program surveys the same person. Without a persistent participant ID built into the data model, a pre survey and a post survey from one participant are two unconnected rows in two unconnected sheets. Reconnecting them by name or email is hand work that breaks the moment a name shortens or an email address changes.

Sopact Sense is built around the persistent ID. The same record carries every metric the participant produces, across every survey, with the qualitative responses stored next to the quantitative scores. A metric definition lives next to its data, so editing the metric does not break historical reports. That is the architectural choice the rest of this page argues for.

FAQ

Social impact metrics questions, answered.

Q.01

What are social impact metrics?

A social impact metric is a specific, repeatable measure of a change a program is meant to produce. A working metric names four things: what is being counted, who is being counted, when it is measured, and what counts as a meaningful difference. Metrics that skip any of those become noise. The most common failure is reporting outputs (what the program delivered) and calling them outcomes (what changed for the people served).

Q.02

What is the difference between an impact metric and an impact indicator?

An indicator is a signal. A metric is a definition. The indicator is the data point you collect (a survey score, a job-placement count, a revenue number). The metric is the rule that says how to collect it, from whom, and when, so the data point is comparable across cohorts. In practice the words are used interchangeably, but the discipline is the same: write the metric definition first, then collect the indicator.

Q.03

How do you measure social impact?

Measure social impact by asking the same people the same questions before the program, after the program, and again later. Pair quantitative items (rating scales, counts, dates) with two or three open-ended prompts that capture what changed and why. Use a persistent ID so a participant's pre, post, and follow-up answers can be linked. Compare the change inside the program to a comparison group when feasible. Without those four pieces (same people, same questions, linked records, comparison), what you have is anecdote, not measurement.

Q.04

What are some social impact metrics examples?

Workforce program: percentage of graduates employed at six months in jobs paying above a living-wage threshold, with median wage. Lending program: percentage of borrowers whose monthly business revenue increased by twenty percent or more, twelve months after loan disbursement. Education program: percentage of students reaching grade-level reading proficiency by year end, with a comparison to non-participating students in the same district. Each of these names what is counted, who, when, and what counts as a meaningful change.

Q.05

What is the difference between an output and an outcome?

An output is a count of what the program delivered. Number of workshops, number of loans, number of participants enrolled. An outcome is a measurable change in the people the program serves. Skills gained, income changed, business survived. Outputs answer how much the program did. Outcomes answer whether the program worked. Most published impact reports lead with outputs because outputs are easier to count. Funders increasingly want outcomes, because outcomes show whether the spending bought any change.

Q.06

What is a social impact KPI?

A social impact KPI is a small set of outcome metrics chosen because they signal whether the program is on course. Three to seven KPIs is the working range; more than that and no one watches them. A good KPI set includes at least one metric per stage of the program theory: an early signal (engagement or completion), a primary outcome (the change the program is meant to produce), and a downstream outcome (whether the change persisted).

Q.07

How do you calculate social impact?

There is no single formula. The structure is consistent across methods: define the change in advance, measure the same people before and after, count how many changed and by how much, and account for what would have changed without the program. Some methods translate the result into a dollar value (Social Return on Investment), some report it as a percentage of participants who reached a threshold, some report a distribution of change. The calculation depends on what the funder, board, or program team needs the number to do.

Q.08

What is impact measurement?

Impact measurement is the practice of collecting data that tests whether a program is producing the change it set out to produce. It includes designing the metrics, collecting the data from participants over time, comparing what happened to what would have happened anyway, and reporting the result honestly. Impact measurement is not the same as monitoring (tracking activity counts) or evaluation (an external study at one moment in time). Measurement runs continuously and feeds program decisions rather than only annual reports.

Q.09

What is a social impact score?

A social impact score is a single composite number rolling several metrics together. Scores are useful for cross-portfolio comparison and external communication. They are not useful for program improvement, because the score hides which input metric moved and which did not. Most working programs report the score AND the underlying metric set, so the score gives a headline and the metric set gives the explanation.

Q.10

How do you measure community impact?

Community impact is measured at two levels: the change in individual participants who came through the program (counted with the same metric set described above) and the change in the wider community (typically counted with public data such as census, school district, or health department records). Pairing the two levels matters. Strong individual outcomes with no community-level shift can mean the program is reaching the wrong people, or reaching too few of them to register at scale.

Q.11

What does "impact metrics" mean?

The term covers two related questions: what it means (a structured measure of program-attributable change) and what kinds of metrics fall under it. The kinds break into outputs (what was delivered), outcomes (what changed for participants), and impact (long-term societal change). Most teams use all three terms loosely. The discipline is naming which level a given metric measures, so the report is not mixing categories.

Q.12

What are social impact measurement examples?

A youth mentoring program measuring quarterly attendance plus a yearly survey of school engagement and academic confidence, comparing matched students to a waitlist group. A small-business lending program measuring loan repayment plus six-month and twelve-month surveys of business revenue and employment, with a one-page narrative from each borrower at twelve months. A foundation measuring portfolio outcomes by aggregating each grantee's primary outcome metric, weighted by population served. Common thread: same people, same metrics, repeated measurement, linked records.

Q.13

How does Sopact handle metric tracking over time?

Sopact Sense assigns each participant a persistent ID at first contact. Every survey afterward (pre, mid-cycle, post, follow-up) links to that ID, so the participant's metrics line up across moments without spreadsheet matching. Quantitative answers and open-ended responses are stored together, so a metric and the explanation behind it are never separated. The platform is built so the metric definition lives next to the data it produces, which means metric edits propagate to every record without re-coding.

Q.14

Can I use Google Forms or SurveyMonkey to track impact metrics?

Both tools collect responses well. The architectural gap shows up the second time you survey the same person. Without a persistent ID, a pre survey and a post survey from the same participant are two unconnected rows in two unconnected sheets. Reconnecting them by name or email is hand work that breaks when a name shortens or an email address changes. For one-shot feedback the tools are fine. For tracking change over time, the structure has to do that work for you.

WORKING SESSION

Bring your metric set. See what your data could show.

Sixty minutes with someone who builds these for a living. We review the metrics you currently report on, name where outputs are standing in for outcomes, and sketch the pre, post, and follow-up report shape that would let you measure the change you actually claim.

Format

Sixty minutes by video. One person from your team, one from ours. Working session, not a sales pitch.

What to bring

Your most recent impact report or grant report. The metric definitions you currently use. One question your funder keeps asking that you cannot fully answer.

What you leave with

A short read on which of your current metrics are outputs, which are outcomes, and which are unit-less numbers that should be either properly defined or retired.

Impact Metric Wizard

Design metrics that survive board scrutiny

Gate weak ideas fast → lock strong ones with parameters, baselines, and cadence.

Download the Framework
Step 1 of 7 · Gate: Measure What Matters
  • Edit the example to your own metric sentence.
  • Does this metric advance your mission, not just what’s convenient to count?
  • Is this about results for people (not activities)?
  • Check feasibility: logistics, respondent burden, consent, cost.
  • If data exists, link where it lives; avoid duplicating effort.
  • When to stop: if this fails mission or feasibility, convert to a lightweight activity metric or a proxy, and revisit later.

Step 2 of 7 · Define: Ownership & Standards
  • Reference the original standard to keep consistency and credibility.
  • One owner. No committees.

Step 3 of 7 · Structure: Data Type & Parameters
  • Be explicit: range, unit, rounding, suppression, and disaggregation keys.
  • Think “recipe”: anyone on your team should reproduce the same number.

Step 4 of 7 · Cadence: Match Decisions, Not Hype
  • Match cadence to decision cycles. Faster is not always better.
  • Only include segments that matter to a decision; suppress low-n.

Step 5 of 7 · Baseline & Targets: Thresholds That Trigger Action
  • Linking evidence builds trust: PDFs, transcripts, or coded notes.

Step 6 of 7 · Quality Check: C-FAIR
  • If any box is unchecked, don’t publish; fix the gap first.

Step 7 of 7 · Report: Print or Copy

Impact Metric Summary (example, partially completed)
  • Label: Confidence Lift %
  • Definition: Share of scholarship recipients…
  • Programs: Girls Code; Workforce Upskilling
  • Type: Percentage (0–100)
  • Cadence: Monthly (Executive/Board)
  • Reason: Donor/Funder requirement
  • Mission Fit: Yes · Feasible: Yes
  • Also captured: Standard, Owner, Parameters, Usage, Sample, Disaggregation, Baseline, Thresholds, Evidence, C-FAIR

Build Your AI-Powered Impact Strategy in Minutes, Not Months

Create Your Impact Statement & Data Strategy

This interactive guide walks you through creating both your Impact Statement and complete Data Strategy—with AI-driven recommendations tailored to your program.

  • Use the Impact Statement Builder to craft measurable statements using the proven formula: [specific outcome] for [stakeholder group] through [intervention] measured by [metrics + feedback]
  • Design your Data Strategy with the 12-question wizard that maps Contact objects, forms, Intelligent Cell configurations, and workflow automation—exportable as an Excel blueprint
  • See real examples from workforce training, maternal health, and sustainability programs showing how statements translate into clean data collection
  • Learn the framework approach that reverses traditional strategy design: start with clean data collection, then let your impact framework evolve dynamically
  • Understand continuous feedback loops where Girls Code discovered test scores didn't predict confidence—reshaping their strategy in real time

What You'll Get: A complete Impact Statement using Sopact's proven formula, a downloadable Excel Data Strategy Blueprint covering Contact structures, form configurations, Intelligent Suite recommendations (Cell, Row, Column, Grid), and workflow automation—ready to implement independently or fast-track with Sopact Sense.

Key terms, best practices, and concrete examples

Activity Metrics

Definition: Counts of what you did. They prove delivery capacity, not effect.
Use when: You need operational control or inputs for funnels.
Example (workforce training):

  • Metric: “Number of coaching sessions delivered per learner per month.”
  • Parameters: Integer ≥0; disaggregate by site and coach; suppress n<10.
  • Why it’s useful: Predicts throughput and identifies resource constraints.
  • Pitfall: Treating “hours trained” as success. Without outcomes, this is vanity.

Output Metrics

Definition: Immediate products/participation—who completed, who received.
Use when: You’re testing pipeline health and equity by segment.
Example (scholarship):

  • Metric: “Share of accepted applicants who submit verification on time.”
  • Parameters: Percentage 0–100; window = 14 days post-award; by gender/language.
  • Why it’s useful: Indicates operational friction that blocks outcomes.
  • Pitfall: Reporting high completion without checking who is missing.

Outcome Metrics

Definition: Changes experienced by people—knowledge, behavior, status.
Use when: You want proof of improvement and drivers of that change.
Example (coding bootcamp):

  • Metric: “% of learners improving ≥1 level in self-reported coding confidence (PRE→POST).”
  • Parameters: Likert 1–5; improvement = POST – PRE ≥ 1; exclude missing PRE; report n and suppression rules; pair with coded themes from open-text (“practice time”, “peer help”). Sketched in code after this list.
  • Why it’s useful: Ties numbers to narratives; credible and explainable.
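A minimal sketch of that computation, assuming pandas and records already linked by learner ID; the values and the n<10 suppression threshold are illustrative:

    import pandas as pd

    # PRE/POST confidence on a 1-5 Likert scale. NaN = missing PRE,
    # excluded per the metric definition above.
    df = pd.DataFrame({
        "learner_id": [1, 2, 3, 4, 5],
        "pre":  [2, 3, None, 4, 2],
        "post": [4, 3, 5, 5, 3],
    })

    valid = df.dropna(subset=["pre"])               # exclude missing PRE
    improved = (valid["post"] - valid["pre"]) >= 1  # improvement = POST - PRE >= 1

    n = len(valid)
    if n < 10:                                      # suppression rule: report n, suppress low-n
        print(f"n={n}: below threshold, suppress the rate")
    else:
        print(f"{100 * improved.mean():.0f}% improved >= 1 level (n={n})")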

What is a good metric?

  • Mission-anchored: Direct line to your outcome pathway (not just a convenient count).
  • Operationalized: Clear where data comes from, how to compute it, and who owns it.
  • Parameterized: Ranges, units, suppression, and disaggregation defined.
  • Comparable: Baseline locked; cadence matches decision cycles.
  • Evidence-linked: Quotes/files or rubric scores that explain the “why.”
  • Ethical: Consent, privacy, and potential harm assessed.

What is not a good metric (and why)

  • “Train 500 hours this quarter.” → Activity only; hours ≠ benefit.
  • “Improve confidence.” → Vague; no scale, threshold, or baseline.
  • “Job placement rate” with no denominator definition → Ambiguous; who’s eligible? timeframe?
  • “100% satisfaction” from 9 respondents → Statistically weak; low-n and bias not handled.
  • “Sentiment score from social media” → Unreliable unless your beneficiaries are actually represented there and consented.

Use-case walk-throughs (plug these into the wizard)

Scholarship program (Outcome)

  • Draft definition: “% of recipients who report reduced financial stress after first term.”
  • Parameters: 5-point stress scale; change ≥1 point; measured PRE (award) and POST (end of term); suppress n<10; disaggregate by campus and first-gen status.
  • Usage guideline: Join unique_id across application and term survey; compute POST–PRE; code open-text for ‘work hours’ and ‘food insecurity’; attach 2–3 quotes. Sketched in code after this list.
  • Cadence: Termly; audience = Board + donors.
  • Baseline: Fall 2025 pilot.
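A minimal sketch of the usage guideline above, with invented records; pandas assumed, improvement read as stress falling by at least one point, and the n<10 rule applied per segment:

    import pandas as pd

    # Term survey already joined to applications on unique_id upstream.
    # Stress is a 5-point scale; lower is better.
    df = pd.DataFrame({
        "unique_id":   range(1, 9),
        "campus":      ["North"] * 5 + ["South"] * 3,
        "stress_pre":  [4, 5, 3, 4, 5, 4, 3, 5],
        "stress_post": [3, 3, 3, 2, 4, 4, 2, 3],
    })
    df["delta"] = df["stress_post"] - df["stress_pre"]  # POST - PRE
    df["improved"] = df["delta"] <= -1                  # stress fell by >= 1 point

    # Disaggregate by campus; suppress any segment under the threshold.
    for campus, grp in df.groupby("campus"):
        if len(grp) < 10:
            print(f"{campus}: n={len(grp)}, suppressed")
        else:
            print(f"{campus}: {100 * grp['improved'].mean():.0f}% reduced stress")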

Workforce upskilling (Output → Outcome ladder)

  • Output: “% of enrolled who complete 4+ practice labs weekly.” (predictor)
  • Outcome: “% who pass external certification within 60 days of course end.”
  • Best practice: Report both, plus a simple correlation view (completion vs. pass rate) and 2–3 qualitative drivers from post-exam interviews.
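A minimal sketch of that correlation view, with invented learner records; pandas assumed:

    import pandas as pd

    # Weekly lab completion (the output predictor) against certification
    # result (the outcome), linked by learner ID upstream.
    df = pd.DataFrame({
        "labs_per_week": [1, 2, 4, 5, 3, 6, 0, 4],
        "passed_cert":   [0, 0, 1, 1, 0, 1, 0, 1],
    })

    # A positive value says lab completion tracks the pass rate; it does
    # not prove the labs caused the passes.
    print(round(df["labs_per_week"].corr(df["passed_cert"]), 2))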

CSR supplier training (Activity → Output)

  • Activity: “# of supplier sites trained on safety module.”
  • Output: “% of trained sites implementing 3 of 5 required safety practices within 90 days.”
  • Outcome (longer horizon): “Rate of recordable incidents per 200k hours, year-over-year.”

Devil’s-advocate checks before you ship

  • If the owner can’t compute it alone from the instructions, it will rot.
  • If your baseline is soft (or missing), your “lift” number is a guess.
  • If you can’t name the decision this will change next quarter, it’s theater.
  • If a metric harms (e.g., incentivizes short-term gaming or penalizes vulnerable groups), redesign it with safeguards and qualitative context.
