A regional nonprofit runs five workforce programs across three states. In October, the CFO asks each program lead to report "completion rate" for the quarterly board deck. Five programs return five different numbers — because five teams each decided, at different points over three years, what "completion" meant. One counts graduates. One counts anyone who attended eighty percent of sessions. One counts everyone who submitted the final assessment, pass or fail. The numbers are all correct and all unusable. That is the Definition Drift — the silent, compounding slippage in what each field of impact data actually means across teams, sites, and time. It is invisible until a funder asks a comparison question, and it is irreversible after the data has already been collected.
Impact data is the evidence a program, fund, or portfolio uses to demonstrate that something changed. But collecting data and producing insight are not the same thing — and the gap between them is almost entirely a data-definition problem. This page walks through what impact data is, how to collect it so it does not drift, and how to analyze it without a three-month cleanup cycle.
Impact data is structured evidence that a program, investment, or intervention changed something for the people or systems it was designed to serve. It includes outputs (activities delivered), outcomes (changes experienced), demographics (who was reached), and qualitative responses (why the change happened or didn't). Unlike transactional data — which records that an activity occurred — impact data is designed to answer whether the activity made a difference. SurveyMonkey and Google Forms collect data; they do not collect impact data unless the collection is structured to connect a baseline, an intervention, and a follow-up to the same individual over time.
Impact data becomes usable only when three conditions are met at the point of collection: each respondent has a persistent identifier that carries across every survey, form, and follow-up; each field has a locked definition stored in a data dictionary; and each response can be disaggregated without re-exporting and re-merging files. Sopact Sense enforces all three at intake. A typical survey tool enforces none.
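To make the three conditions concrete, here is a minimal sketch in Python of a single response record that satisfies all of them. The field names (respondent_id, answers, demographics) are invented for illustration, not Sopact Sense's actual schema.

```python
# One response record that meets all three conditions (names are invented).
record = {
    "respondent_id": "P-0001",             # 1. persists across every instrument
    "answers": {"confidence": 4},          # 2. keys must match locked dictionary entries
    "demographics": {"gender": "female",   # 3. disaggregation variables captured
                     "region": "north"},   #    at intake, not bolted on later
}

# Disaggregation then becomes a filter, not a re-export-and-merge project:
cohort = [record]
north = [r for r in cohort if r["demographics"]["region"] == "north"]
print(len(north))  # 1
```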
Impact data collection is the structured process of gathering evidence — baseline responses, attendance, outcome measures, and qualitative feedback — from program participants or stakeholders over time. It differs from general survey collection in one critical way: every response must connect to the same person across multiple touchpoints, or the data cannot answer longitudinal questions.
Most tools treat each survey as a standalone event. A participant fills the intake survey in January, a mid-program check-in in April, and an exit survey in September — and the three responses live in three separate spreadsheets with no reliable way to link them unless the participant typed their email identically all three times. Sopact Sense assigns a persistent unique link to each participant at first contact, and every subsequent response lands in the same row automatically. That structural difference is what makes a genuine pre-post survey design possible rather than merely attempted.
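To see why the persistent link matters structurally, consider a minimal sketch in Python, assuming nothing about Sopact Sense's internals: a registry assigns one stable ID at first contact, and every later instrument writes into the same logical row keyed by that ID. All names here are illustrative.

```python
import uuid
from collections import defaultdict

class ParticipantRegistry:
    """Toy model of ID-first collection: one stable ID, one row per person."""

    def __init__(self):
        self._ids = {}                   # contact detail -> persistent ID
        self._rows = defaultdict(dict)   # persistent ID -> merged responses

    def enroll(self, contact: str) -> str:
        """Assign a persistent ID at first contact; reuse it ever after."""
        if contact not in self._ids:
            self._ids[contact] = f"P-{uuid.uuid4().hex[:8]}"
        return self._ids[contact]

    def record(self, participant_id: str, instrument: str, answers: dict) -> None:
        """Every instrument lands in the same row, namespaced by instrument."""
        for field_name, value in answers.items():
            self._rows[participant_id][f"{instrument}.{field_name}"] = value

    def row(self, participant_id: str) -> dict:
        return self._rows[participant_id]

registry = ParticipantRegistry()
pid = registry.enroll("participant@example.org")
registry.record(pid, "intake", {"confidence": 2})
registry.record(pid, "exit", {"confidence": 4})
print(registry.row(pid))  # both touchpoints in one row: a real pre-post pair
```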
Impact data analytics is the structured analysis of program or portfolio evidence to answer three questions: did outcomes change, who experienced the change, and why. It spans quantitative work (rates, shifts, comparisons) and qualitative work (themes, reasons, context). Traditional analytics stacks — Tableau, Power BI, Google Data Studio — handle the quantitative side once data is clean. They do not handle the qualitative side at all.
AI-native analytics changes that. With Sopact Sense, a cohort of four hundred open-ended responses to "what was the most valuable part of the program" is themed, tagged, and linked to each respondent's outcome score in minutes — not the three-to-six weeks a human coding team would take. The quantitative dashboard and the qualitative theme summary update from the same source as new responses arrive. That continuous loop is what separates real impact measurement from annual retrospective reporting.
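Sopact Sense does this theming with its AI layer; the sketch below is only a keyword-based stand-in. What it illustrates is the output shape that makes mixed-method analysis possible: every themed response keeps the respondent's ID and outcome shift in the same record. All names are hypothetical.

```python
# Toy stand-in for AI theming: keyword rules instead of a language model.
# The point is the output shape: theme, respondent ID, and outcome together.
THEME_KEYWORDS = {
    "mentorship": ["mentor", "coach", "advisor"],
    "peer_support": ["peer", "cohort", "group"],
    "curriculum": ["lesson", "module", "curriculum"],
}

def tag_themes(text: str) -> list[str]:
    lowered = text.lower()
    matches = [theme for theme, words in THEME_KEYWORDS.items()
               if any(word in lowered for word in words)]
    return matches or ["untagged"]

responses = [
    {"respondent_id": "P-0001", "outcome_shift": +2,
     "text": "My mentor helped me practice interviews."},
    {"respondent_id": "P-0002", "outcome_shift": 0,
     "text": "The modules felt rushed."},
]

for r in responses:
    r["themes"] = tag_themes(r["text"])
    print(r["respondent_id"], r["themes"], r["outcome_shift"])
```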
Impact data analysis is the interpretation step — turning clean, linked data into defensible answers about what worked, for whom, and under what conditions. It is the step most programs skip because the preceding cleanup work consumed the budget and the calendar window. By the time analysis begins, the board meeting, the funder report, or the portfolio review has already happened.
Analysis that actually informs decisions requires three inputs that most tools cannot deliver together: a consistent dictionary so comparisons are valid, a persistent ID chain so cohort-level questions are answerable, and themed qualitative responses so the "why" is present in the same view as the "what." Sopact Sense is built specifically to produce those three inputs at the moment data arrives — not three months later.
The failure mode is rarely that data was not collected. Most organizations collect too much. The failure mode is that each field's definition drifts slightly over time, slightly between teams, slightly between sites — and the drift compounds.
Three patterns are responsible for nearly every unusable impact data archive. The first is silent definition forks: Program A records "enrollment" on the day a participant fills the intake form; Program B records "enrollment" on the day a participant attends the first session. Two years later, no one remembers the fork existed — the column is still called "enrollment" in both places. The second is retroactive reconciliation: when the funder asks for a comparison, staff spend days standardizing historical data to a new definition, and the standardization is mostly correct but not perfectly documented. The third is spreadsheet accumulation: each funder report produces a new tab, a new file, a new folder, and by year three the "source of truth" exists in twenty-seven places and agrees in none of them.
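The first pattern, the silent fork, is easy to reproduce in miniature. The sketch below runs the two teams' unstated definitions of "enrollment" over the same event log; the event names are invented for illustration.

```python
from datetime import date

# One shared event log; two teams, two unstated definitions of "enrollment".
events = [
    {"person": "A", "type": "intake_form",   "on": date(2026, 1, 5)},
    {"person": "B", "type": "intake_form",   "on": date(2026, 1, 6)},
    {"person": "A", "type": "first_session", "on": date(2026, 1, 12)},
    # Person B filed the intake form but never attended a session.
]

def enrollment_program_a(log):
    """Program A: enrolled on the day the intake form is filed."""
    return {e["person"] for e in log if e["type"] == "intake_form"}

def enrollment_program_b(log):
    """Program B: enrolled on the day of first session attendance."""
    return {e["person"] for e in log if e["type"] == "first_session"}

print(len(enrollment_program_a(events)))  # 2
print(len(enrollment_program_b(events)))  # 1 -- same column name, different truth
```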
The fix is not better spreadsheet hygiene. The fix is a locked impact data dictionary — enforced at the point of collection, not at the point of analysis — combined with a persistent identifier that links every response from the same person across every instrument. This is the architecture impact fund managers, nonprofit CEOs, and evaluation directors describe when they say they want "AI-ready" data. It is also the architecture that makes a logframe actually measurable instead of aspirational.
An impact data dictionary is a centralized reference document that defines every data field your organization collects. It locks each field's name, data type, description, validation rule, and alignment with standards like IRIS+ — so "enrollment" means one thing everywhere, forever.
At minimum, each field in the dictionary needs a machine-readable name (e.g., participant_enrollment_date), a display name ("Participant Enrollment Date"), a data type (number, string, boolean, date, enum, or scale), an exact definition that specifies what counts and what doesn't in one unambiguous sentence, a collection methodology (how is this measured, by whom, when), a validation rule (numeric range, required status, allowed values), and an IRIS+ alignment code where applicable (for example, PI4060 for student enrollment).
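Expressed as code, a single dictionary entry might look like the sketch below. The structure mirrors the attributes just listed; Sopact Sense's internal schema is not published here, so treat the shape as illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DictionaryField:
    """One locked field definition in an impact data dictionary."""
    name: str                           # machine-readable, e.g. participant_enrollment_date
    display_name: str                   # human-facing label
    data_type: str                      # number | string | boolean | date | enum | scale
    definition: str                     # one unambiguous sentence: what counts, what doesn't
    methodology: str                    # how this is measured, by whom, and when
    validate: Callable[[object], bool]  # rule enforced on form submission
    iris_code: str | None = None        # e.g. "PI4060" where an IRIS+ metric applies

enrollment_date = DictionaryField(
    name="participant_enrollment_date",
    display_name="Participant Enrollment Date",
    data_type="date",
    definition="The date the participant attends their first program session.",
    methodology="Recorded by site staff from the attendance sheet on session day.",
    validate=lambda v: hasattr(v, "isoformat"),  # a real date object, never free text
    iris_code="PI4060",
)
```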
Organize the dictionary by category: Output, Outcome, Indicator, Demographic, Survey, and Metadata. This lets program managers find relevant fields quickly and helps funders see at a glance which parts of your data serve which reporting purpose. A nonprofit capacity-building program running five cohorts a year needs perhaps sixty fields; a mid-size impact fund running portfolio monitoring across thirty investees needs one hundred fifty. Either way, the structure is the same. The Sopact Sense nonprofit programs solution includes an Impact Data Dictionary Generator that pre-populates dictionaries for eight IRIS+-aligned themes — Education, Agriculture, Health, Financial Inclusion, Employment, Gender Equity, Energy, and Housing. Teams that already have a draft dictionary can import it and lock field definitions directly into their collection forms.
"AI-ready" is the current industry phrase for data that can be analyzed by large language models or automated systems without a manual preparation layer. It means three specific things. First, every row has a stable identifier: responses from the same person, across any number of instruments, land in the same row. Name-and-email matching is not identifier management — it is a fragile heuristic that breaks the first time someone changes jobs or uses a different address. Second, every column has a locked type and definition: a scale field is always the same scale, a date field never contains free text, an enum field rejects values that were not pre-defined. Third, every qualitative response is attached to structured metadata: an open-ended "what did you learn" response is only analyzable if it carries the respondent ID, the cohort, the demographic attributes, and the intervention stage.
Most survey platforms produce exports that fail all three conditions. Google Forms, Typeform, and SurveyMonkey are built for single-instance data collection — not longitudinal, dictionary-locked, AI-ready pipelines. The reconciliation work that platforms like Qualtrics require for longitudinal studies typically runs six to ten weeks per reporting cycle. Sopact Sense is built to produce AI-ready output as the default. Unique participant IDs are assigned at first contact. Dictionary rules are enforced on form submission. Qualitative responses are themed continuously through the platform's analysis layer. The exports are AI-ready because the ingestion was AI-ready. For impact funds specifically, the same architecture underpins impact measurement and management across the portfolio lifecycle.
Clean, dictionary-locked data is the precondition for analysis — not analysis itself. The analytical questions most programs and funds actually need answered fall into three patterns. The shift question: did the outcome measure change from baseline to endline, by how much, and for which subgroups? This requires matched pre- and post-program responses from the same respondent — impossible without persistent IDs. The reason question: why did the change happen, or not? The quantitative shift is the "what." The qualitative response is the "why." Both need to live in the same view, for the same respondent, at the same time. The comparison question: how does this cohort compare to the prior cohort, another site, or the sector benchmark? This requires the dictionary to have been locked before either cohort was collected — otherwise the comparison is statistically invalid.
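Once the ID chain exists, the shift question reduces to simple arithmetic. A minimal sketch, assuming matched pre and post scores keyed by respondent ID, with one demographic attribute for disaggregation:

```python
from collections import defaultdict
from statistics import mean

# Matched pairs are only possible because both responses share a persistent ID.
matched = [
    {"respondent_id": "P-0001", "group": "rural", "pre": 2, "post": 4},
    {"respondent_id": "P-0002", "group": "urban", "pre": 3, "post": 3},
    {"respondent_id": "P-0003", "group": "rural", "pre": 1, "post": 3},
]

shifts_by_group = defaultdict(list)
for row in matched:
    shifts_by_group[row["group"]].append(row["post"] - row["pre"])

for group, shifts in sorted(shifts_by_group.items()):
    print(f"{group}: mean shift {mean(shifts):+.1f} (n={len(shifts)})")
# rural: mean shift +2.0 (n=2)
# urban: mean shift +0.0 (n=1)
```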
Sopact Sense produces all three answers on demand. Outcome shifts are computed across cohorts directly from the dashboard. Reasons are available as themed clusters from continuous qualitative survey analysis. Comparisons run against the locked dictionary automatically, without a separate data preparation step.
The first mistake is treating output data as outcome data. "Number of workshops delivered" is an output. "Number of participants who changed a behavior" is an outcome. Funders increasingly want the second. Audit your current dictionary: how many fields measure activity versus change? A dictionary dominated by output fields will produce a report that funders increasingly reject as thin.
The second mistake is skipping demographic disaggregation. Without demographic fields captured at intake, you cannot answer "who benefited." You can only answer "how many." Equity-focused funders consider this a disqualifying gap.
The third mistake is collecting qualitative responses with no plan to analyze them. Open-ended questions are the most valuable and most abandoned part of impact data. If your current tool cannot theme qualitative data at scale, either drop the questions or adopt a platform that can. Leaving them in the instrument and ignoring the responses is worse than not asking.
The fourth mistake is versioning dictionary changes informally. Every dictionary change should be documented with a date, a reason, and a note about how historical data maps to the new definition. A field added mid-year without a documented predecessor creates a permanent comparison gap.
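A documented change can be as lightweight as the sketch below; the field names are illustrative, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DictionaryChange:
    """One documented dictionary change: date, reason, and historical mapping."""
    version: str
    changed_on: str           # ISO date the definition changed
    field: str                # which dictionary field changed
    reason: str               # why the definition moved
    historical_mapping: str   # how pre-change data maps to the new definition

change = DictionaryChange(
    version="2.1",
    changed_on="2026-07-01",
    field="participant_enrollment_date",
    reason="Board adopted first-session attendance as the enrollment event.",
    historical_mapping="Rows before v2.1 used the intake-form date; flagged, not recoded.",
)
print(change)
```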
The fifth mistake is delaying dictionary design until the first funder request. By that point, the definition fork has already happened — retrospective standardization will mask it, not repair it. Build the dictionary first, using your theory of change as the framework. Collect second.
Impact data is structured evidence that a program, investment, or intervention changed something for the people or systems it was designed to serve. It includes outputs, outcomes, demographics, and qualitative responses linked to individuals over time. The distinguishing feature is the connection: responses must be tied to the same person across touchpoints to answer whether something actually changed.
The Definition Drift is the silent, compounding slippage in what each field of impact data means across teams, sites, and time. Two programs can both record "enrollment" for three years and end up measuring two different things — the drift is invisible until a comparison request arrives. It is the single most common reason impact data archives become unusable, and the reason a locked dictionary matters.
Regular business data records that something happened — a transaction, a login, a form submission. Impact data records whether something changed because of an intervention — an outcome shift, a behavior change, a reported experience. Impact data also requires persistent identifiers so responses from the same person can be connected across time. Most survey tools do not support that connection by default.
Assign a persistent unique identifier to every respondent at first contact, lock every field definition in a data dictionary before collection begins, structure disaggregation variables at intake rather than retroactively, and ensure qualitative responses carry the same respondent ID as quantitative ones. Sopact Sense enforces all four conditions automatically.
At minimum: record ID, participant ID, collection date, consent status, demographics (location, gender, age, relevant equity variables), baseline outcome measures, endline outcome measures, activity completion fields, and open-ended qualitative reflection fields. Align with IRIS+ codes where applicable. Most nonprofit programs need forty to eighty fields; most impact funds need one hundred twenty to one hundred eighty.
The Global Impact Investing Network (GIIN) maintains IRIS+, a standardized catalog of impact metrics organized by theme. For each field in your dictionary, check the IRIS+ catalog for the closest pre-defined metric and record its code (for example, PI4060 for student enrollment, OI4912 for learning outcomes achieved). Alignment enables benchmarking against sector norms and satisfies most impact investor reporting requirements.
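In practice the alignment is just an explicit lookup recorded alongside the dictionary. The sketch below uses only the two codes cited above; any further entries would come from the IRIS+ catalog itself, not from this illustration.

```python
# Explicit field -> IRIS+ alignment, limited to the codes cited in the text.
IRIS_ALIGNMENT = {
    "participant_enrollment_date": "PI4060",  # student enrollment
    "learning_outcome_achieved": "OI4912",    # learning outcomes achieved
}

def iris_code(field_name: str) -> str | None:
    """Return the recorded IRIS+ code for a dictionary field, if aligned."""
    return IRIS_ALIGNMENT.get(field_name)

print(iris_code("participant_enrollment_date"))  # PI4060
```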
AI-ready impact data has three properties: every row has a stable identifier that connects responses from the same person across instruments, every column has a locked type and definition, and every qualitative response carries the structured metadata needed to analyze it. Most survey tool exports fail all three tests. Sopact Sense is built to produce AI-ready output as the default state, not a post-processing step.
Review the dictionary annually and whenever a program model changes materially. Document every change with a version number, a date, and a note about how historical data maps to the new definition. Major funder cycles may trigger updates — but avoid informal mid-year changes, which create permanent comparison gaps in the data.
Enterprise platforms like Sopact Sense start around $1,000 per month for small to mid-size organizations, with full AI-native analysis included. Legacy survey platforms like Qualtrics typically run $2,000 to $6,000 per month but require separate data-cleaning and qualitative-analysis tools that double the total cost. Generic survey tools like SurveyMonkey are cheaper but do not support longitudinal participant tracking, dictionary enforcement, or qualitative theming at all.
Collection is the gathering of responses — surveys, forms, interviews, document uploads. Analysis is the interpretation step — computing outcome shifts, identifying themes, comparing cohorts. Most organizations collect far more than they analyze because the cleanup work between the two stages consumes the budget. The fix is to make collection produce clean, AI-ready data so analysis can begin immediately.
Impact data analytics software is a platform category purpose-built to collect, define, clean, and analyze program or portfolio outcome data in one system. It differs from general BI tools (Tableau, Power BI) by supporting mixed-method analysis — quantitative and qualitative together — and from general survey tools (SurveyMonkey, Qualtrics) by supporting longitudinal participant tracking and locked field dictionaries at the collection stage.
Sopact Sense assigns persistent unique IDs at first contact, enforces dictionary definitions at form submission, themes qualitative responses continuously, and connects every response from the same person to a single record. Legacy survey tools treat each survey as a standalone event and require manual reconciliation between instruments; Sopact Sense treats the instruments as one continuous pipeline, so the data is analysis-ready the moment it arrives.
No. Impact funds, foundations, CSR programs, training providers, and social enterprises all use impact data dictionaries. The field names differ by organization type — a nonprofit tracks "participant completion," an impact fund tracks "investee KPI," a foundation tracks "grantee milestone" — but the principle is identical. Lock the definition before collection starts.