"40 workshops delivered."
Records that an activity happened. Counts effort. Says nothing about whether anyone's life changed.
Impact data software that reads every record on arrival, locks field definitions against drift, and themes qualitative answers in minutes, not months.
Sopact reads every intake form, survey, and outcome record the day it lands, and checks each field against one locked definition. Most impact data software only stores what arrives — so when a funder asks a comparison question, five teams hand back five numbers that are all correct and all unusable. This page is for the program and evaluation teams who own that data and have to answer for it.
By Unmesh Sheth · Founder & CEO, Sopact · Updated May 25, 2026
Impact data is the structured evidence a program, fund, or organization uses to show that something changed for the people it serves — outputs delivered, outcomes experienced, who was reached, and the qualitative reasons behind the change. Unlike transactional data, which records that an activity happened, impact data is built to answer whether the activity made a difference.
Records the change, tied to the same person across a baseline and a follow-up. A funder can act on this.
Every field defined once, every response linked to one participant, every open-ended answer attached. Analyzable the moment it lands.
Most organizations collect too much, not too little. What breaks the archive is the slow slippage in what each field means — across teams, across sites, across years. It is invisible until a funder asks a comparison question, and irreversible once the data is already collected.
"Completion" is added to the intake form. Everyone agrees what it means in the kickoff meeting.
A new site lead reads "completion" as 80% attendance. Program A still records it as a passed final assessment. Nobody flags it.
Three more sites, three more readings. The column is still called "completion" everywhere. The meaning is now five things.
A funder asks to compare cohorts. Staff spend days standardizing history to a new definition — mostly right, never documented.
The comparison can't be defended. Three years of impact data answers the wrong question, or no question at all.
The Definition Drift is the silent, compounding slippage in what each field of impact data actually means across teams, sites, and time. The fix is not better spreadsheet hygiene. It is a locked dictionary enforced at the point of collection — and a workflow that reads every record on arrival, so the fork is caught in week one, not three years later.
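To make the fork concrete, here is a minimal sketch of the two readings of "completion" described above, applied to the same four records. The records, the field names `attendance` and `passed_final`, and the helper functions are hypothetical, invented only for illustration; the 80% attendance threshold and the passed-assessment rule come from the example above.

```python
# Hypothetical records: attendance share and final-assessment result per participant.
records = [
    {"id": "P1", "attendance": 0.90, "passed_final": True},
    {"id": "P2", "attendance": 0.85, "passed_final": False},
    {"id": "P3", "attendance": 0.60, "passed_final": True},
    {"id": "P4", "attendance": 0.95, "passed_final": False},
]

def completed_by_attendance(record):
    """One site's reading: 'completion' means at least 80% attendance."""
    return record["attendance"] >= 0.80

def completed_by_assessment(record):
    """Program A's reading: 'completion' means a passed final assessment."""
    return record["passed_final"]

def completion_rate(rows, rule):
    """Share of rows that count as complete under the given rule."""
    return sum(1 for r in rows if rule(r)) / len(rows)

rate_attendance = completion_rate(records, completed_by_attendance)
rate_assessment = completion_rate(records, completed_by_assessment)

# Participants the two readings classify differently.
disagreements = [
    r["id"] for r in records
    if completed_by_attendance(r) != completed_by_assessment(r)
]

print(rate_attendance, rate_assessment, disagreements)
```

Both rates are computed correctly from the same column name, yet they describe different groups of people, which is why neither can be compared with the other after the fact.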
Teams shopping for an impact data platform expect the product to solve the drift. Most of the category does not. Underneath the dashboard, a typical tool is a database with a form on the front: it accepts what arrives and holds it. The drift happens inside the store — and a store cannot see it. The difference that matters is not storage. It is whether the software reads.
A database with an intake form. Records land, get filed, and wait. The work of reading them is somebody else's job, on a later calendar.
Sopact reads every record the moment it lands and checks it against the change the program promised. Storage is the floor, not the product.
Impact data passes through collection, definition, and analysis. Drift can enter at any of the three. What keeps the archive whole is a single thread running through all of them: a persistent identifier on every record, and a locked dictionary behind every field.
Surveys, intake forms, interviews, attendance, document uploads. Each one enters linked to the same participant — an ID assigned at first contact, not reconciled by email afterward.
Every field carries one locked meaning, one data type, one validation rule, and an IRIS+ code where the catalog applies — written before a single response arrives.
Outcome shifts, cohort comparisons, and themes from open-ended answers — read from the same source, updating as new records land instead of waiting for a cleanup cycle.
Break the persistent identifier and the longitudinal story is gone — the same person's baseline and follow-up never meet. Break the locked dictionary and the comparison is invalid — two cohorts measured against two definitions. Impact data software earns its name only when both hold from the first record to the last.
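One way to picture a locked dictionary entry is as an immutable record holding the four things named above: one meaning, one data type, one validation rule, and an optional IRIS+ code. The sketch below is an assumption-laden illustration, not Sopact's implementation; the class `FieldDefinition`, its attribute names, and the helper `try_redefine` are hypothetical, and the IRIS+ code is deliberately left empty rather than guessed.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Optional

@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class FieldDefinition:
    name: str
    meaning: str
    data_type: type
    validate: Callable[[object], bool]
    iris_code: Optional[str] = None  # fill in only after checking the IRIS+ catalog

completion = FieldDefinition(
    name="completion",
    meaning="Participant passed the final assessment.",
    data_type=bool,
    validate=lambda value: isinstance(value, bool),
)

def try_redefine(field_def):
    """Attempt an informal edit to the meaning; return True only if it succeeded."""
    try:
        field_def.meaning = "Attended at least 80% of sessions."
    except FrozenInstanceError:
        return False
    return True

print(try_redefine(completion))      # the informal edit is refused
print(completion.validate(True))     # a boolean passes the rule
print(completion.validate("yes"))    # free text does not
```

A frozen dataclass only blocks attribute reassignment in Python code; a real system would also need storage-level controls and an explicit, documented path for legitimate changes.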
An impact data dictionary defines every field your organization collects — its name, its type, its exact meaning, its validation rule, its standard alignment. It is the one document that keeps "completion" meaning the same thing in every program, every year. Six rules separate a dictionary that holds from one that drifts anyway.
Write the exact meaning of "enrollment," "completion," and every outcome field into the dictionary — then enforce it at form submission, not at analysis.
Waiting until the first funder report means the fork has already happened.
Every respondent gets a Persistent Contact ID the moment they enter the program. Every later survey or follow-up lands in the same row — no email-matching, no manual merge.
Retroactive ID assignment loses anyone who changed an address or a name.
Equity analysis is only possible if disaggregation variables are captured at the first touchpoint. Retrofitting them from exports drops a third to half of respondents.
Equity-focused funders treat missing disaggregation as a disqualifying gap.
A rating is the "what." The open-ended answer beside it is the "why." Without both, the dashboard reports that outcomes moved but cannot explain for whom, or why.
The "why" is the part a funder actually acts on.
The GIIN's IRIS+ catalog pre-defines metrics across impact themes. Matching a field to an IRIS+ code satisfies most impact investor reporting with no extra work.
Custom metrics are fine — undocumented custom metrics are not.
Programs evolve and definitions change. An undocumented change creates a permanent comparison gap. Record the date, the reason, and how old data maps to the new meaning.
Informal mid-year edits are the leading cause of unusable archives.
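The last rule, recording the date, the reason, and how old data maps to the new meaning, can be sketched as a small change-log entry. Everything named here (`DefinitionChange`, `map_old_record`, the record keys `collected`, `completion`, and `passed_final`, and the example date and reason) is hypothetical and chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

@dataclass(frozen=True)
class DefinitionChange:
    field: str
    effective: date
    reason: str
    old_meaning: str
    new_meaning: str
    # How a record collected before the change maps to the new meaning.
    # Returns None when the old record cannot be mapped.
    map_old_record: Callable[[dict], object]

change = DefinitionChange(
    field="completion",
    effective=date(2025, 1, 1),
    reason="Hypothetical: reporting moved to assessment-based completion.",
    old_meaning="At least 80% attendance.",
    new_meaning="Passed the final assessment.",
    map_old_record=lambda record: record.get("passed_final"),
)

def completion_under_current_definition(record, change):
    """Read 'completion' under the current meaning, mapping older records explicitly."""
    if record["collected"] >= change.effective:
        return record["completion"]
    return change.map_old_record(record)

new_record = {"collected": date(2025, 3, 1), "completion": True}
old_mappable = {"collected": date(2024, 6, 1), "completion": True, "passed_final": False}
old_unmappable = {"collected": date(2024, 6, 1), "completion": True}

print(completion_under_current_definition(new_record, change))
print(completion_under_current_definition(old_mappable, change))
print(completion_under_current_definition(old_unmappable, change))
```

Returning `None` for records that cannot be mapped keeps the comparison gap visible instead of silently carrying the old meaning forward.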
The six rules describe the dictionary. The next step is writing one. The generator below builds a starting impact data dictionary from IRIS+-aligned themes in a few clicks — then you adapt it, lock it, and collect against it.
Select the impact themes your program works in. The generator assembles IRIS+-aligned field definitions you can filter, extend with your own fields, and export. It is a starting point — the real lock happens when the dictionary is enforced at collection.
Note on IRIS+ codes: the codes shown are starting suggestions to speed alignment. Confirm each one against the current GIIN IRIS+ catalog before reporting against it. The generated dictionary is a draft — adapt the definitions to your program, then lock them at the point of collection.
Impact data analytics has gotten genuinely good: Claude, Power BI, and Google's analytics stack all turn clean, contextual data into a recommendation now. The bottleneck moved upstream. A model can only analyze impact data that arrives already structured — and most exports fail all three of the conditions below.
Responses from the same person, across any number of instruments, land in the same row. Name-and-email matching is not identifier management — it is a fragile guess that breaks the first time someone changes a job or an address.
A scale field is always the same scale. A date field never holds free text. An enum field rejects values that were not pre-defined. The dictionary is enforced at submission, so the column means one thing in every row.
An open-ended "what did you learn" answer is only analyzable if it carries the respondent ID, the cohort, the demographics, and the program stage. The "why" has to sit on the same record as the "what."
A general analytics tool reads whatever you hand it — including the drift. Sopact produces AI-ready impact data as the default, because the three conditions are enforced when the record arrives, not patched in a cleanup cycle. The export is AI-ready because the ingestion was.
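The three conditions can be expressed as a simple per-row check. This is a rough sketch under stated assumptions: the column names (`participant_id`, `confidence_score`, `what_did_you_learn`, `cohort`, `demographics`, `program_stage`) and the 1-to-5 scale are hypothetical, and a real check would be driven by the dictionary rather than hard-coded.

```python
CONTEXT_KEYS = ("cohort", "demographics", "program_stage")

def ai_ready_issues(row):
    """Return a list of reasons this row fails the three conditions (empty if none)."""
    issues = []

    # Condition 1: a stable identifier on every row.
    if not row.get("participant_id"):
        issues.append("missing participant_id")

    # Condition 2: a locked type and scale (assumed here: integer from 1 to 5).
    score = row.get("confidence_score")
    is_int = isinstance(score, int) and not isinstance(score, bool)
    if not (is_int and 1 <= score <= 5):
        issues.append("confidence_score not on the 1-5 scale")

    # Condition 3: an open-ended answer carries its context on the same record.
    if row.get("what_did_you_learn"):
        missing = [key for key in CONTEXT_KEYS if not row.get(key)]
        if missing:
            issues.append("open-ended answer missing context: " + ", ".join(missing))

    return issues

good_row = {
    "participant_id": "P1",
    "confidence_score": 4,
    "what_did_you_learn": "How to build a monthly budget.",
    "cohort": "2025A",
    "demographics": {"age_band": "18-24"},
    "program_stage": "post",
}
bad_row = {
    "participant_id": "",
    "confidence_score": "4",  # text in a numeric column
    "what_did_you_learn": "Budgeting.",
    "cohort": "2025A",
    "demographics": {"age_band": "18-24"},
    "program_stage": None,
}

print(ai_ready_issues(good_row))
print(ai_ready_issues(bad_row))
```

Running a check like this on an export shows what is wrong; running it at submission is what keeps the bad row out of the archive in the first place.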
Most teams run impact data through a general survey tool plus a BI dashboard plus a separate qualitative-coding step. Each tool is competent at its slice. The drift lives in the seams between them. Here is the same workflow, compared at the field level.
| Capability | Traditional impact data stack | Sopact |
|---|---|---|
| Collection | ||
| Participant identity | Email or name matching — breaks on typos, address changes, and drop-offs. | A Persistent Contact ID at first contact; every later response lands in the same row. |
| Longitudinal linking | Baseline, mid, and endline merged from separate files by hand each cycle. | Pre, mid, and post resolve on their own through the ID chain. |
| Disaggregation | Retrofitted after collection — drops a third to half of respondents. | Structured at the first form submission; equity questions answerable from day one. |
| Dictionary | ||
| Field definitions | Tribal and undocumented; the meaning of "completion" drifts per team. | A locked dictionary — every change dated, reasoned, and mapped to prior definitions. |
| Validation rules | Enforced by human review; free text lands in numeric fields. | Enforced at submission — an invalid response never enters the archive. |
| IRIS+ alignment | Cross-referenced by hand per field, or skipped entirely. | Mapped as the dictionary is built, across IRIS+-aligned impact themes. |
| Analysis | ||
| Qualitative answers | Coded by hand over weeks — often abandoned when the timeline compresses. | Themed on arrival, linked to each respondent's outcome record. |
| Cohort comparisons | Require a pre-cleanup and a re-merge; validity depends on whether definitions drifted. | Run against the locked dictionary — valid by construction. |
| Reporting cadence | Calendar-bound; the cleanup cycle sets the rhythm, not the decision. | Continuous — the reading happens as records land, not a quarter later. |
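The "Participant identity" and "Longitudinal linking" rows can be illustrated with a small sketch that links a baseline to a follow-up twice: once on a participant ID and once on email. The data, the field names, and the `link` helper are hypothetical and exist only to show the failure mode.

```python
baseline = [
    {"participant_id": "P1", "email": "ana@example.org", "score": 2},
    {"participant_id": "P2", "email": "ben@example.org", "score": 3},
]
followup = [
    # P1 changed email between the two surveys.
    {"participant_id": "P1", "email": "ana.new@example.org", "score": 4},
    {"participant_id": "P2", "email": "ben@example.org", "score": 3},
]

def link(pre, post, key):
    """Match pre and post rows on `key` and return the score change per match."""
    post_by_key = {row[key]: row for row in post}
    return {
        row[key]: post_by_key[row[key]]["score"] - row["score"]
        for row in pre
        if row[key] in post_by_key
    }

by_id = link(baseline, followup, "participant_id")
by_email = link(baseline, followup, "email")

print(by_id)
print(by_email)
```

Linking on the ID recovers both participants, while linking on email silently drops the one whose address changed, and that participant happens to be the one whose score improved.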
Sixty minutes, your own records. We'll read a real reporting cycle live and show where the drift entered.
The drift mechanism is the same everywhere. What changes is the unit of work and the failure the team cannot afford. Each card closes with what impact data software built to read returns.
Workforce, education, and direct-service teams running pre, mid, and post touchpoints across sites — the people who collect the data and own what it says.
A portfolio of grantees, each reporting impact data on its own template. The drift is now between organizations, not just between sites.
Portfolio impact data across investees, rolled up for an LP review. The IRIS+ alignment has to hold from the investee record up to the fund report.
If your impact data cannot answer a funder's comparison question in under a week, the Definition Drift has already set in. The fix is not a bigger store — it is software that reads each record on arrival and holds every field to one definition. The same architecture works for CSR teams, accelerators, and training providers — same locked dictionary, different fields.
Impact data is the structured evidence a program, fund, or organization uses to show that something changed for the people it serves. It includes outputs delivered, outcomes experienced, who was reached, and the qualitative reasons behind the change. The defining feature is the connection: responses must be tied to the same person across touchpoints, or the data cannot answer whether anything actually changed.
Impact data software is a system built to collect, define, and analyze evidence of program or portfolio outcomes in one place. Most products in the category are storage-first — a database with an intake form — and the field definitions drift inside the store. Software built to read, like Sopact, checks every record against a locked dictionary on arrival, so the drift is caught while it is still reversible.
Regular business data records that something happened — a transaction, a login, a form submission. Impact data records whether something changed because of an intervention — an outcome shift, a behavior change, a reported experience. Impact data also requires a persistent identifier, so responses from the same person connect across time. Most general tools do not support that connection by default.
The Definition Drift is the silent, compounding slippage in what each field of impact data means across teams, sites, and time. Two programs can both record "completion" for three years and end up measuring two different things. The drift is invisible until a comparison request arrives, and irreversible once the data is already collected. It is the single most common reason impact data archives become unusable.
An impact data dictionary is a central reference that defines every field your organization collects: its machine name, its display name, its data type, its exact meaning, its validation rule, and its standards alignment. It locks each field so "enrollment" means one thing everywhere, in every program, every year. The dictionary is the durable fix for the Definition Drift — provided it is enforced at collection, not at analysis.
Assign a Persistent Contact ID to every respondent at first contact, lock every field definition in a dictionary before collection begins, structure disaggregation variables at intake rather than retroactively, and ensure qualitative answers carry the same respondent ID as the numeric ones. Sopact reads and enforces all four conditions as records arrive, so the archive stays comparable rather than drifting toward a cleanup cycle.
Impact data management software is a system that governs impact data across its lifecycle — collection, field definitions, validation, versioning, and analysis — rather than leaving each stage to a separate tool. The management part is the dictionary and the identifier chain: the rules that keep a field comparable from the first record to the last. Without that governance layer, "management" is just storage.
Impact data analytics is the structured analysis of program or portfolio evidence to answer three questions: did outcomes change, who experienced the change, and why. It spans quantitative work — rates, shifts, comparisons — and qualitative work — themes, reasons, context. General BI tools handle the quantitative side once data is clean. They do not handle the qualitative side at all, which is where most of the "why" lives.
AI-ready impact data meets three conditions: every row has a stable identifier connecting responses from the same person, every column has a locked type and definition, and every qualitative answer carries the metadata needed to analyze it. Most exports from general tools fail all three tests. Sopact produces AI-ready output as the default state, because the conditions are enforced when the record arrives rather than patched afterward.
The Global Impact Investing Network maintains IRIS+, a catalog of standardized impact metrics organized by theme. For each field in your dictionary, find the closest pre-defined IRIS+ metric and record its code. Alignment lets you benchmark against sector norms and satisfies most impact investor reporting. Always confirm a code against the current IRIS+ catalog before reporting against it, since the catalog is revised over time.
At minimum: a record ID, a participant ID, a collection date, consent status, demographics captured at intake, baseline outcome measures, endline outcome measures, activity or completion fields, and open-ended qualitative fields. Align to IRIS+ codes where the catalog applies. Most nonprofit programs need roughly forty to eighty fields; most impact funds need more. The generator on this page builds a themed starting set.
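As a rough sketch, the minimum field set listed above might look like the record type below. The class name `ImpactRecord`, the attribute names, the integer outcome scale, and the `outcome_shift` helper are all assumptions made for illustration; a real dictionary would carry many more fields and their definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ImpactRecord:
    record_id: str
    participant_id: str              # assigned at first contact
    collection_date: date
    consent_given: bool
    demographics: dict               # captured at intake
    baseline_score: Optional[int] = None
    endline_score: Optional[int] = None
    completed: Optional[bool] = None
    open_response: str = ""          # the qualitative "why"

    def outcome_shift(self):
        """Endline minus baseline, or None if either measure is missing."""
        if self.baseline_score is None or self.endline_score is None:
            return None
        return self.endline_score - self.baseline_score

full = ImpactRecord(
    record_id="R1",
    participant_id="P1",
    collection_date=date(2026, 1, 15),
    consent_given=True,
    demographics={"age_band": "18-24"},
    baseline_score=2,
    endline_score=4,
    completed=True,
    open_response="I can now plan a monthly budget.",
)
baseline_only = ImpactRecord(
    record_id="R2",
    participant_id="P2",
    collection_date=date(2026, 1, 15),
    consent_given=True,
    demographics={"age_band": "25-34"},
    baseline_score=3,
)

print(full.outcome_shift())
print(baseline_only.outcome_shift())
```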
Impact data is not only for nonprofits. Nonprofits, foundations, impact funds, CSR programs, training providers, and social enterprises all work with it. The field names differ by audience (a nonprofit tracks participant completion, a foundation tracks grantee milestones, a fund tracks investee outcomes), but the principle is identical. Lock the definition before collection, and carry one identifier across the record. The architecture does not change.
Impact data is not the same thing as impact analysis in software engineering. There, "impact analysis" describes how a change to a database, pipeline, or codebase affects downstream systems. This page is about impact data in the social-impact sense: evidence that a program, grant, or investment changed something for the people it serves. If you came here looking for schema lineage or pipeline dependency tracking, this is a different field with a different toolset.
Sixty minutes with someone who builds impact data systems for a living. Bring a real reporting cycle — the intake forms, the surveys, the open-ended answers. We read it on screen, check the fields against a dictionary, and name where the Definition Drift entered. No slideware, no demo accounts — your data, read live.