
Impact data software: built to read, not just store

Impact data software that reads every record on arrival, locks field definitions against drift, and themes qualitative answers in minutes, not months.

Updated
May 29, 2026
Use Case
Impact data software · Built to read, not store

Five programs. Five definitions of done.

Sopact reads every intake form, survey, and outcome record the day it lands, and checks each field against one locked definition. Most impact data software only stores what arrives — so when a funder asks a comparison question, five teams hand back five numbers that are all correct and all unusable. This page is for the program and evaluation teams who own that data and have to answer for it.

On arrival · Every record read the day it lands
One ID · Pre, mid, and post linked per participant
Locked · Field definitions enforced at submission
Since 2014 · Building for impact data
Definition

What is impact data?

Plain definition

Impact data is the structured evidence a program, fund, or organization uses to show that something changed for the people it serves — outputs delivered, outcomes experienced, who was reached, and the qualitative reasons behind the change. Unlike transactional data, which records that an activity happened, impact data is built to answer whether the activity made a difference.

Level 1 · A transactional record

"40 workshops delivered."

Records that an activity happened. Counts effort. Says nothing about whether anyone's life changed.

Level 2 · Impact data

"40 graduates; 31 in a training-matched role six months on."

Records the change, tied to the same person across a baseline and a follow-up. A funder can act on this.

Level 3 · AI-ready impact data

The same record — plus a locked definition, a persistent ID, and the why.

Every field defined once, every response linked to one participant, every open-ended answer attached. Analyzable the moment it lands.

Why impact data goes unusable

The failure is rarely missing data. It is drift.

Most organizations collect too much, not too little. What breaks the archive is the slow slippage in what each field means — across teams, across sites, across years. It is invisible until a funder asks a comparison question, and irreversible once the data is already collected.

Cycle 1
One field, one meaning

"Completion" is added to the intake form. Everyone agrees what it means in the kickoff meeting.

Cycle 2
The silent fork

A new site lead reads "completion" as 80% attendance. Program A still records it as a passed final assessment. Nobody flags it.

Cycle 3
The drift compounds

Three more sites, three more readings. The column is still called "completion" everywhere. The meaning is now five things.

Cycle 4
The retroactive patch

A funder asks to compare cohorts. Staff spend days standardizing history to a new definition — mostly right, never documented.

Cycle 5
The archive is unusable

The comparison can't be defended. Three years of impact data answers the wrong question, or no question at all.
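The silent fork from Cycle 2 can be shown in a few lines. This is a minimal sketch that assumes hypothetical participant records carrying an attendance share and a final-assessment result. The same "completion" column produces two different counts depending on which reading a site applies.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record shape, for illustration only.
    participant_id: str
    attendance_share: float  # fraction of sessions attended, 0.0 to 1.0
    passed_final_assessment: bool


def completed_program_a(p: Participant) -> bool:
    # Program A's reading: "completion" means a passed final assessment.
    return p.passed_final_assessment


def completed_new_site(p: Participant) -> bool:
    # The new site lead's reading: "completion" means at least 80% attendance.
    return p.attendance_share >= 0.8


cohort = [
    Participant("P-001", 0.95, True),
    Participant("P-002", 0.85, False),
    Participant("P-003", 0.60, True),
    Participant("P-004", 0.90, False),
    Participant("P-005", 0.40, False),
]

# Summing booleans counts the True values.
count_a = sum(completed_program_a(p) for p in cohort)
count_site = sum(completed_new_site(p) for p in cohort)
print(f"completion (assessment reading): {count_a} of {len(cohort)}")
print(f"completion (attendance reading): {count_site} of {len(cohort)}")
```

Both numbers are correct under their own rule. They become comparable only when one definition is locked before collection.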

Three patterns drive almost every unusable archive: silent definition forks (the same field name, two meanings), retroactive reconciliation (history standardized under deadline, undocumented), and spreadsheet accumulation (by year three the "source of truth" lives in twenty-seven files, no two of which agree). None of them is a discipline problem. All three are architecture problems.
Key concept · The Definition Drift

The Definition Drift is the silent, compounding slippage in what each field of impact data actually means across teams, sites, and time. The fix is not better spreadsheet hygiene. It is a locked dictionary enforced at the point of collection — and a workflow that reads every record on arrival, so the fork is caught in week one, not three years later.

What impact data software actually does

Most impact data software is a place to put the data.

Teams shopping for an impact data platform expect the product to solve the drift. Most of the category does not. Underneath the dashboard, a typical tool is a database with a form on the front: it accepts what arrives and holds it. The drift happens inside the store — and a store cannot see it. The difference that matters is not storage. It is whether the software reads.

Impact data software built to store

A database with an intake form. Records land, get filed, and wait. The work of reading them is somebody else's job, on a later calendar.

  • Each survey is filed as a standalone event, unlinked to the last one
  • Documents and transcripts are stored, then never opened
  • Field definitions live in someone's head, not in the system
  • Data is exported to a separate tool before anyone analyzes it
  • The drift is discovered at reporting time — after it is permanent

Impact data software built to read

Sopact reads every record the moment it lands and checks it against the change the program promised. Storage is the floor, not the product.

  • Every response links to one participant under a Persistent Contact ID
  • Documents and open-ended answers are read on arrival, not shelved
  • Each field is checked against one locked definition at submission
  • Quantitative and qualitative are read together, on the same record
  • A definition fork is flagged in week one, while it is still reversible
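One way a reading step could flag a fork in week one is sketched here under stated assumptions. Each record is assumed to carry a site, a "completion" flag, and the assessment result that the locked definition depends on. If a site marks participants complete who did not pass, that site is probably applying a different definition. The field names and the 10% threshold are hypothetical, and this is not Sopact's actual check.

```python
from collections import defaultdict


def flag_definition_forks(records, max_mismatch_rate=0.1):
    """Return {site: mismatch rate} for sites whose 'completion' flags
    contradict the assumed locked definition (completed == passed assessment)."""
    completed = defaultdict(int)
    mismatched = defaultdict(int)
    for r in records:
        if r["completed"]:
            completed[r["site"]] += 1
            # A "completed" record without a passed assessment contradicts the definition.
            if not r["passed_final_assessment"]:
                mismatched[r["site"]] += 1
    flagged = {}
    for site, n in completed.items():
        rate = mismatched[site] / n
        if rate > max_mismatch_rate:
            flagged[site] = rate
    return flagged


week_one = [
    {"site": "North", "completed": True, "passed_final_assessment": True},
    {"site": "North", "completed": True, "passed_final_assessment": True},
    {"site": "East", "completed": True, "passed_final_assessment": False},
    {"site": "East", "completed": True, "passed_final_assessment": True},
    {"site": "East", "completed": False, "passed_final_assessment": False},
]

print(flag_definition_forks(week_one))
```

The check is only as good as the companion field it compares against. A fork can go unnoticed if no other field on the record reflects the locked meaning.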
Storing impact data has been a solved problem for twenty years. Reading it — consistently, on arrival, against a definition that holds — is the part the category skipped. That is the job this page is about.
The impact data lifecycle

Three moments — and one thread that has to survive all three.

Impact data passes through collection, definition, and analysis. Drift can enter at any of the three. What keeps the archive whole is a single thread running through all of them: a persistent identifier on every record, and a locked dictionary behind every field.

Moment 01
Collection

Surveys, intake forms, interviews, attendance, document uploads. Each one enters linked to the same participant — an ID assigned at first contact, not reconciled by email afterward.

Risk: three surveys, three spreadsheets, no link
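Here is a minimal sketch of assigning the identifier at first contact instead of reconciling by email afterward. The `ContactRegistry` class and the `PC-` prefix are hypothetical. The point is that later instruments attach by ID, so intake, mid-program, and post surveys all land on one participant.

```python
import itertools
from collections import defaultdict


class ContactRegistry:
    """Hypothetical registry that mints one persistent ID per participant at first contact."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._known = set()
        self.responses = defaultdict(list)  # contact ID -> [(instrument, answers), ...]

    def first_contact(self) -> str:
        # Minted once at intake; never derived from a name or an email address.
        contact_id = f"PC-{next(self._counter):05d}"
        self._known.add(contact_id)
        return contact_id

    def record(self, contact_id: str, instrument: str, answers: dict) -> None:
        # Later instruments attach by ID; an unknown ID is rejected, not guessed at.
        if contact_id not in self._known:
            raise KeyError(f"unknown contact ID: {contact_id}")
        self.responses[contact_id].append((instrument, answers))


registry = ContactRegistry()
pid = registry.first_contact()
registry.record(pid, "intake", {"site": "North"})
registry.record(pid, "mid_survey", {"confidence": 3})
registry.record(pid, "post_survey", {"confidence": 5})

print(pid, [instrument for instrument, _ in registry.responses[pid]])
```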
Moment 02
Definition

Every field carries one locked meaning, one data type, one validation rule, and an IRIS+ code where the catalog applies — written before a single response arrives.

Risk: "completion" means five things
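A dictionary entry can be sketched as an immutable record. This example uses Python's frozen dataclasses as a stand-in for "locked": the meaning, type, and validation rule are set once and cannot be edited in place. The field names and the empty IRIS+ slot are illustrative, and no real IRIS+ code is implied.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    machine_name: str
    display_name: str
    data_type: str          # e.g. "boolean", "scale_1_5", "date", "enum"
    meaning: str            # the one locked meaning, in plain language
    validation_rule: str    # human-readable rule enforced at submission
    iris_code: Optional[str] = None  # fill in only after confirming in the IRIS+ catalog


COMPLETION = FieldDefinition(
    machine_name="completion",
    display_name="Program completion",
    data_type="boolean",
    meaning="Participant passed the final assessment.",
    validation_rule="true or false; required at exit",
)

try:
    # A silent fork attempt: redefining the field in place.
    COMPLETION.meaning = "Attended at least 80% of sessions."
except dataclasses.FrozenInstanceError:
    print("locked: change the definition through a versioned update instead")
```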
Moment 03
Analysis

Outcome shifts, cohort comparisons, and themes from open-ended answers — read from the same source, updating as new records land instead of waiting for a cleanup cycle.

Risk: analysis starts after the decision
The thread

Break the persistent identifier and the longitudinal story is gone — the same person's baseline and follow-up never meet. Break the locked dictionary and the comparison is invalid — two cohorts measured against two definitions. Impact data software earns its name only when both hold from the first record to the last.
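The thread can be sketched as a join on the persistent identifier. The sketch assumes hypothetical baseline and six-month follow-up tables keyed by contact ID. Participants without a follow-up stay visible as missing instead of silently dropping out of the denominator.

```python
baseline = {
    "PC-00001": {"employed_in_field": False},
    "PC-00002": {"employed_in_field": False},
    "PC-00003": {"employed_in_field": True},
    "PC-00004": {"employed_in_field": False},
}
follow_up = {
    "PC-00001": {"employed_in_field": True},
    "PC-00002": {"employed_in_field": False},
    "PC-00003": {"employed_in_field": True},
    # PC-00004 has not responded yet.
}


def link(baseline, follow_up):
    """Pair each baseline record with its follow-up by contact ID (None if missing)."""
    return {pid: (base, follow_up.get(pid)) for pid, base in baseline.items()}


linked = link(baseline, follow_up)
# Only a linked baseline and follow-up can show a change for the same person.
gained = [pid for pid, (b, f) in linked.items()
          if f is not None and not b["employed_in_field"] and f["employed_in_field"]]
missing = [pid for pid, (_, f) in linked.items() if f is None]
print("moved into a training-matched role:", gained)
print("no follow-up yet:", missing)
```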

The lock against drift

Six rules for an impact data dictionary that holds.

An impact data dictionary defines every field your organization collects — its name, its type, its exact meaning, its validation rule, its standard alignment. It is the one document that keeps "completion" meaning the same thing in every program, every year. Six rules separate a dictionary that holds from one that drifts anyway.

01 · Define first

Lock every field meaning before collection starts

Write the exact meaning of "enrollment," "completion," and every outcome field into the dictionary — then enforce it at form submission, not at analysis.

Why it matters

Waiting until the first funder report means the fork has already happened.

02 · Persistent ID

Assign one identifier at first contact

Every respondent gets a Persistent Contact ID the moment they enter the program. Every later survey or follow-up lands in the same row — no email-matching, no manual merge.

Why it matters

Retroactive ID assignment loses anyone who changed an address or a name.

03 · Disaggregate early

Structure demographics at intake

Equity analysis is only possible if disaggregation variables are captured at the first touchpoint. Retrofitting them from exports drops a third to half of respondents.

Why it matters

Equity-focused funders treat missing disaggregation as a disqualifying gap.
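The next sketch shows why disaggregation belongs at intake. It assumes a hypothetical `age_band` field captured on the intake record. Outcome rates split by group come straight from the record, and participants whose intake left the field empty are counted explicitly rather than dropped.

```python
from collections import Counter


def outcome_rate_by_group(records, group_field, outcome_field):
    """Share of participants with a positive outcome, per intake group.

    Records missing the group value are reported under "not captured"
    so the gap stays visible instead of shrinking the denominator.
    """
    totals, positives = Counter(), Counter()
    for r in records:
        group = r.get(group_field) or "not captured"
        totals[group] += 1
        positives[group] += bool(r[outcome_field])
    return {g: positives[g] / totals[g] for g in totals}


records = [
    {"age_band": "18-24", "completed": True},
    {"age_band": "18-24", "completed": False},
    {"age_band": "25-34", "completed": True},
    {"age_band": None, "completed": True},
]
print(outcome_rate_by_group(records, "age_band", "completed"))
```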

04 · Mixed method

Pair every number with a qualitative "why"

A rating is the "what." The open-ended answer beside it is the "why." Without both, the dashboard reports that outcomes moved but cannot explain for whom, or why.

Why it matters

The "why" is the part a funder actually acts on.
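Pairing the "what" with the "why" can be sketched as keeping the rating and the open-ended answer on the same record. The reasons can then be pulled for exactly the participants whose rating did not move. Field names and answers here are invented for illustration.

```python
responses = [
    {"contact_id": "PC-00001", "confidence_pre": 2, "confidence_post": 4,
     "why": "The mock interviews made the job search feel concrete."},
    {"contact_id": "PC-00002", "confidence_pre": 3, "confidence_post": 3,
     "why": "I missed the evening sessions because of childcare."},
    {"contact_id": "PC-00003", "confidence_pre": 4, "confidence_post": 3,
     "why": "The placement partner in my area stopped hiring."},
]


def reasons_where_outcome_stalled(rows):
    """Return (contact_id, why) for respondents whose rating did not rise."""
    return [(r["contact_id"], r["why"]) for r in rows
            if r["confidence_post"] <= r["confidence_pre"]]


for contact_id, why in reasons_where_outcome_stalled(responses):
    print(contact_id, "-", why)
```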

05 · IRIS+ aligned

Map to IRIS+ where the catalog applies

The GIIN's IRIS+ catalog pre-defines metrics across impact themes. Matching a field to an IRIS+ code satisfies most impact investor reporting with no extra work.

Why it matters

Custom metrics are fine — undocumented custom metrics are not.
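The rule that custom metrics are fine but undocumented ones are not can be checked mechanically. This sketch assumes each dictionary entry carries either an IRIS+ code or a written custom definition. The code value is a placeholder, not a real IRIS+ identifier, and any actual code should be confirmed against the current catalog.

```python
def undocumented_fields(dictionary):
    """Return machine names that have neither an IRIS+ code nor a custom definition."""
    return [
        name for name, entry in dictionary.items()
        if not entry.get("iris_code")
        and not (entry.get("custom_definition") or "").strip()
    ]


dictionary = {
    # "IRIS-PLACEHOLDER" stands in for a code looked up in the IRIS+ catalog.
    "jobs_placed": {"iris_code": "IRIS-PLACEHOLDER"},
    "mentor_hours": {"custom_definition": "Hours of one-to-one mentoring logged by staff."},
    "readiness_score": {},  # custom, but nobody wrote down what it means
}
print(undocumented_fields(dictionary))
```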

06 · Version everything

Date and reason every dictionary change

Programs evolve and definitions change. An undocumented change creates a permanent comparison gap. Record the date, the reason, and how old data maps to the new meaning.

Why it matters

Informal mid-year edits are the leading cause of unusable archives.
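A versioned change can be sketched as a changelog entry carrying the date, the reason, and a function that maps old records to the new meaning. History is then re-expressed under a documented rule rather than patched by hand. The structure and the example change are hypothetical. The mapping also assumes the old records still hold the inputs the new definition needs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class DefinitionChange:
    field: str
    version: int
    effective: date
    reason: str
    migrate: Callable[[dict], object]  # maps an old-version record to the new meaning


# Hypothetical change: "completion" v2 also requires 80% attendance.
completion_v2 = DefinitionChange(
    field="completion",
    version=2,
    effective=date(2026, 1, 1),
    reason="Funder requires attendance alongside the assessment.",
    migrate=lambda r: r["passed_final_assessment"] and r["attendance_share"] >= 0.8,
)


def apply_change(records, change):
    """Return copies of records re-expressed under the new definition, tagged with its version."""
    return [
        {**r, change.field: change.migrate(r), f"{change.field}_version": change.version}
        for r in records
    ]


history = [
    {"contact_id": "PC-00001", "passed_final_assessment": True, "attendance_share": 0.9},
    {"contact_id": "PC-00002", "passed_final_assessment": True, "attendance_share": 0.7},
]
for row in apply_change(history, completion_v2):
    print(f"{row['contact_id']}: completion={row['completion']} (v{row['completion_version']})")
```

The original records are left untouched, so the pre-change values remain available for audit.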

From rules to a working dictionary

The six rules describe the dictionary. The next step is writing one. The generator below builds a starting impact data dictionary from IRIS+-aligned themes in a few clicks — then you adapt it, lock it, and collect against it.

Try it · Impact data dictionary generator

Build a starting impact data dictionary in a few clicks.

Select the impact themes your program works in. The generator assembles IRIS+-aligned field definitions you can filter, extend with your own fields, and export. It is a starting point — the real lock happens when the dictionary is enforced at collection.

Impact Data Dictionary Generator


IRIS+ aligned · 8 themes · Export-ready

Note on IRIS+ codes: the codes shown are starting suggestions to speed alignment. Confirm each one against the current GIIN IRIS+ catalog before reporting against it. The generated dictionary is a draft — adapt the definitions to your program, then lock them at the point of collection.

AI-ready impact data

"AI-ready" is three conditions — all met before a model runs.

Impact data analytics has gotten genuinely good: Claude, Power BI, and Google's analytics stack all now turn clean, contextual data into recommendations. The bottleneck has moved upstream. A model can only analyze impact data that arrives already structured, and most exports fail all three of the conditions below.

01
A stable identifier on every row

Responses from the same person, across any number of instruments, land in the same row. Name-and-email matching is not identifier management — it is a fragile guess that breaks the first time someone changes a job or an address.

Without it: the longitudinal story is unreachable
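Matching on email and matching on a stable identifier diverge as soon as one person changes address. In this sketch with hypothetical records, grouping by email counts one participant as two, while grouping by contact ID keeps a single row.

```python
from collections import defaultdict

responses = [
    {"contact_id": "PC-00007", "email": "j.rivera@example.org", "instrument": "baseline"},
    {"contact_id": "PC-00007", "email": "jrivera@newjob.example", "instrument": "follow_up"},
]


def group_by(rows, key):
    """Collect instruments per value of `key` (the matching rule under test)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["instrument"])
    return dict(groups)


print("by email:", group_by(responses, "email"))
print("by contact ID:", group_by(responses, "contact_id"))
```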
02
A locked type and definition on every column

A scale field is always the same scale. A date field never holds free text. An enum field rejects values that were not pre-defined. The dictionary is enforced at submission, so the column means one thing in every row.

Without it: the comparison is statistically invalid
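Enforcement at submission can be sketched as a validator that runs before a response is stored. The rule names (`scale`, `date`, `enum`) and the example fields are assumptions for illustration. A rejected response returns its errors instead of entering the archive.

```python
from datetime import date

# Hypothetical locked rules, keyed by field machine name.
RULES = {
    "confidence": {"type": "scale", "min": 1, "max": 5},
    "exit_date": {"type": "date"},
    "employment_status": {"type": "enum", "values": {"employed", "seeking", "in_training"}},
}


def validate(submission):
    """Return a list of errors; an empty list means the response may be stored."""
    errors = []
    for field, rule in RULES.items():
        value = submission.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif rule["type"] == "scale":
            # bool is a subclass of int in Python, so rule it out explicitly.
            in_range = (
                isinstance(value, int)
                and not isinstance(value, bool)
                and rule["min"] <= value <= rule["max"]
            )
            if not in_range:
                errors.append(f"{field}: expected integer {rule['min']}-{rule['max']}")
        elif rule["type"] == "date":
            try:
                date.fromisoformat(str(value))
            except ValueError:
                errors.append(f"{field}: expected YYYY-MM-DD")
        elif rule["type"] == "enum" and value not in rule["values"]:
            errors.append(f"{field}: not one of {sorted(rule['values'])}")
    return errors


print(validate({"confidence": 4, "exit_date": "2026-03-14", "employment_status": "employed"}))
print(validate({"confidence": "four", "exit_date": "mid-March", "employment_status": "working"}))
```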
03
Structured metadata on every qualitative answer

An open-ended "what did you learn" answer is only analyzable if it carries the respondent ID, the cohort, the demographics, and the program stage. The "why" has to sit on the same record as the "what."

Without it: the qualitative half goes unread
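Condition 03 can be tested per answer. This sketch assumes the four metadata items named above (respondent ID, cohort, demographics, program stage) are stored under hypothetical key names. Any open-ended answer missing one of them is reported instead of being handed to a model.

```python
REQUIRED_METADATA = ("contact_id", "cohort", "demographics", "program_stage")


def missing_metadata(answer):
    """Return the required metadata keys an open-ended answer lacks."""
    return [key for key in REQUIRED_METADATA if not answer.get(key)]


answers = [
    {"text": "I learned to negotiate my first offer.", "contact_id": "PC-00001",
     "cohort": "2026-spring", "demographics": {"age_band": "25-34"}, "program_stage": "post"},
    {"text": "More hands-on labs, please.", "cohort": "2026-spring"},
]

ready = [a for a in answers if not missing_metadata(a)]
print(len(ready), "of", len(answers), "answers are analyzable")
print("second answer lacks:", missing_metadata(answers[1]))
```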
Why this is Sopact's job, not the model's

A general analytics tool reads whatever you hand it — including the drift. Sopact produces AI-ready impact data as the default, because the three conditions are enforced when the record arrives, not patched in a cleanup cycle. The export is AI-ready because the ingestion was.

Capability by capability

A traditional impact data stack vs. Sopact.

Most teams run impact data through a general survey tool plus a BI dashboard plus a separate qualitative-coding step. Each tool is competent at its slice. The drift lives in the seams between them. Here is the same workflow, compared at the field level.

Collection

Participant identity
  • Traditional stack: Email or name matching; breaks on typos, address changes, and drop-offs.
  • Sopact: A Persistent Contact ID at first contact; every later response lands in the same row.
Longitudinal linking
  • Traditional stack: Baseline, mid, and endline merged from separate files by hand each cycle.
  • Sopact: Pre, mid, and post resolve on their own through the ID chain.
Disaggregation
  • Traditional stack: Retrofitted after collection, which drops a third to half of respondents.
  • Sopact: Structured at the first form submission; equity questions answerable from day one.

Dictionary

Field definitions
  • Traditional stack: Tribal and undocumented; the meaning of "completion" drifts per team.
  • Sopact: A locked dictionary; every change dated, reasoned, and mapped to prior definitions.
Validation rules
  • Traditional stack: Enforced by human review; free text lands in numeric fields.
  • Sopact: Enforced at submission; an invalid response never enters the archive.
IRIS+ alignment
  • Traditional stack: Cross-referenced by hand per field, or skipped entirely.
  • Sopact: Mapped as the dictionary is built, across IRIS+-aligned impact themes.

Analysis

Qualitative answers
  • Traditional stack: Coded by hand over weeks; often abandoned when the timeline compresses.
  • Sopact: Themed on arrival, linked to each respondent's outcome record.
Cohort comparisons
  • Traditional stack: Require a pre-cleanup and a re-merge; validity depends on whether definitions drifted.
  • Sopact: Run against the locked dictionary, so valid by construction.
Reporting cadence
  • Traditional stack: Calendar-bound; the cleanup cycle sets the rhythm, not the decision.
  • Sopact: Continuous; the reading happens as records land, not a quarter later.
See it on your own data
Bring one field that drifted.

Sixty minutes, your own records. We'll read a real reporting cycle live and show where the drift entered.

Who owns the impact data

Three teams own impact data. Each one answers for it.

The drift mechanism is the same everywhere. What changes is the unit of work and the failure the team cannot afford. Each card closes with what the team gets back from impact data software built to read.

Primary · Direct programs

Nonprofit & program teams

Workforce, education, and direct-service teams running pre, mid, and post touchpoints across sites — the people who collect the data and own what it says.

Time
Qualitative coding that used to take weeks, done the day the data lands.
Reach
Outcome evidence on every participant — including the ones who went quiet.
Risk
A definition fork caught in week one, before the funder's comparison question makes it permanent.
Foundations

Foundations & grantmakers

A portfolio of grantees, each reporting impact data on its own template. The drift is now between organizations, not just between sites.

Time
Grantee reports read on arrival — program-officer hours back from re-keying spreadsheets.
Money
An audit finding caught before it reaches the board, not after.
Risk
A grantee drifting from its promised outcomes flagged one quarter in, not one year late.
Impact funds

Impact funds & investors

Portfolio impact data across investees, rolled up for an LP review. The IRIS+ alignment has to hold from the investee record up to the fund report.

Time
A portfolio roll-up from one source, instead of a quarterly stitch of investee exports.
Money
An impact claim that would not survive due diligence, caught before it ships.
Risk
Investee outcome data that holds up in an LP review — every figure traceable to a record.
The diagnostic

If your impact data cannot answer a funder's comparison question in under a week, the Definition Drift has already set in. The fix is not a bigger store — it is software that reads each record on arrival and holds every field to one definition. The same architecture works for CSR teams, accelerators, and training providers — same locked dictionary, different fields.

FAQ

Impact data, answered.

What is impact data?

Impact data is the structured evidence a program, fund, or organization uses to show that something changed for the people it serves. It includes outputs delivered, outcomes experienced, who was reached, and the qualitative reasons behind the change. The defining feature is the connection: responses must be tied to the same person across touchpoints, or the data cannot answer whether anything actually changed.

What is impact data software?

Impact data software is a system built to collect, define, and analyze evidence of program or portfolio outcomes in one place. Most products in the category are storage-first — a database with an intake form — and the field definitions drift inside the store. Software built to read, like Sopact, checks every record against a locked dictionary on arrival, so the drift is caught while it is still reversible.

What is the difference between impact data and regular data?

Regular business data records that something happened — a transaction, a login, a form submission. Impact data records whether something changed because of an intervention — an outcome shift, a behavior change, a reported experience. Impact data also requires a persistent identifier, so responses from the same person connect across time. Most general tools do not support that connection by default.

What is the Definition Drift?

The Definition Drift is the silent, compounding slippage in what each field of impact data means across teams, sites, and time. Two programs can both record "completion" for three years and end up measuring two different things. The drift is invisible until a comparison request arrives, and irreversible once the data is already collected. It is the single most common reason impact data archives become unusable.

What is an impact data dictionary?

An impact data dictionary is a central reference that defines every field your organization collects: its machine name, its display name, its data type, its exact meaning, its validation rule, and its standards alignment. It locks each field so "enrollment" means one thing everywhere, in every program, every year. The dictionary is the durable fix for the Definition Drift — provided it is enforced at collection, not at analysis.

How do you collect impact data so it stays usable?

Assign a Persistent Contact ID to every respondent at first contact, lock every field definition in a dictionary before collection begins, structure disaggregation variables at intake rather than retroactively, and ensure qualitative answers carry the same respondent ID as the numeric ones. Sopact reads and enforces all four conditions as records arrive, so the archive stays comparable rather than drifting toward a cleanup cycle.

What is impact data management software?

Impact data management software is a system that governs impact data across its lifecycle — collection, field definitions, validation, versioning, and analysis — rather than leaving each stage to a separate tool. The management part is the dictionary and the identifier chain: the rules that keep a field comparable from the first record to the last. Without that governance layer, "management" is just storage.

What is impact data analytics?

Impact data analytics is the structured analysis of program or portfolio evidence to answer three questions: did outcomes change, who experienced the change, and why. It spans quantitative work — rates, shifts, comparisons — and qualitative work — themes, reasons, context. General BI tools handle the quantitative side once data is clean. They do not handle the qualitative side at all, which is where most of the "why" lives.

What does "AI-ready impact data" mean?

AI-ready impact data meets three conditions: every row has a stable identifier connecting responses from the same person, every column has a locked type and definition, and every qualitative answer carries the metadata needed to analyze it. Most exports from general tools fail all three tests. Sopact produces AI-ready output as the default state, because the conditions are enforced when the record arrives rather than patched afterward.

How do you align impact data with IRIS+?

The Global Impact Investing Network maintains IRIS+, a catalog of standardized impact metrics organized by theme. For each field in your dictionary, find the closest pre-defined IRIS+ metric and record its code. Alignment lets you benchmark against sector norms and satisfies most impact investor reporting. Always confirm a code against the current IRIS+ catalog before reporting against it, since the catalog is revised over time.

What fields should an impact data dictionary include?

At minimum: a record ID, a participant ID, a collection date, consent status, demographics captured at intake, baseline outcome measures, endline outcome measures, activity or completion fields, and open-ended qualitative fields. Align to IRIS+ codes where the catalog applies. Most nonprofit programs need roughly forty to eighty fields; most impact funds need more. The generator on this page builds a themed starting set.
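The minimum list in this answer can double as a completeness check on a draft dictionary. The machine names here are hypothetical renderings of that list, not a standard, so rename them to match your own dictionary.

```python
MINIMUM_FIELDS = {
    "record_id", "participant_id", "collection_date", "consent_status",
    "demographics_intake", "baseline_outcome", "endline_outcome",
    "completion", "open_ended_response",
}


def missing_minimum_fields(draft_dictionary):
    """Return, sorted, the minimum fields a draft dictionary does not yet define."""
    return sorted(MINIMUM_FIELDS - set(draft_dictionary))


draft = {"record_id": {}, "participant_id": {}, "collection_date": {},
         "baseline_outcome": {}, "completion": {}}
print(missing_minimum_fields(draft))
```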

Is impact data software only for nonprofits?

No. Nonprofits, foundations, impact funds, CSR programs, training providers, and social enterprises all work with impact data. The field names differ by audience — a nonprofit tracks participant completion, a foundation tracks grantee milestones, a fund tracks investee outcomes — but the principle is identical. Lock the definition before collection, and carry one identifier across the record. The architecture does not change.

Is this the same as data-pipeline or database impact analysis?

No. In software engineering, "impact analysis" describes how a change to a database, pipeline, or codebase affects downstream systems. This page is about impact data in the social-impact sense: evidence that a program, grant, or investment changed something for the people it serves. If you came here looking for schema lineage or pipeline dependency tracking, this is a different field with a different toolset.

Bring your last reporting cycle

We'll read it live and show you the drift.

Sixty minutes with someone who builds impact data systems for a living. Bring a real reporting cycle — the intake forms, the surveys, the open-ended answers. We read it on screen, check the fields against a dictionary, and name where the Definition Drift entered. No slideware, no demo accounts — your data, read live.


Format: Live walkthrough · 60 min
With: Unmesh Sheth · Founder & CEO
Bring: Your last reporting cycle (forms, surveys, and open-ended answers)
Leave with: A map of where the drift entered, and a locked dictionary to stop it