
AI for social good: Gen AI vs AI-bolted vs AI-native

AI for social good in plain terms: the Coherence Gap, three AI tiers (Gen AI, AI-bolted, AI-native), and five things to get right, in order, on the way from spreadsheets to intelligence.

Updated June 17, 2026

AI for social good only counts when the claim holds up.

A program director opens ChatGPT on a Tuesday. A funder wants outcome data by Thursday. Ninety seconds later, a report appears. It reads well. Two weeks on, the funder's evaluator asks one follow-up the report can't answer — because the data was never set up to answer it. This page is about closing that gap, and about the three kinds of AI most teams are quietly mixing without knowing which one their claim rests on.

What comes in
Survey answers · Interview notes · Annual reports · Financial documents

Held together
One record per person, one shared set of definitions
Theory of change · IRIS+ · Five dimensions

What you get
Equity report · Board docket · Funder update · Early warning


In plain terms

Two definitions worth getting straight.

Before the how-to, two quick definitions — the kind you can quote in a board paper or a grant report.

What is AI for social good?

AI for social good means using artificial intelligence on social and environmental problems — better health, less inequality, stronger evidence for programs that serve people. It names the goal. Whether a claim made under it holds up depends on how the data was set up, not on which AI tool was opened on reporting day.

What is the Coherence Gap?

The Coherence Gap is the distance between when information is collected and when it is understood. Everyday AI tools close it for a single report. Bolt-on platforms narrow it. AI-native setups remove it, by treating collection and understanding as one thing, from the first contact with a person.

Tier 1 Gen AI tools
Tier 2 AI-bolted platforms
Tier 3 AI-native systems
The idea to own

The Coherence Gap decides what AI can do for you.

Picture a gap between two points: the moment you collect information, and the moment you make sense of it. The wider that gap, the less any AI can honestly claim. Where your setup sits on this line matters more than which tool you use.

Tier 1 · Gen AI
Information collected · Sense made of it
The gap is wide. You paste a spreadsheet into a chat window and get a report-shaped answer. That closes the gap for one document, then the gap opens again.
Tier 2 · AI-bolted
Information collected · Sense made of it
The gap narrows. A platform adds AI on top of forms you already run. Helpful, but the understanding still happens after collection and is limited by how you collected.
Tier 3 · AI-native
Information collected · Sense made of it
The gap is gone. Collection and understanding are designed as one, from the first form. There is no distance to reach across, because the two were never separate.

Read it top to bottom. Where your AI sits decides what it can support. Teams that say "AI doesn't work for our impact reporting" are almost always on the wrong tier for what they're trying to claim. The problem is rarely the model — it's the gap you're asking it to jump.

The three tiers

Three kinds of AI for social good. Which are you actually on?

Most teams mix all three without noticing — drafting with one tool, collecting on another, paying for a third they barely use. The tier that decides how reliable your outcomes are is the tier where your data lives, not the tool you open on reporting day.

Tier 1

Gen AI tools

Understanding happens entirely after collection, on whatever you happen to have. You paste a spreadsheet into a prompt and get text that looks like an impact report.

Good for. Drafts, translations, brainstorming, meeting summaries — anything that doesn't need to be reproduced or held up to a funder.
The catch. Ask twice, get two answers. No trail back to a person.
ChatGPT · Claude · Gemini
Tier 2

AI-bolted platforms

AI added on top of a workflow you already run. It spots patterns in what's been submitted, sums up open answers, flags duplicates. The way you collect is unchanged.

Good for. A single yearly cycle, steady questions, under a couple hundred people, no multi-year tracking.
The catch. The 18-month ceiling — multi-year, multi-funder, equity reporting hits a wall.
Submittable · SurveyMonkey Apply · OpenWater
Tier 3

AI-native systems

Understanding is built into how you collect, from the first contact. Every question is designed as something the report will later need. There is no gap, because collection and analysis were never separate.

Good for. Many groups, many funders, equity breakdowns, year-over-year outcomes — claims that have to survive a careful review.
The catch. It asks you to design collection on purpose — more thought up front than pasting a spreadsheet.
Sopact Sense
The limits of Gen AI

Four structural reasons a ChatGPT impact report cannot defend itself.

Using Claude, ChatGPT, or Gemini to draft impact reports from spreadsheets does not produce impact reports. It produces structured text that resembles them. The distinction matters for four specific structural reasons — and also clarifies the substantial subset of tasks where Gen AI tools are genuinely the right choice.

01

Non-reproducible results

Feed the same dataset to a general-purpose LLM on two different days and you get different thematic interpretations, different narrative framings, sometimes different numbers. Funders and evaluators auditing multi-year programs need outputs they can compare across cycles. Chat tools that sample their output are non-deterministic by default, so they cannot promise this.

02

No standardized structure

Every LLM session generates its own section architecture. A Year 1 report built in January and a Year 3 report built in March will not share the same section logic, metric display conventions, or comparative framework. Multi-year program evaluation becomes structurally impossible to conduct across reports built this way.

03

Disaggregation inconsistencies

Equity reporting requires breaking outcomes down by gender, location, cohort, and program type. General AI tools handle disaggregation inconsistently across sessions — segment labels shift, definitions vary, portfolio-level comparisons break. For organizations with equity commitments written into funder agreements, this creates compliance risk, not just analytical inconvenience.
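
One way to stop segment labels from shifting between sessions is to define them once and apply the same mapping every cycle. The sketch below is a minimal illustration, not a description of any particular product. The field names (`gender`, `score`), the label map, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shared definitions: every raw label maps to one canonical segment.
SEGMENT_LABELS = {
    "f": "Female", "female": "Female", "woman": "Female",
    "m": "Male", "male": "Male", "man": "Male",
}

def canonical_segment(raw_label: str) -> str:
    """Map a free-typed label to its canonical segment, or 'Unmapped'."""
    return SEGMENT_LABELS.get(raw_label.strip().lower(), "Unmapped")

def outcomes_by_segment(rows: list[dict]) -> dict[str, float]:
    """Average the outcome score per canonical segment, sorted by segment name."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[canonical_segment(row["gender"])].append(row["score"])
    return {segment: mean(scores) for segment, scores in sorted(grouped.items())}

# Made-up sample rows with inconsistent labels, as they often arrive.
rows = [
    {"gender": "F", "score": 4},
    {"gender": "female ", "score": 2},
    {"gender": "Male", "score": 5},
    {"gender": "prefer not to say", "score": 3},
]
result = outcomes_by_segment(rows)
print(result)
```

Labels the map does not recognise land in a visible "Unmapped" group instead of being guessed at, so a gap in the definitions shows up in the report rather than hiding inside a segment.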

04

Weak survey design corrupts everything downstream

Organizations that use AI to help design surveys often discover, two cycles later, that the data cannot be analyzed the way they assumed. The structural problems — no pre-post pairing, no logic model alignment, no field validation — were baked in at collection. This is the failure mode that takes longest to surface and costs the most to fix.
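
To make "pre-post pairing" and "field validation" concrete, here is a small sketch under stated assumptions: each participant has a stable ID, scores sit on a 1 to 5 scale, and the sample data is invented. The function name `pair_pre_post` is hypothetical.

```python
def pair_pre_post(pre: dict[str, int], post: dict[str, int],
                  low: int = 1, high: int = 5):
    """Pair pre and post scores by participant ID.

    Returns (changes, unpaired_ids, invalid_ids). The 1-5 range is an assumption.
    """
    def valid(score) -> bool:
        return isinstance(score, int) and low <= score <= high

    changes, unpaired, invalid = {}, [], []
    for pid in sorted(set(pre) | set(post)):
        if pid not in pre or pid not in post:
            unpaired.append(pid)   # one half of the pair is missing
        elif not (valid(pre[pid]) and valid(post[pid])):
            invalid.append(pid)    # a score falls outside the allowed range
        else:
            changes[pid] = post[pid] - pre[pid]
    return changes, unpaired, invalid

# Invented example: p2 has an out-of-range post score, p3 and p4 lack a pair.
pre = {"p1": 2, "p2": 3, "p3": 4}
post = {"p1": 4, "p2": 9, "p4": 5}
changes, unpaired, invalid = pair_pre_post(pre, post)
print(changes, unpaired, invalid)
```

If the forms never captured a shared ID, the pairing step has nothing to join on, which is why this problem is set at collection and cannot be repaired at reporting time.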

When Gen AI is the right tool

Gen AI is appropriate — and genuinely useful — for tasks that do not require reproducibility or formal attribution. Drafting grant language from bullet points. Translating program descriptions for non-specialist audiences. Brainstorming theory of change language. Summarising meeting notes. The test: would a funder or evaluator see this output and need to rely on it? If yes, Gen AI should not produce it alone. If no, Gen AI is probably the right tool for the job.

How to get to Tier 3

Five things to get right, in order.

Closing the Coherence Gap isn't a single purchase. It's five simple pieces, set up in this order. Get them right and the tools finally help instead of getting in the way.

1 · Data

Count everything people share.

Not just the survey — interviews, reports, financial documents, case notes, applications. That's where most of the story lives, and you can finally read all of it.

2 · Workflow

Map the steps people move through.

Apply, start, mid-point, finish, follow-up. The workflow is the backbone: everything else hangs on it, so map it before you name measures, set actions, or build reports.

3 · Context

Name what you measure, once.

Your theory of change, IRIS+, the five dimensions, or your own framework — set the definitions once and use the same language everywhere. One shared meaning across all your data.

4 · Actions

Decide what happens when answers arrive.

Who to follow up with, what to flag, what to do this month — not next year. Data should lead to a decision, not a folder.
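
A rule like this can be written down before the first response arrives. The sketch below assumes a 1 to 5 score and a free-text comment. The threshold, the keyword, and the action wording are hypothetical placeholders for whatever your program decides.

```python
def follow_up_actions(response: dict) -> list[str]:
    """Return the actions a new response should trigger, in a fixed order."""
    actions = []
    if response["score"] <= 2:  # hypothetical threshold on a 1-5 scale
        actions.append("call participant this week")
    if "housing" in response.get("comment", "").lower():  # hypothetical keyword rule
        actions.append("refer to housing partner")
    if not actions:
        actions.append("no action needed")
    return actions

# Invented mid-point response.
incoming = {"person_id": "person-7", "stage": "mid-point", "score": 2,
            "comment": "Lost my housing last month"}
actions = follow_up_actions(incoming)
print(actions)
```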

5 · Reporting

Turn it into a report that raises money.

Clear outcomes, broken down by group, traced to real voices, ready the moment a funder asks. The work, finally, tells its own story.

The shift in practice

Same team, same effort. Very different claims.

The move from a wide gap to no gap doesn't take more staff. It takes a different way of letting information in. Here is what changes.

Wide gap · cleaning up after

Collect first, make sense of it later

The same person returns each cycle as a new stranger. Matching them up is done by hand and never finishes.
A funder asks for results by community, and the question was never on the form. Hundreds get re-contacted.
What people wrote sits in a column nobody opens until a consultant gets to it months later.
The headline number lands on a slide with no way back to the people behind it.
No gap · set up to listen

Understand it as it comes in

+ Each person is recognised once and stays the same across every form they ever fill.
+ The questions a funder will ask are on the form from day one, so a breakdown by group is just a normal view.
+ Open answers are read as they land, with the theme tied to the same record as the score.
+ Every number can be opened back up to the voices and groups that produced it.
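
The "no gap" column can be sketched as a data structure: one ID per person, every response stored against it, and every number returned together with the responses behind it. This is an illustrative sketch only. The `Registry` class is hypothetical, and matching people by normalised email is a simplifying assumption that real systems would need to go beyond.

```python
from dataclasses import dataclass, field

@dataclass
class Registry:
    """Hypothetical in-memory store: one record per person, keyed by email."""
    ids_by_email: dict = field(default_factory=dict)
    responses: list = field(default_factory=list)

    def person_id(self, email: str) -> str:
        """Return the same ID every time the same person shows up."""
        key = email.strip().lower()
        if key not in self.ids_by_email:
            self.ids_by_email[key] = f"person-{len(self.ids_by_email) + 1}"
        return self.ids_by_email[key]

    def record(self, email: str, form: str, score: int, comment: str) -> None:
        """Store a response against the person's stable ID."""
        self.responses.append({
            "person_id": self.person_id(email), "form": form,
            "score": score, "comment": comment,
        })

    def average_score(self, form: str):
        """Return the average for a form plus the responses that produced it."""
        sources = [r for r in self.responses if r["form"] == form]
        if not sources:
            return None, []
        return sum(r["score"] for r in sources) / len(sources), sources

# Invented example data.
registry = Registry()
registry.record("Ana@example.org", "intake", 2, "Nervous about interviews")
registry.record("ana@example.org ", "exit", 4, "Got two callbacks")
registry.record("ben@example.org", "exit", 5, "Started a new job")
average, sources = registry.average_score("exit")
print(average, [s["comment"] for s in sources])
```

Because `average_score` hands back its sources, the headline number never travels without a way back to the people and comments behind it.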
Who it's for

Who AI for social good is built for.

The approach earns its keep where outcomes have to be shown to someone else: a board, a funder, an evaluator. Three groups feel it most, and there is one honest case where you can wait.

Foundations

Community, family, and corporate funders

Especially those adding impact funds or venture philanthropy. One record per grantee, every report a view of the same evidence, and equity breakdowns that are a byproduct of normal collection.

Impact funds

Funds and ESG portfolios

Dozens of investees, quarterly updates that never line up. A shared set of definitions turns scattered documents into a portfolio view you can take to an LP.

Nonprofits

Direct-service and program teams

Programs walking alongside people over time. The work shifts from logging activity to improving outcomes — and the annual report stops being a three-week scramble.

When you don't need this yet

Honesty matters here. If you run a single yearly program, under a couple hundred people, with steady questions and no multi-year or multi-funder reporting, a well-built form plus a spreadsheet is genuinely enough. Come back when you add a second cohort, a second funder, or an equity requirement — that's the point where the gap starts to cost you.
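
The rule of thumb above can be written as a simple check. This is a sketch, not a scoring tool: the `Program` fields are hypothetical, and reading "a couple hundred" as a cutoff of 200 is an assumption you should adjust.

```python
from dataclasses import dataclass

@dataclass
class Program:
    cohorts: int
    funders: int
    participants: int
    multi_year: bool
    equity_requirement: bool

def spreadsheet_is_enough(program: Program, max_participants: int = 200) -> bool:
    """True when a form plus a spreadsheet should cover the reporting need.

    max_participants=200 is an assumed reading of "a couple hundred".
    """
    return (
        program.cohorts == 1
        and program.funders <= 1
        and program.participants < max_participants
        and not program.multi_year
        and not program.equity_requirement
    )

# Invented examples: adding a second cohort flips the answer.
small = Program(cohorts=1, funders=1, participants=120,
                multi_year=False, equity_requirement=False)
growing = Program(cohorts=2, funders=1, participants=120,
                  multi_year=False, equity_requirement=False)
print(spreadsheet_is_enough(small), spreadsheet_is_enough(growing))
```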

A quick self-check

Signs your setup has closed the gap.

You don't need an audit to know which tier you're on. If most of these are true, your gap is small. If most aren't, that's where to start.

The same person is recognised across every form, automatically.
The fields a funder asks about were collected from the start.
Open answers are read as they arrive, not months later.
Every number in a report can be traced back to real responses.
You can answer a results-by-group question from 18 months ago without rebuilding spreadsheets.
Run the same analysis twice and you get the same answer.
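
The last check is easy to test for yourself: run the analysis twice and compare a fingerprint of the output. The sketch below swaps in a fixed keyword codebook as a stand-in for whatever analysis you use. It is not how any particular platform reads open answers, and the themes, keywords, and comments are invented.

```python
import hashlib
import json

# Hypothetical codebook: themes and the keywords that signal them.
THEME_KEYWORDS = {
    "confidence": ["confident", "confidence"],
    "employment": ["job", "hired", "interview"],
}

def tag_themes(comment: str) -> list[str]:
    """Return the themes whose keywords appear in the comment, sorted by name."""
    text = comment.lower()
    return sorted(theme for theme, words in THEME_KEYWORDS.items()
                  if any(word in text for word in words))

def analyze(comments: list[str]) -> dict[str, int]:
    """Count how many comments mention each theme."""
    counts = {theme: 0 for theme in sorted(THEME_KEYWORDS)}
    for comment in comments:
        for theme in tag_themes(comment):
            counts[theme] += 1
    return counts

def fingerprint(result: dict) -> str:
    """Hash the result so two runs can be compared at a glance."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

comments = [
    "I feel more confident now",
    "Got hired after my second interview",
    "The job fair helped my confidence",
]
first = analyze(comments)
second = analyze(comments)
print(first, fingerprint(first) == fingerprint(second))
```

If the two fingerprints ever differ on the same input, the analysis is not reproducible, and any year-over-year comparison built on it is open to challenge.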
Common questions

AI for social good, answered.

What is AI for social good?

It means using artificial intelligence on social and environmental problems — better health, less inequality, stronger evidence for programs that serve people. It names the goal. Whether a claim made under it holds up depends on how the data was set up, not on which AI tool was used.

How is AI for social good different from AI for social impact?

AI for social good is the wider lens — the intent to do good with these tools. AI for social impact is the working discipline inside it: measuring and improving what happens to specific people. Social good describes intent; social impact describes accountability.

What is the Coherence Gap?

The distance between when you collect information and when you understand it. Everyday AI tools close it for one report; bolt-on platforms narrow it; AI-native setups remove it by designing collection and understanding as one. Where your setup sits decides what your AI can honestly claim.

What are the three AI tiers?

Tier 1 is Gen AI tools (ChatGPT, Claude, Gemini) used after the fact. Tier 2 is AI bolted onto an existing workflow. Tier 3 is AI-native, where collection and analysis are built as one. The tier that governs your reliability is the one where your data lives, not the tool you open on reporting day.

Can ChatGPT write my impact report?

For a rough draft, yes. For a report a funder relies on, no. It gives a different answer each time, a different structure each time, and cannot trace a number back to a person. Use it to draft language, not to produce the evidence.

When is Gen AI the right tool?

For anything that doesn't need to be reproduced or relied on — drafting grant language, translating for a general audience, brainstorming theory of change wording, summarising notes. The test: would a funder or evaluator need to rely on this output? If yes, Gen AI shouldn't produce it alone.

What are the five elements of an AI-ready setup?

Data (everything people share, not just surveys), workflow (the steps people move through), context (one shared set of definitions like a theory of change or IRIS+), actions (what happens when answers arrive), and outcome-ready reporting (results traced to real voices). Get them right in that order.

Which tier do I need for my program?

A single annual cycle under a couple hundred people, with steady questions, is fine on Tier 1 or 2. Multi-year, multi-funder, equity-disaggregated outcome tracking needs Tier 3, where collection and analysis are one system. Match the tier to the claim you have to defend.

Do I have to replace all my current tools?

No. You rethink one workflow first — how information comes in, and how the same person is recognised across it. The right tools follow from that. Start with the five elements, in order, on a single program.

Is this only for large foundations?

No. A single program with one clear set of questions is enough to begin, and the approach is useful on day one. It scales up to networks, funds, and portfolios, but it doesn't require that scale to earn its keep.

Find your tier

See which tier your claim rests on.

Bring a recent impact report and the data behind it. We'll show you where your Coherence Gap sits, which of the three tiers you're actually on, and what a Tier 3 setup would change about the claims you can defend. No slides, no demo accounts.

Your own records, read live — not a generic demo.