Analyze · Text Cleaning

How do you clean open-ended survey responses?

Clean your open-text before you theme it — and keep an audit trail. Here is the prompt that trims, standardises, and buckets responses while preserving every original, then grades the result.

In short: Messy open-text — stray spacing, inconsistent casing, near-duplicates, and ambiguous entries like 'N/A' — distorts any theming you run on it. Clean it first: trim, de-duplicate, standardise spelling and casing, and bucket similar answers, while keeping every original so the change is auditable. Flag anything genuinely ambiguous rather than guessing. One prompt does this and grades the result green, amber, or red.

1 · Set up the assistant over your data

Point the Sopact Sense Assistant at your dataset so it works from clean records with persistent contact IDs, then have it load your Decision Brief before you give it a task. Keeping the original alongside the cleaned version is what makes the cleaning auditable: you can always show what was changed and why.

You are the Sopact Sense Assistant working over the DEMO-04 · Open-Text Barriers dataset (clean data + persistent contact IDs). Load my Decision Brief (decision, audience, outcomes, indicators, evidence standard) first, then wait for my task.

2 · Write the prompt

Clean the open-text barriers field: trim, de-duplicate, standardise spelling/casing, bucket; keep originals; flag ambiguous. Grade green/amber/red.

Five elements make this prompt work: it runs over your dataset; it cleans and buckets messy text into tidy, comparable values; it keeps originals so every edit is auditable; it flags ambiguous entries instead of forcing a guess; and it ends with a grade of green, amber, or red.
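The cleaning half of that prompt can be pictured as a small function. The Python below is a minimal sketch, not how Sense works internally. `clean_response` and the `SPELLING_FIXES` table are hypothetical names, and the sketch assumes each response arrives as a plain string. It returns the original next to the cleaned value rather than overwriting it.

```python
import re
import unicodedata

# Hypothetical spelling fixes; build your own list from the variants in your data.
SPELLING_FIXES = {
    "transportaion": "transportation",
    "child care": "childcare",
}


def clean_response(raw):
    """Return the untouched original alongside a trimmed, standardised copy."""
    text = unicodedata.normalize("NFKC", raw or "")  # fold look-alike Unicode forms
    text = re.sub(r"\s+", " ", text).strip()         # trim and collapse inner whitespace
    text = text.casefold()                           # consistent casing
    for wrong, right in SPELLING_FIXES.items():
        # Whole-word replacement so a fix never fires inside a longer word.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return {"original": raw, "cleaned": text}


print(clean_response("  Child   Care "))
# {'original': '  Child   Care ', 'cleaned': 'childcare'}
```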

3 · Read what Sense produces

Run on the Open-Text Barriers dataset (DEMO-04) already loaded in Sopact Sense.

Grade | Entry | Why it matters
green | Cleaned, original kept | Auditable
amber | Ambiguous 'N/A' | Unclear meaning
red | Blank response | Shrinks N

A green entry is cleaned and bucketed with its original preserved, so the edit is auditable. An amber 'N/A' is unclear — it could mean 'no barrier' or a skipped question. A red blank response silently shrinks the denominator and must be flagged as its own category before any theming.
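The three grades follow a simple rule. This sketch uses a hypothetical `grade` function and a hypothetical `AMBIGUOUS` token set. A real dataset will need its own list of N/A variants.

```python
# Hypothetical set of answers treated as ambiguous; extend it from your own data.
AMBIGUOUS = {"n/a", "na", "n.a.", "-"}


def grade(original):
    """Grade one raw response green, amber, or red."""
    text = (original or "").strip().casefold()
    if not text:
        return "red"    # blank: flag it as its own category, never drop it
    if text in AMBIGUOUS:
        return "amber"  # unclear: 'no barrier' or a skipped question?
    return "green"      # cleanable; keep the original next to the cleaned value
```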

4 · Turn a weak link green

Take the lowest-graded element above and fix it using only what the program could realistically measure. Show the before → after grade and the single indicator/edit that moves it to green.

5 · Make the report and share it

Create a 'missing & incomplete' report from this analysis in Sopact branding [or paste your website URL / brand guideline to apply your own]. List every element graded amber or red, what is missing, and the one input that fixes each. Lead with the decision this report informs.
Create a shareable link for this report and open it in a new tab.

Tricks, tips, and troubleshooting

Always keep the original. Cleaning without an audit trail is just quiet rewriting. Store the raw response next to the cleaned one so any reviewer can check what changed.
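Here is one way to make the audit trail concrete, sketched under some assumptions. Each row is a hypothetical `CleanedResponse` record keyed by contact ID. The `audit_log` function lists only the rows where cleaning changed something, so a reviewer reads the edits rather than the whole dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanedResponse:
    contact_id: str   # persistent ID, so every edit traces back to one respondent
    original: str     # raw text exactly as submitted; never overwritten
    cleaned: str
    rule: str         # which cleaning rule produced the cleaned value


def audit_log(rows):
    """List only the rows where cleaning changed the text, for a reviewer to check."""
    return [
        f"{r.contact_id}: {r.original!r} -> {r.cleaned!r} ({r.rule})"
        for r in rows
        if r.original != r.cleaned
    ]


rows = [
    CleanedResponse("C-001", " Child care ", "childcare", "trim + spelling"),
    CleanedResponse("C-002", "bus fare", "bus fare", "none"),
]
for line in audit_log(rows):
    print(line)
# C-001: ' Child care ' -> 'childcare' (trim + spelling)
```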

Disambiguate 'N/A' before theming. 'N/A' might mean no barrier exists or that the person skipped the question — two very different things. Add one rule that separates true 'no barrier' from a non-response.
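Below is a sketch of one such rule. It assumes, hypothetically, that the survey also asked a yes/no screening question such as "Did you face any barriers?". The `resolve_na` function and the `reported_barrier` field are illustrative names, not Sense features.

```python
NA_TOKENS = {"n/a", "na", "n.a."}


def resolve_na(text, reported_barrier):
    """Split an 'N/A' answer using a hypothetical yes/no screening answer.

    reported_barrier is "yes", "no", or None (screening question also skipped).
    """
    if (text or "").strip().casefold() not in NA_TOKENS:
        return None              # not an N/A answer; leave it to normal cleaning
    if reported_barrier == "no":
        return "no barrier"      # consistent with the screening answer
    if reported_barrier == "yes":
        return "needs review"    # said yes, then N/A: contradictory, flag it
    return "non-response"        # no other signal: treat as skipped
```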

Flag blanks as a category. A blank cell silently drops out of the denominator, inflating every prevalence figure. Count blanks explicitly so percentages reconcile to N, not to responders only.
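The arithmetic is easy to check in code. In this sketch, a hypothetical `prevalence` function turns blanks into a `(blank)` category and divides every count by N, the total number of respondents.

```python
from collections import Counter


def prevalence(buckets):
    """Share of each bucket over all N respondents, with blanks as their own category."""
    counts = Counter(b if b else "(blank)" for b in buckets)  # None or "" -> "(blank)"
    n = len(buckets)
    return {bucket: count / n for bucket, count in counts.items()}


shares = prevalence(["childcare", "transport", "childcare", "", None])
```

With five respondents, the two childcare answers make up 2/5, or 40%, and the shares sum to 100%. If you dropped the two blanks, the same data would report 2/3, about 67%.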

Bucket, don't merge blindly. Standardising 'childcare', 'child care', and 'kids' into one bucket is good; collapsing genuinely different answers to save space loses meaning. Write down what each bucket covers so every assignment can be checked.
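Here is a minimal sketch of defined buckets, with hypothetical bucket names and variants. Each bucket lists exactly which cleaned answers it covers. Anything unlisted returns `None` instead of being forced into the nearest bucket.

```python
# Hypothetical bucket definitions: each bucket lists the cleaned variants it covers.
BUCKETS = {
    "childcare": {"childcare", "child care", "kids", "daycare"},
    "transport": {"transport", "bus fare", "no car"},
}

# Invert once so each variant points at its bucket.
VARIANT_TO_BUCKET = {v: name for name, variants in BUCKETS.items() for v in variants}

# A variant listed under two buckets would make the mapping ambiguous; fail loudly.
if len(VARIANT_TO_BUCKET) != sum(len(v) for v in BUCKETS.values()):
    raise ValueError("a variant appears in more than one bucket")


def bucket(cleaned):
    """Return the defined bucket, or None so unmatched answers stay visible for review."""
    return VARIANT_TO_BUCKET.get(cleaned)
```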

Show me every entry you flagged as ambiguous or blank, and propose the single cleaning rule that would resolve the largest share of them without losing meaning.

Frequently asked questions

How do you clean open-ended survey responses?

Trim whitespace, standardise spelling and casing, de-duplicate near-identical answers, and bucket similar responses into defined groups — all while keeping each original so the cleaning is auditable. Flag genuinely ambiguous entries rather than guessing their meaning.
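For the de-duplication step, Python's standard-library `difflib.SequenceMatcher` gives a similarity ratio between 0 and 1. This sketch pairs up near-identical cleaned answers for review instead of merging them. The `near_duplicates` name and the 0.9 threshold are assumptions; tune the threshold against your own data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(responses, threshold=0.9):
    """Return pairs of responses whose similarity ratio meets the threshold.

    Pairs are returned for a human to review; nothing is merged automatically.
    """
    return [
        (a, b)
        for a, b in combinations(responses, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]


pairs = near_duplicates(["no transportation", "no transportaion", "childcare"])
```

Comparing every pair takes quadratic time. That is fine for a few thousand survey answers but slow for much larger sets.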

Why keep the original response after cleaning?

Because cleaning changes the data, and a reviewer or funder may need to verify that meaning was preserved. Storing the raw response next to the cleaned one gives you an audit trail and protects against quiet over-editing.

How should I handle 'N/A' and blank responses?

Treat them separately. An ambiguous 'N/A' needs a rule that distinguishes 'no barrier' from a skipped question, and blanks should be counted as their own category so they don't silently shrink the denominator before you theme the data.

The finished report
A decision-first “missing & incomplete” report — Sopact-branded, shareable in one click.

Ready to try it for yourself?

Open Sopact Sense, paste your program description, and put it to work.
