A training evaluation question earns its place when it sits inside three
connections: the decision the answer feeds, a paired open-ended item that
explains the rating, and a persistent participant identity that links the
question to the same person's answers across other waves. A question
missing any of those three is what we call an Orphan Question. Most
published question banks, even the well-written ones, are collections of
orphan questions. The fix is structural rather than editorial.
THE ORPHAN QUESTION
A training evaluation question missing any of the three connections: a
decision it feeds, a paired open-ended counterpart that explains the
rating, or a persistent participant identity that links it to the same
person across waves. Wording quality cannot rescue an orphaned question. The six
principles below are the architectural decisions that prevent
orphaning at design time.
01 · LEVEL TAG
Tag every question with its Kirkpatrick level
A reaction question and a behavior question are not interchangeable.
Reaction, learning, behavior, results. Each level needs a different
format, a different cadence, a different rubric. A questionnaire
that mixes the four without level tags is a feedback survey
pretending to be an evaluation. The tag tells the analyst which
decision the answer can feed and which it cannot.
Why it matters: A Level 1 satisfaction average cannot
claim Level 4 organizational impact. Mislabeling the level inflates
what the data is asked to prove and breaks credibility when audited.
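Stored as structured metadata, the tag becomes enforceable rather than
aspirational. A minimal sketch in Python, assuming a simple in-memory
question record; the KirkpatrickLevel and Question names are
illustrative, not any particular survey platform's API.
```python
from dataclasses import dataclass
from enum import Enum

class KirkpatrickLevel(Enum):
    REACTION = 1   # Level 1: did they like it?
    LEARNING = 2   # Level 2: did they learn it?
    BEHAVIOR = 3   # Level 3: are they using it?
    RESULTS = 4    # Level 4: did outcomes change?

@dataclass
class Question:
    question_id: str
    text: str
    level: KirkpatrickLevel  # the tag the analyst filters on

questions = [
    Question("q1", "How satisfied were you with the session?",
             KirkpatrickLevel.REACTION),
    Question("q2", "How often have you applied the technique at work?",
             KirkpatrickLevel.BEHAVIOR),
]

# The tag makes overreach visible: no Level 4 items means no Level 4 claim.
level4_items = [q for q in questions if q.level is KirkpatrickLevel.RESULTS]
print("Level 4 evidence available:", bool(level4_items))  # False
```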
02 · PAIR RULE
Pair every rating with an open-ended counterpart
The number captures magnitude. The sentence captures the reasoning.
A 1-to-5 confidence rating without an open-ended counterpart is a
number no one can interpret once the cohort closes. Place the
paired open-end immediately after the rating, never at the end of
the survey where dropout peaks and response quality collapses.
Pre-training and post-training survey questions should pair ratings
with open-ends symmetrically across both waves.
Why it matters: Solo ratings produce averages that
tell you nothing about why scores moved. The reasoning lives in the
paired open-end. Without it, board questions about cause go unanswered.
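The pairing can be a property of the instrument rather than a
convention analysts have to remember. A minimal sketch, assuming
ratings and open-ends are authored as one unit; RatedPair and
to_survey_order are hypothetical names.
```python
from dataclasses import dataclass

@dataclass
class RatedPair:
    rating_id: str
    rating_text: str    # the number: magnitude
    open_end_id: str
    open_end_text: str  # the sentence: reasoning

def to_survey_order(pairs):
    """Emit each open-end immediately after its rating, never batched
    at the end of the survey where dropout peaks."""
    items = []
    for p in pairs:
        items.append((p.rating_id, p.rating_text))
        items.append((p.open_end_id, p.open_end_text))
    return items

pairs = [RatedPair("r1", "Rate your confidence applying the skill (1-5).",
                   "r1_why", "What drove your rating?")]
for item_id, text in to_survey_order(pairs):
    print(item_id, "·", text)
```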
03 · LOCK SCALES
Lock scales and wording across every wave
A 1-5 pre and a 1-7 post is not a comparison. It is instrument drift.
Identical wording, identical scale, identical response options.
Pre, post, thirty-day, sixty-day, ninety-day. Lock the instrument
at version 1.0 before Wave 1. A questionnaire designed to evaluate
training loses comparability the moment any of those vary.
Why it matters: A mid-program scale change destroys
comparability for that cohort's full history. The delta becomes an
artifact of measurement rather than a measurement of change.
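Locking can be checked mechanically before each wave launches. A
minimal sketch, assuming each wave exports as (wording, scale) pairs;
the fingerprint comparison is one way to flag drift, not a standard
tool.
```python
import hashlib
import json

def instrument_fingerprint(items):
    """Stable hash over wording and scale for every item."""
    canonical = json.dumps(items, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Locked at version 1.0 before Wave 1.
locked_v1 = [("I can apply this skill without support.", "1-5 agree")]

# A later wave with a widened scale: instrument drift.
wave_post = [("I can apply this skill without support.", "1-7 agree")]

if instrument_fingerprint(wave_post) != instrument_fingerprint(locked_v1):
    raise ValueError("Instrument drift: wave does not match locked v1.0")
```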
04 · DECISION ANCHOR
Anchor every question to a specific program decision
If you cannot name the decision the question feeds, remove it.
Facilitator review, curriculum cut, barrier intervention, funder-
report line. Every question maps to one of these. Questions
collected "in case we need it" produce the forty-seven-question
instruments no one analyzes. Decisionless questions are the single
largest source of survey fatigue, where shallow questions get
answered first while hard ones get truncated.
Why it matters: A decision tag stored as metadata
on the question lets unused tags get flagged for removal each
cycle. Surveys shrink toward what changed something, not toward
what was reflexively asked.
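Once the tag lives on the question, the end-of-cycle review is a
one-line filter. A minimal sketch, assuming each cycle records which
decision tags actually fed a decision; the tag names are illustrative.
```python
# Each question carries the decision it feeds as metadata.
questions = {
    "q1": "facilitator_review",
    "q2": "curriculum_cut",
    "q3": "funder_report_line",
    "q4": "in_case_we_need_it",  # no nameable decision behind it
}

# Tags that actually fed a decision this cycle (recorded elsewhere).
decisions_fed = {"facilitator_review", "curriculum_cut", "funder_report_line"}

# Flag every question whose tag changed nothing; candidates for removal.
flagged = [qid for qid, tag in questions.items() if tag not in decisions_fed]
print("Review for removal:", flagged)  # ['q4']
```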
05 · PERSISTENT IDENTITY
Assign a persistent participant identity before Wave 1
Email and name fail between waves. A real ID does not.
Email addresses change. Names abbreviate. Participant-remembered
access codes get lost. The identity has to be assigned at
enrollment by the system and embedded in every personalized wave
link. This is the only structural fix for the matching break that
ends pre-and-post analysis three weeks into cleanup. No analyst
process replaces an identity that was never assigned.
Why it matters: Thirty to forty percent of records
fail manual matching by name and email on the first pass. The
identity has to be a primitive, not a reconciliation step.
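A minimal sketch of system-assigned identity, assuming UUIDs and
personalized wave links; the URL pattern and function names are
hypothetical, not a reference to any real service.
```python
import uuid

def enroll(participant_email: str) -> str:
    """Assign the persistent ID once, at enrollment, by the system.
    Email is contact information only; it is never the join key."""
    return str(uuid.uuid4())

def wave_link(participant_id: str, wave: str) -> str:
    # The ID rides inside every personalized link, so records join
    # automatically even when names or emails change between waves.
    return f"https://surveys.example.org/{wave}?pid={participant_id}"

pid = enroll("pat@example.com")
print(wave_link(pid, "post"))
print(wave_link(pid, "day30"))  # same pid links the waves
```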
06 · PLAN FOLLOW-UP
Plan thirty-, sixty-, and ninety-day follow-up waves from day one
Behavior and results live downstream. They cannot be retrofitted.
Level 3 behavior items and Level 4 results items require waves
that run weeks or months after the program closes. Adding them as
afterthoughts six weeks after completion produces fifteen-percent
response rates and no baseline to match against. Ask for the
follow-up commitment at enrollment, alongside identity capture.
Why it matters: Follow-up bolted on late produces
unusable data. The first conversation with the participant
establishes the cadence the rest of the evaluation depends on.
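The cadence can be fixed in data at enrollment. A minimal sketch,
assuming the program close date is known on day one; the schedule
structure is illustrative.
```python
from datetime import date, timedelta

def follow_up_schedule(program_close: date) -> dict:
    """Every wave is dated before Wave 1 runs, not bolted on later."""
    return {
        "post":  program_close,
        "day30": program_close + timedelta(days=30),
        "day60": program_close + timedelta(days=60),
        "day90": program_close + timedelta(days=90),
    }

for wave, send_on in follow_up_schedule(date(2025, 6, 1)).items():
    print(wave, send_on.isoformat())
```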
FOUR FAILURE MODES
Four ways evaluation questions get orphaned
Every principle above prevents one or more of these. Each failure
survives any question-bank template. The fix is architectural, not
editorial.
01
The Smile Sheet
Reaction data presented as evaluation. Satisfaction averages
4.3 out of 5, and the funder still has no answer to the question
they asked. Level 1 is the floor, not the ceiling.
02
The Unmatched Pair
Pre and post both exist. The identity chain between them does
not. Thirty to forty percent of records fail manual matching
by name and email. No cleanup recovers an identity that was
never assigned.
03
The Orphan Open-End
Open-ended responses sit in a CSV column no one codes. The
reasoning behind every number is collected and then discarded.
Qualitative data that is not themed is effectively not
collected.
04
The Shifting Scale
Pre-survey used 1-to-5. Post-survey used 1-to-7. The delta is
an artifact of instrument drift, not a measurement of change.
Locked scales are a prerequisite, not a refinement.