
Training Evaluation Survey Questions: Examples by Kirkpatrick Level

Training evaluation survey questions for every Kirkpatrick level. Pre and post examples, behavior-anchored prompts, and the question architecture funders accept.

Updated May 6, 2026
QUESTION DESIGN · KIRKPATRICK LEVELS 1 THROUGH 4
A training evaluation runs four levels. Each level needs its own questions. Most evaluations only ship questions at Level 1.

A questionnaire labelled training evaluation usually contains end-of-session reaction items: how relevant was the content, how was the pace, how clear were the materials. Reaction questions are Level 1. The framework has three more levels above it, and each one calls for a different question shape, a different cadence, and a different decision tag. This page is the question bank: thirty examples across the four Kirkpatrick levels, the pre-and-post pairing rule, and the persistent-identity discipline that lets the levels connect on the same participant. Examples from workforce training, clinical training, and pharma sales enablement run throughout.

On this page
The four-level question pathway
Question types by level, defined
Six rules for writing evaluation items
Choices that decide if a level lands
A worked example: 30 evaluation questions
Three program contexts compared
THE FOUR-LEVEL PATHWAY

Each evaluation level needs its own question shape

Kirkpatrick's four levels are not optional layers. They build on each other. A Level 4 result requires Level 3 application, which requires Level 2 learning, which the Level 1 reaction influences. Each level needs questions written for that level, asked at the cadence that level requires.

Levels of training evaluation
01
Reaction
How participants experienced the session.
"How relevant was today's content?"
Likert · paired open-ended · End of session
02
Learning
What knowledge or skill changed.
"What is the first action you would take if a participant disclosed unsafe housing?"
Scenario · Pre and post · Same scoring rubric
03
Behavior
What got applied on the job.
"In the past thirty days, how many times did you use the intake protocol?"
Anchored count · 30/60 days post · Manager observation pair
04
Results
Whether applied behavior moved an outcome.
"What is the cohort's six-month placement rate compared to prior cohorts?"
Tied KPI · 90 days to 12 months post · Operational data
Each level depends on the level before it
If reaction is poor, knowledge rarely lands.
If learning does not change, behavior cannot follow.
If behavior does not get applied, results do not move.
If results never move, training cost has no return.

A questionnaire that only asks Level 1 reaction questions tests the first assumption and leaves the next three unmeasured. The questions for the remaining three levels are different in format, different in cadence, and run on the same participant identity to keep the chain connected.

The four-level pathway originates with Donald Kirkpatrick (1959) and was expanded in the New World Kirkpatrick Model (Kirkpatrick & Kirkpatrick, 2016). The participant-identity discipline is the thread that lets the levels connect on the same person across instruments and time.

QUESTION TYPES, DEFINED

Four kinds of questions, one connected questionnaire

Every training evaluation question belongs at one of four Kirkpatrick levels. The level the question sits at determines the format, the cadence, and the decision it can feed. The four definitions below cover what each kind of question is and how it differs from the others.

What is a reaction question (Kirkpatrick Level 1)?

A reaction question is asked at the end of a training session and scores how the participant experienced it: relevance, clarity, pace, application moment. Most are Likert-formatted on a one-to-five or one-to-seven scale, with one or two paired open-ended prompts asking what produced the rating.

Reaction questions are the easiest level to ask and the easiest to misuse. They reveal whether the session felt useful, not whether anyone learned anything. A questionnaire that only contains reaction questions is a feedback survey labelled as evaluation. The worked example on this page contains six reaction items per end-of-session form.

What is a learning question (Kirkpatrick Level 2)?

A learning question scores knowledge or skill change. The format that produces a real measure is paired pre and post: the same scenario or rubric-scored item asked before training and again at the end, on the same participant identity. The post score alone is not a Level 2 measure. The delta per person is.

Scenario items beat recall items. A one-paragraph case followed by an action question, scored against a published rubric, captures what the participant actually understood. Recall items capture what was memorized. The Level 2 question bank in the worked example uses six scenario items, scored on a four-point rubric, asked twice.
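The pairing rule is easy to state and easy to break in analysis. Below is a minimal sketch of what "the delta per person is the measure" means in practice; the participant IDs and rubric scores are hypothetical illustrations, not data from the worked example.

```python
# Minimal sketch: per-participant Level 2 deltas joined on a persistent ID.
# IDs and rubric scores below are hypothetical.

pre_scores = {"P-001": 1.5, "P-002": 2.0, "P-003": 1.0}   # intake rubric scores
post_scores = {"P-001": 3.0, "P-002": 2.5, "P-004": 3.5}  # end-of-program scores

# Only participants present in both waves produce a Level 2 measure.
paired_ids = pre_scores.keys() & post_scores.keys()
deltas = {pid: post_scores[pid] - pre_scores[pid] for pid in paired_ids}

cohort_delta = sum(deltas.values()) / len(deltas)  # average of paired deltas
print(f"Paired participants: {sorted(paired_ids)}")
print(f"Cohort Level 2 score (mean paired delta): {cohort_delta:.2f}")

# The broken alternative compares two anonymous group averages, which
# silently includes unmatched records and is not a Level 2 measure:
group_delta = (sum(post_scores.values()) / len(post_scores)
               - sum(pre_scores.values()) / len(pre_scores))
print(f"Misleading group-average delta: {group_delta:.2f}")
```

Note that P-003 and P-004 drop out of the paired measure entirely; the group-average comparison hides that loss, which is exactly the failure the persistent identity prevents.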

What is a behavior question (Kirkpatrick Level 3)?

A behavior question asks whether the learning was applied on the job. It runs thirty, sixty, or ninety days after training, never at the end of training itself. Effective Level 3 items are anchored to specific moments: in the past thirty days, how many times did you use the intake protocol from training. Anchored counts beat self-rated frequency scales because the moment is concrete.

The application moment named in the end-of-training reaction question seeds the behavior question. The participant remembers what they committed to apply. Self-report items pair with manager observation when the program has manager visibility into job tasks. The Level 3 bank in the worked example uses five anchored-count items plus one open-ended prompt about barriers.

What is a results question (Kirkpatrick Level 4)?

A results question scores whether trained behavior moved a downstream outcome. It is tied to an existing operational metric, not a new survey indicator. For a workforce training program: placement rate, retention at six months, employer satisfaction. For a clinical training: case-resolution time, patient outcome scores. For a sales training: conversion rate, deal size, retention.

Level 4 questions are usually pulled from operational systems three to twelve months after training, not asked in a survey. The "question" is a metric definition, a date range, and a comparison cohort. The worked example pairs five outcome metrics to the ninety-day cohort report.
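Since a Level 4 "question" is data rather than survey copy, it can be written down as a record. A minimal sketch, with hypothetical field names that are not a Sopact or Kirkpatrick-standard schema:

```python
from dataclasses import dataclass
from datetime import date

# Sketch of a Level 4 "question" as a metric definition, a date range,
# and a comparison cohort. All field names and values are hypothetical.

@dataclass
class ResultsMetric:
    name: str               # operational metric, defined before training
    source_system: str      # where the number is pulled from, not asked
    window_start: date
    window_end: date
    comparison_cohort: str  # prior cohort the value is judged against
    decision: str           # the program decision the answer feeds

placement_rate = ResultsMetric(
    name="job placement rate at 6 months",
    source_system="program management system",
    window_start=date(2026, 1, 1),
    window_end=date(2026, 6, 30),
    comparison_cohort="2025-Q3 cohort",
    decision="funder reporting + program continuation",
)
print(placement_rate)
```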

Common question-type confusions

RECALL VS SCENARIO
A recall item asks the participant to repeat a fact ("what are the four steps of the intake protocol?"). A scenario item asks them to apply it to a case. Recall measures memorization; scenario measures understanding. Level 2 belongs to scenario.
SELF-REPORT VS OBSERVATION
Behavior questions can ask the participant ("how many times did you use the protocol?") or ask their manager ("how often have you observed protocol use?"). Self-report and observation often disagree. Where both are possible, run both and report the gap.
PROXY VS TIED METRIC
A results question can ask a survey proxy ("rate your team's performance") or pull a tied metric (placement rate, conversion rate, prescription lift). Tied metrics beat survey proxies for Level 4 because they exist in operational systems and were defined before training.
SNAPSHOT VS PAIRED
A snapshot question asks once. A paired question asks twice on the same participant, with persistent identity. Level 1 reaction is a snapshot; Level 2 learning is paired. Asking a Level 2 item once and reporting the post score is a measurement error, not a Level 2 score.
SIX PRINCIPLES

How to write training evaluation questions that do not orphan themselves

A training evaluation question earns its place when it sits inside three connections: the decision the answer feeds, a paired open-ended item that explains the rating, and a persistent participant identity that links the question to the same person's answers across other waves. A question missing any of those three is what we call an Orphan Question. Most published question banks, even the well-written ones, are collections of orphan questions. The fix is structural rather than editorial.

THE ORPHAN QUESTION
A training evaluation question without a decision it feeds, without a paired open-ended counterpart that explains the rating, and without a persistent participant identity that links it to the same person across waves. Question quality cannot rescue an orphaned question. The six principles below are the architectural decisions that prevent orphaning at design time.
01 · LEVEL TAG

Tag every question to its Kirkpatrick level

A reaction question and a behavior question are not interchangeable.

Reaction, learning, behavior, results. Each level needs a different format, a different cadence, a different rubric. A questionnaire that mixes the four without level tags is a feedback survey pretending to be an evaluation. The tag tells the analyst which decision the answer can feed and which it cannot.

Why it matters: A Level 1 satisfaction average cannot claim Level 4 organizational impact. Mislabeling the level inflates what the data is asked to prove and breaks credibility when audited.
02 · PAIR RULE

Pair every rating with an open-end

The number captures magnitude. The sentence captures the reasoning.

A 1-to-5 confidence rating without an open-ended counterpart is a number no one can interpret once the cohort closes. Place the paired open-end immediately after the rating, never at the end of the survey where dropout peaks and response quality collapses. Pre-training survey questions and post-training survey questions pair ratings with open-ends symmetrically across both waves.

Why it matters: Solo ratings produce averages that tell you nothing about why scores moved. The reasoning lives in the paired open-end. Without it, board questions about cause go unanswered.
03 · LOCK SCALES

Lock scales and wording across every wave

A 1-5 pre and a 1-7 post is not a comparison. It is instrument drift.

Identical wording, identical scale, identical response options. Pre, post, thirty-day, sixty-day, ninety-day. Lock the instrument at version 1.0 before Wave 1. A questionnaire designed to evaluate training loses comparability the moment any of those vary.

Why it matters: A mid-program scale change destroys comparability for that cohort's full history. The delta becomes an artifact of measurement rather than a measurement of change.
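One way to make the lock enforceable rather than aspirational is to fingerprint the instrument and refuse to field a drifted wave. A minimal sketch, assuming a hypothetical two-item instrument; the item set and function names are illustrations, not a real platform API:

```python
import hashlib
import json

# Sketch: fingerprint wording, scales, and order at version 1.0,
# then block any wave whose items drift. Items are hypothetical.

INSTRUMENT_V1 = [
    {"id": "L1-01", "wording": "How relevant was today's content?", "scale": [1, 5]},
    {"id": "L1-04", "wording": "How confident do you feel applying today's protocol?", "scale": [1, 5]},
]

def fingerprint(items):
    """Stable hash of item wording, scales, and order."""
    return hashlib.sha256(json.dumps(items, sort_keys=True).encode()).hexdigest()

LOCKED = fingerprint(INSTRUMENT_V1)

def open_wave(wave_name, items):
    if fingerprint(items) != LOCKED:
        raise ValueError(f"{wave_name}: instrument drift; deltas would be artifacts")
    print(f"{wave_name}: instrument matches version 1.0")

open_wave("pre", INSTRUMENT_V1)
drifted = [dict(INSTRUMENT_V1[0], scale=[1, 7])] + INSTRUMENT_V1[1:]
# open_wave("post", drifted)  # raises: a 1-5 pre and a 1-7 post is not a comparison
```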
04 · DECISION ANCHOR

Anchor every question to a specific program decision

If you cannot name the decision the question feeds, remove it.

Facilitator review, curriculum cut, barrier intervention, funder-report line. Every question maps to one of these. Questions collected "in case we need it" produce the forty-seven-question instruments no one analyzes. Decisionless questions are the single largest source of survey fatigue, of shallow questions getting answered first, and of hard ones getting truncated.

Why it matters: A decision tag stored as metadata on the question lets unused tags get flagged for removal each cycle. Surveys shrink toward what changed something, not toward what was reflexively asked.
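Stored as metadata, the decision tag makes the pruning pass mechanical. A minimal sketch under that assumption; the question IDs and decision names are hypothetical:

```python
# Sketch: every question names the decision it feeds, and any tag whose
# decision was never consumed this cycle gets flagged for removal.
# Question IDs and decision names are hypothetical illustrations.

questions = {
    "L1-01": "cohort-mid relevance review",
    "L2-03": "documentation training depth",
    "L3-06": "organizational barrier review",
    "X-47":  "in case we need it",  # the decisionless question
}

decisions_used_this_cycle = {
    "cohort-mid relevance review",
    "documentation training depth",
    "organizational barrier review",
}

for qid, decision in questions.items():
    if decision not in decisions_used_this_cycle:
        print(f"flag for removal: {qid} (decision never consumed: {decision!r})")
```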
05 · PERSISTENT IDENTITY

Assign a persistent participant identity before Wave 1

Email and name fail between waves. A real ID does not.

Email addresses change. Names abbreviate. Participant-remembered access codes get lost. The identity has to be assigned at enrollment by the system and embedded in every personalized wave link. This is the only structural fix for the matching break that ends pre-and-post analysis three weeks into cleanup. No analyst process replaces an identity that was never assigned.

Why it matters: Thirty to forty percent of records fail manual matching by name and email on the first pass. The identity has to be a primitive, not a reconciliation step.
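What "identity as a primitive" looks like mechanically: the system mints the ID at enrollment and bakes it into every personalized wave link, so no wave depends on the participant's email, name, or memory. A minimal sketch; the URL shape, names, and functions are hypothetical illustrations:

```python
import uuid

# Sketch: assign the participant ID at enrollment and embed it in every
# personalized wave link. Names and URL format are hypothetical.

def enroll(name: str, email: str) -> dict:
    return {"participant_id": f"P-{uuid.uuid4().hex[:8]}", "name": name, "email": email}

def wave_link(participant: dict, wave: str) -> str:
    return f"https://surveys.example.org/{wave}?pid={participant['participant_id']}"

p = enroll("Jordan Rivera", "jordan@example.org")
for wave in ("intake", "post", "day-30", "day-60", "day-90"):
    print(wave_link(p, wave))
# Email and name can change later; every response still carries the same pid,
# so the four levels join without a reconciliation step.
```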
06 · PLAN FOLLOW-UP

Plan thirty, sixty, ninety-day follow-up waves from day one

Behavior and results live downstream. They cannot be retrofitted.

Level 3 behavior items and Level 4 results items require waves that run weeks or months after the program closes. Adding them as afterthoughts six weeks after completion produces fifteen-percent response rates and no baseline to match against. Ask for the follow-up commitment at enrollment, alongside identity capture.

Why it matters: Follow-up bolted on late produces unusable data. The first conversation with the participant establishes the cadence the rest of the evaluation depends on.
FOUR FAILURE MODES
Four ways evaluation questions get orphaned

Every principle above prevents one or more of these. Each failure survives any question-bank template. The fix is architectural, not editorial.

01
The Smile Sheet

Reaction data presented as evaluation. Satisfaction averaging 4.3 out of 5 and the funder still has no answer to the question they asked. Level 1 is the floor, not the ceiling.

02
The Unmatched Pair

Pre and post both exist. The identity chain between them does not. Thirty to forty percent of records fail manual matching by name and email. No cleanup recovers an identity that was never assigned.

03
The Orphan Open-End

Open-ended responses sit in a CSV column no one codes. The reasoning behind every number is collected and then discarded. Qualitative data that is not themed is effectively not collected.

04
The Shifting Scale

Pre-survey used 1-to-5. Post-survey used 1-to-7. The delta is an artifact of instrument drift, not a measurement of change. Locked scales are a prerequisite, not a refinement.

CHOICES THAT DECIDE IF AN EVALUATION LANDS

Six question-design decisions, side by side

Six choices come up every time someone writes a training evaluation questionnaire. The first column names the decision. The middle columns show the broken default and the working alternative. The last column names the consequence: what each choice decides about the data the questionnaire produces.

The choice
Broken way
Working way
What this decides
Question scope
Single moment vs paired pre and post.
BROKEN
Asked once at end of training. Reported as a learning score.
WORKING
Asked at intake and end of training, on the same participant. Delta is the score.
Whether change is measurable at all. Without the pairing, only end-state position is visible.
Knowledge format
Recall items vs scenario items.
BROKEN
"List the four steps of the intake protocol." Tests memorization. Re-grades after the participant looks it up.
WORKING
Case paragraph followed by an action question, scored on a four-point rubric. Tests applied understanding.
Whether Level 2 reflects understanding or recall. Most published evaluations use recall and call it learning.
Behavior framing
Self-rated frequency vs anchored count.
BROKEN
"How often do you use the protocol?" with a 1-5 frequency scale. Personality scales the rating.
WORKING
"In the past thirty days, how many times did you use the protocol?" Anchored count, paired with manager observation when possible.
Whether Level 3 produces comparable counts. Frequency scales drift between participants; counts do not.
Results indicator
Survey proxy vs tied operational metric.
BROKEN
"Rate your team's performance after training" on a 1-5 scale. Survey proxy, scaled by who answered.
WORKING
Pull placement rate, retention at six months, or another operational metric defined before training. Compare to the prior cohort.
Whether Level 4 reflects outcomes or perceptions. Tied metrics can be audited; survey proxies cannot.
Identity discipline
Anonymous waves vs persistent identifier.
BROKEN
Each instrument anonymous. Pre-test, post-test, follow-up all collected separately. Levels reported as group averages.
WORKING
Single participant ID issued at intake, carried through every wave. One connected record per person across four levels.
Whether the four levels chain or stay disconnected. The chain is the point of running four levels.
Reporting cadence
One-shot end of program vs staged across time.
BROKEN
All four levels collected at end-of-program. Behavior asked before behavior could occur. Results asked before results exist.
WORKING
Reaction at end of session, learning at end of program, behavior at thirty and sixty days, results at ninety days to twelve months.
Whether each level can actually exist when measured. Behavior takes thirty days minimum to form.
COMPOUNDING EFFECT

The first decision controls all the others. If you anonymize each wave, the pairing rule cannot fire, the anchored-moment rule cannot connect to a known learning baseline, and tied metrics cannot match a person to an outcome. Identity discipline is the spine that lets the rest of the matrix work.

A WORKED EXAMPLE · 30 QUESTIONS · WORKFORCE TRAINING

Thirty training evaluation questions for an 8-week workforce cohort

A 240-participant workforce training program. Eight weeks. Outcomes: case-management certification and job placement. Below: the scenario, the instrument set, and thirty specific evaluation questions distributed across the four Kirkpatrick levels with the cadence each level requires.

We had a feedback survey running for three years. End-of-program scores stayed in the high fours. Job placement rates barely moved. The funder asked whether the training was actually working. The honest answer was that we had no idea, because we had only ever asked Level 1 reaction questions. The questionnaire we now run spans all four levels and it has changed which curriculum modules we keep.

Workforce training program lead, post-rebuild cycle.
The instrument set, level by level

Five instruments across the cohort lifecycle. One participant identifier carries across all five. Levels chain because the same person fills every form.

A · INTAKE
Pre-training baseline
Six scenario items at intake, scored on a four-point rubric. Establishes the baseline for Level 2 pairing.
B · END OF SESSION
Reaction (Level 1)
Six Likert items plus two paired open-ended prompts. Run weekly across eight sessions. Decisions feed cohort-mid adjustments.
C · END OF PROGRAM
Learning (Level 2)
Same six scenarios as intake, plus two open-ended application prompts. Delta per participant scores the cohort.
D · 30/60 DAYS
Behavior (Level 3)
Five anchored-count items plus one barriers prompt. Run twice. Manager observation paired in where possible.
E · 90 DAYS+
Results (Level 4)
Five operational metrics pulled from program management system and employer follow-up. Compared to prior cohorts.
The thirty questions

Each question lists the format, the scoring approach, and the decision the answer feeds. Items are written for case-management certification training, but the structure transfers to clinical, sales, and professional development training.

LEVEL 1 · REACTION · END OF EACH SESSION · 6 ITEMS
L1·01
"On a scale of one to five, how relevant was today's content to a case you are working on or expect to work on?"
Likert · Decision: cohort-mid relevance review
L1·02
"How clear was the protocol example in the second hour of today's session?"
Likert · Decision: facilitator content adjustment for next cohort
L1·03
"How well did today's pace match the depth of the material?"
Likert · Decision: session length and break cadence
L1·04
"How confident do you feel applying today's protocol on a real case in the next week?"
Likert · Decision: which content needs reinforcement before next session
L1·05
"What is the one moment from today's session you would point to as clearest?"
Open-ended · Paired with L1·02 · Decision: facilitator feedback loop
L1·06
"What is the one moment from today's session you would point to as least clear?"
Open-ended · Paired with L1·02 · Decision: curriculum revision queue
LEVEL 2 · LEARNING · INTAKE AND END OF PROGRAM · 8 ITEMS · PAIRED
L2·01
Scenario: "A participant arrives at intake disclosing unsafe housing. What is the first action you would take and why?" Scored on a four-point protocol-fidelity rubric.
Scenario · Pre and post · Decision: housing protocol reinforcement
L2·02
Scenario: "A case shows evidence of substance use disclosed mid-session. Walk through your next two steps." Rubric-scored.
Scenario · Pre and post · Decision: substance-use module revision
L2·03
Scenario: "A participant presents with documentation gaps. List the three records you would request and the order." Rubric-scored.
Scenario · Pre and post · Decision: documentation training depth
L2·04
Scenario: "A case raises a child-welfare concern. What is your mandated-reporter obligation and timeline?" Rubric-scored.
Scenario · Pre and post · Decision: compliance training audit
L2·05
Scenario: "A participant disagrees with the proposed plan. Describe your three-step de-escalation approach." Rubric-scored.
Scenario · Pre and post · Decision: de-escalation curriculum hours
L2·06
Scenario: "A case requires referral. Name the two systems you would consult to identify the right partner agency." Rubric-scored.
Scenario · Pre and post · Decision: referral-system training time
L2·07
"Pick one of the six scenarios above. In four sentences, describe a real case you expect to encounter where this protocol applies."
Open-ended · Post only · Decision: applied-context relevance check
L2·08
"Which of the six protocols feels least practiced and what would help?"
Open-ended · Post only · Decision: post-program coaching priorities
LEVEL 3 · BEHAVIOR · 30 AND 60 DAYS POST · 6 ITEMS · ANCHORED
L3·01
"In the past thirty days, how many times did you use the housing-disclosure protocol from training?"
Anchored count · Self-report + manager observation · Decision: housing protocol coaching
L3·02
"In the past thirty days, how many cases involved a substance-use disclosure, and on how many did you apply the trained next-step sequence?"
Anchored count · Decision: substance-use refresher scheduling
L3·03
"In the past thirty days, how many times did you request the documentation set covered in week three?"
Anchored count · Decision: documentation workflow integration
L3·04
"In the past thirty days, did any case meet the mandated-reporter threshold? If yes, how many, and on how many did you complete the report within the trained timeline?"
Anchored yes/no + counts · Decision: compliance audit follow-up
L3·05
"In the past thirty days, how many times did you use the three-step de-escalation approach from training?"
Anchored count · Manager observation paired · Decision: de-escalation refresher
L3·06
"What barriers, if any, prevented you from applying the trained protocols in the past thirty days?"
Open-ended · Decision: organizational barrier review
LEVEL 4 · RESULTS · 90 DAYS TO 12 MONTHS POST · 5 METRICS · TIED
L4·01
Cohort certification pass rate vs prior cohort. Pulled from certifying body records.
Tied operational metric · 90 days · Decision: curriculum continuation
L4·02
Job placement rate at 6 months post-program. Pulled from program management system.
Tied operational metric · 6 months · Decision: funder reporting + program continuation
L4·03
Job retention at 12 months for placed participants. Pulled from employer follow-up.
Tied operational metric · 12 months · Decision: program design vs partner employer mix
L4·04
Employer satisfaction score for placed participants. Pulled from employer survey at 6 months.
Tied operational metric · 6 months · Decision: which protocol modules need depth
L4·05
Compliance violations (mandated-reporter timeline) per 100 cases handled. Pulled from compliance audit.
Tied operational metric · Quarterly · Decision: compliance training cadence
Sopact Sense produces
One connected record per participant
Intake baseline, end-of-session reactions, post-program learning, 30/60-day behavior, and 90-day results all join on the same identifier without manual matching.
Pre-post pairing computed automatically
Level 2 deltas land per participant the moment the post wave closes. The cohort score is the average of paired deltas, not a comparison of two anonymous group averages.
Open-ended responses coded against rubrics
Level 2 scenario answers and Level 3 barriers prompts get coded by rubric without an analyst manually reading every response. The codes feed the same record.
Level 4 metrics imported from operational systems
Placement, retention, and compliance metrics flow in from the program management system. The four levels report side by side per cohort.
Why traditional questionnaire tools fail
Each instrument is a separate form
Pre, post, 30-day, 60-day, and 90-day forms are five disconnected surveys. Matching by hand turns into a multi-week analyst project per cohort.
Anonymous waves break the pairing rule
Default identity settings produce four group averages. Level 2 turns into "average pre vs average post" instead of paired delta per person.
Rubric scoring is manual
Six scenario items times 240 participants times two waves equals 2,880 rubric judgments per cohort. Most evaluations skip the rubric and score on completion only.
Results metrics live in another system
Placement and retention sit in the program management system. The questionnaire tool can show the survey data but cannot join it to the operational metric without a custom integration.

The thirty questions above are written once and run as a connected instrument set for every cohort. The pairing, the manager-observation join, the rubric coding, and the operational-metric pull are structural properties of the platform, not procedural steps an analyst repeats.

THREE ARCHETYPES

Three program shapes, one question architecture

Workforce programs, capacity-building cohorts, and credential programs each carry different content and different funder questions. They all need the same three-wave question architecture to produce evidence. The question bank changes. Persistent participant identity, locked scales, and paired open-ends do not.

01

Workforce program

Nonprofit running a 12-week job-readiness cohort, sixty participants.

Funder requires pre-to-post evidence disaggregated by gender and site, plus ninety-day post-exit employment data. The question bank stays identical across cohorts. The architecture is what determines whether funder reports take three days or four minutes.

01
Pre-survey
Baseline confidence, barrier identification, site and gender anchors
02
Post-survey
Matched confidence ratings, scenario MCQs, reaction items
03
90-day follow-up
Employment outcome, sustained confidence, barrier themes
TRADITIONAL STACK
Three tools, three identity systems, three weeks of matching

Google Forms for intake with no identity beyond row number. SurveyMonkey for post with separate login and separate export. Generic email for the ninety-day with responses untagged to participant. Analyst reconciles on name and email in a spreadsheet. Thirty to forty percent of records fail matching on the first pass.

WITH SOPACT SENSE
One platform, one participant identity, no manual matching

Persistent participant identity assigned at enrollment. Every wave delivered through a personalized link tied to that identity. Open-ended responses themed as they arrive. Disaggregation by gender and site is a filter rather than a project. Funder report generates in minutes.

02

Capacity-building cohort

Nonprofit training other nonprofits, six-month cohort, forty partner organizations.

The participant is both an individual and an institutional representative. Funder asks whether the practice was adopted at the organizational level, not only whether the individual learned it. The question bank adds organizational-adoption items at ninety days and six months, alongside the individual-level items at end-of-program.

01
Pre-survey
Current organizational practice, individual baseline confidence
02
Post-survey
Learning confirmation, intent to adopt at organizational level
03
6-month follow-up
Organizational adoption evidence, barrier themes, multiplier effect
TRADITIONAL STACK
Individual-level data that cannot answer organization-level questions

Each participant answered separately with no institutional link. When the individual leaves the partner organization, the record is lost. Adoption evidence reduces to anecdote from annual grantee reports. Funder asks which organizations adopted and the answer requires re-surveying the cohort eighteen months later.

WITH SOPACT SENSE
Participant identity plus partner organization identity, both persistent

Every participant tagged to the partner organization at enrollment. Organizational adoption is a six-month rubric-based survey on the same connected record. Open-ended responses themed for barrier and enabling-condition patterns. Which organizations adopted becomes a report filter, not a research project.

03

Credential program

Quarterly cohorts of one hundred twenty participants, external rubric, employer verification.

The credential holds value only if post-program rubric scores can be traced to the same participant's learning trajectory. Rubric scoring is the Level 2 anchor. Employment outcome at ninety days is the Level 4 proof. Cross-cohort comparison requires the rubric and the question bank to stay locked across quarters.

01
Pre-survey
Baseline rubric self-assessment, prior credentials
02
Assessor rubric
External assessor scoring on the same rubric scale
03
90-day + 12-month
Credential use on the job, employer verification
TRADITIONAL STACK
Rubric in a separate system from the participant survey

Rubric scored in a spreadsheet and the participant survey in a form tool. Matching rubric to survey requires assessor-provided identifiers. No cross-year comparison because rubric versions drift. Employer verification arrives as a PDF attachment that never gets linked back to the participant record.

WITH SOPACT SENSE
Rubric and participant survey share one connected record

Assessor opens the participant record and scores the rubric directly. Rubric version locked across cohorts, so year-over-year comparison holds. Employer verification captured through a link to the same record. Credential value is auditable because every score traces back to a participant and an assessor.

Three program shapes, one question architecture. The question bank changes by program type. Persistent participant identity, locked scales, and paired open-ends do not.

FREQUENTLY ASKED

Training evaluation survey questions

Q.01
What is a training evaluation question?

A training evaluation question scores a training outcome at one of the four Kirkpatrick levels. Reaction questions score how the session felt. Learning questions score what knowledge changed. Behavior questions score what got applied on the job. Results questions score whether trained behavior moved a downstream outcome. The level the question maps to determines the format, the timing, and the decision the answer can feed.

Q.02
What is a training evaluation questionnaire?

A training evaluation questionnaire is a set of evaluation questions organized by the level each one measures. A real questionnaire spans more than one level. The end-of-program form covers reaction and learning. A separate thirty-day or sixty-day form covers behavior. A quarterly or annual report covers results, often pulled from existing operational data rather than a survey. A questionnaire with only Level 1 reaction items is a feedback survey under a different name.

Q.03
What are training evaluation questions examples?

Examples by level. Reaction: on a one-to-five scale, how relevant was today's session to your current work? Learning: before training, what is the first action you would take if a participant disclosed unsafe housing, with the same question asked again after training. Behavior: in the past thirty days, how many times did you apply the intake protocol from training? Results: what is your three-month case-resolution rate compared to last quarter? The worked example on this page lists thirty specific questions organized into four Kirkpatrick levels.

Q.04
What are sample questions for evaluation of training?

Sample questions span four levels. Reaction: Likert items plus paired open-ends at the end of every session. Learning: paired pre and post on the same knowledge items, run before and after training, with persistent participant identity so the delta computes per person. Behavior: at thirty and sixty days, anchored to specific job moments. Results: pulled from existing operational data three to twelve months post-training. Question wording for each level is on this page; the worked example contains thirty samples.

Q.05
What is the difference between training evaluation and training feedback?

Training feedback is one of the four levels of training evaluation. Feedback covers reaction, what Kirkpatrick calls Level 1: did the session feel relevant, was the pace right, were materials clear. Training evaluation is the larger frame and includes learning, behavior, and results in addition to reaction. Most surveys labeled training evaluation only ask reaction questions. A complete evaluation has a different question set for each level and runs at the cadence each level requires.

Q.06
How do I write a training evaluation questionnaire?

Decide which Kirkpatrick levels you will measure. Write reaction items only after you have written the learning, behavior, and results items the higher levels need. Pair every Level 2 learning item across pre and post. Anchor every Level 3 behavior item to a specific application moment. Tie every Level 4 results item to an operational metric you already track. Keep persistent participant identity across all four waves so the levels connect on the same person. Total instrument set runs four to six pieces, not one form.

Q.07
What are the four levels of Kirkpatrick training evaluation?

Level 1, reaction: how participants felt about the training. Level 2, learning: what knowledge or skill changed. Level 3, behavior: whether the learning was applied on the job. Level 4, results: whether the applied behavior moved a downstream outcome. Levels build on each other: a strong Level 4 result requires Level 3 application, which requires Level 2 learning. Most training evaluation questionnaires only cover Level 1, which is why the framework feels truncated when published reports try to claim impact from feedback alone.

Q.08
What are pre and post training evaluation questions?

Pre and post training evaluation questions are paired knowledge or confidence items run before training begins and after training ends, on the same participants. The pairing is the measurement: the post score alone says nothing about change unless the pre score is on file for the same person. Same wording, same scale, same scenarios, asked twice. Persistent participant identity is the discipline that makes the pairing work. Without it, you are comparing two anonymous group averages, which is not a Level 2 measure.

Q.09
What are level 3 evaluation questions (behavior)?

Level 3 evaluation questions ask whether the learning was applied on the job. They run thirty, sixty, or ninety days after training, never at the end of training itself. Effective Level 3 items are anchored to specific moments: in the past thirty days, how many times did you use the intake protocol from training. Self-report items pair with manager observation when possible. The application moment named in the end-of-training reaction question seeds the behavior question, so the participant remembers what they committed to apply.

Q.10
What are level 4 evaluation questions (results)?

Level 4 results questions ask whether trained behavior moved a downstream outcome. They are tied to an existing operational metric, not a new survey indicator. For a workforce training program: placement rate, retention at six months, employer satisfaction. For a clinical training: case-resolution time, patient outcome scores. For a sales training: conversion rate, deal size, retention. Level 4 questions are usually pulled from operational systems three to twelve months after training, not asked in a survey.

Q.11
What questions should I ask after training to test knowledge?

Knowledge testing belongs at Kirkpatrick Level 2 and works best as paired pre and post. Use scenario items rather than recall items: a one-paragraph case followed by an action question, scored against a rubric. Recall items measure what was memorized; scenario items measure what was understood. Five to eight scenario items, each tied to a learning objective from the training, asked before training and after. The same items, the same scoring rubric, the same participants. The delta per person is the Level 2 score.

Q.12
How do I create a course evaluation survey with Likert items and open-ended questions mapped to Kirkpatrick levels 1 and 2?

Run two instruments, not one. Instrument A at end of session covers Kirkpatrick Level 1 reaction: five Likert items on relevance, clarity, pace, confidence, and application intent, with one paired open-ended prompt asking what one moment was clearest and what one moment was unclear. Instrument B is the post-training knowledge test, paired against an identical pre-test: six to eight scenario items scored against a rubric, plus two open-ended prompts asking the participant to apply what they learned to a case. Same participant identity across both instruments. Levels 1 and 2 connect because the same person fills both forms.

Q.13
What are the best post-training survey questions for pharma sales enablement?

Pharma sales enablement post-training questions span all four Kirkpatrick levels. Level 1 reaction at end of session covers content relevance and pace. Level 2 learning is paired pre and post on six to eight clinical-product scenario items, scored against a compliance rubric. Level 3 behavior at thirty and sixty days asks how often the trained talking points were used in customer conversations and whether compliance review flagged issues. Level 4 results pulls from CRM: conversation count by product, prescription lift, formulary coverage. The question discipline is the same as any other training evaluation; the rubric and the operational metrics are pharma-specific.

Q.14
What is the Orphan Question?

The Orphan Question is a training evaluation question missing three connections: the decision it feeds, a paired open-ended counterpart that explains the rating, and a persistent participant identity that links it to the same person's answers across other waves. Most published question banks are compilations of orphan questions. They can be well written and still produce data that cannot answer the questions funders and boards now require. The fix is architectural rather than copy-edit: assign a participant identity at enrollment, pair every rating with an open-ended reasoning prompt, and tag every question to a specific decision it feeds.

Q.15
What is the retrospective pre-test?

A retrospective pre-test is a post-survey item asking participants to rate their pre-program state after training, alongside their current state: looking back now, how would you rate your confidence in the skill before the program began. It corrects for response-shift bias, the phenomenon where participants revise their pre-program self-assessment downward once they understand how much there is to know. Use it in addition to, not instead of, the actual pre-test. Pair both readings to measure both actual change and perceived change. The item belongs on the post-survey alongside the matched post-confidence item.
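Pairing all three readings on the same participant is what makes the correction usable. A minimal sketch of the arithmetic, with hypothetical participant IDs and ratings on a locked 1-5 scale:

```python
# Sketch: actual change (post - pre) vs perceived change (post - retrospective
# pre) per participant. IDs and ratings are hypothetical illustrations.

records = [
    {"pid": "P-001", "pre": 4.0, "post": 4.0, "retro_pre": 2.0},
    {"pid": "P-002", "pre": 3.0, "post": 4.5, "retro_pre": 3.0},
]

for r in records:
    actual = r["post"] - r["pre"]           # change against the real baseline
    perceived = r["post"] - r["retro_pre"]  # change after response-shift correction
    print(f"{r['pid']}: actual {actual:+.1f}, perceived {perceived:+.1f}")

# P-001 shows no actual change but a large perceived change: the classic
# response-shift pattern the retrospective item is designed to surface.
```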

Q.16
What is a good response rate for a post-training survey?

A post-training survey delivered at session end typically reaches seventy to ninety-five percent response. Follow-up rates drop at predictable intervals: thirty to fifty percent at thirty days, twenty to forty percent at sixty days, fifteen to thirty-five percent at ninety days, when the follow-up runs through generic email broadcast. Personalized links tied to the participant record substantially raise the ninety-day response rate because the recipient recognizes the context. Response rate is a function of identity discipline as much as instrument quality.

Q.17
How does Sopact help with training evaluation questions?

Sopact Sense ships with a training evaluation question bank organized by Kirkpatrick level, with paired pre/post structure already built in and decision tags named on every item. Persistent participant identity carries across the pre-training baseline, end-of-program reaction, end-of-program knowledge post, thirty-day behavior follow-up, and ninety-day results indicator. The instrument set acts as one connected record per participant rather than five disconnected forms. Built-in qualitative coding handles open-ended scenario responses without manual recoding.

NEXT STEP

A question bank organized by Kirkpatrick level

The thirty questions on this page are a starting set. The discipline that matters is which level each question maps to, whether the Level 2 items are paired pre and post, whether the Level 3 items are anchored to a named application moment, and whether the Level 4 items are tied to a metric you already track. Sopact Sense ships the question bank with that structure built in and the participant identity that connects the five instruments across the program.