What separates strong primary data from a stack of forms
Five properties carry a primary dataset from operational artifact to evidence.
Persistent identity: the same participant is recognizable across intake,
mid-program, and follow-up, even if their name, email, or language preferences
change. Without it, longitudinal analysis becomes approximation.
Aligned definitions: every form, cohort, and fund uses the same dictionary.
Skills training, capacity building, and professional development rolling up to one
outcome category requires the dictionary to say they do. Without alignment,
cross-cohort comparison breaks at the merge step.
Paired quant + qual: every closed-ended item has an open-ended probe on the
same record. Pairing happens at the source, not at the end of analysis. Correlation
then becomes a query against one table.
Documented sampling: who was eligible, who was reached, who responded, who
dropped out, and what differs between them. A "convenience sample of trainees" is
not a sampling frame. Documentation is what lets a reader judge generalizability.
Audit trail: every field traces back to the instrument, the participant
consent terms, and the collection wave. Audit-ready primary data is the precondition
for the AI workflow that follows.