Last updated: April 2026 · Part of the Data Collection & Reporting series
What primary data is, and why the definition matters
Primary data is information a researcher collects directly from its original source for a specific research purpose. A survey you design and send, an interview you conduct, an observation you record, an experiment you run — the data that comes out of any of these is primary. It has not been previously published, interpreted, or processed by someone else. That first-person collection is the defining feature.
This page is written for students, researchers, program evaluators, and analysts who need a clear reference on primary data — what it is, where it comes from, how it is collected, where it helps, and where it runs into trouble. The sections below cover the formal definition, characteristics, types, sources, collection methods, real examples across fields, advantages, disadvantages, the comparison with secondary data, and a practical set of steps for collecting primary data that holds up under analysis.
Primary data — at a glance
Information you collect directly, for a question you defined
Every piece of primary data comes from one of four sources and is produced through a handful of methods. The researcher designs the instrument, makes contact with the source, and owns the resulting dataset.
Working definition
Primary data is information a researcher collects firsthand — from people, settings, documents, or samples — for a specific research purpose.
The four sources
People
- Survey respondents
- Interview participants
- Focus group members
- Experiment subjects
Settings
- Classrooms
- Workplaces
- Retail spaces
- Digital platforms
Documents
- Field notes
- Audio recordings
- Video sessions
- Researcher journals
Samples
- Blood, tissue
- Soil, water
- Air quality
- Product samples
The seven most common methods
- Surveys
- Interviews
- Focus groups
- Observations
- Experiments
- Case studies
- Self-assessments
What is primary data?
Primary data is information gathered firsthand by the researcher for the research question at hand. The researcher decides what to measure, who to measure it from, which instruments to use, and when to collect. Because of that direct control, primary data reflects the current state of whatever is being studied and answers the specific question that motivated the study.
The term traces back to the idea of a primary source in research methodology. A primary source is original material — a survey response, an interview recording, a set of field notes, an experimental measurement — produced by someone with direct contact with the subject. A secondary source is material that interprets, analyzes, or summarizes primary material. Data inherits the same distinction: data you collect directly is primary, and data someone else collected and published is secondary.
Different fields use slightly different labels for the same idea. In marketing, primary data is often called first-party data, because it is collected through your own channels and owned by your organization. In statistics, primary data refers to observations gathered for the specific statistical investigation at hand. In journalism and historical research, the term primary source is more common. In general business research, raw data and original data are sometimes used as informal synonyms. Across all these uses, the defining feature stays the same: the data was collected by or for the person now using it, for a purpose they defined.
One quick clarification that trips up students. A dataset is primary relative to the person who collected it, not relative to the subject. If a nonprofit collects survey responses from program participants, that data is primary for the nonprofit. If the nonprofit later publishes the anonymized data and another researcher downloads it to test a different hypothesis, the same dataset becomes secondary for the second researcher. Primary and secondary are roles the data plays, not permanent labels.
Characteristics of primary data
Five characteristics distinguish primary data from secondary data in practice.
Purpose-specific. Every instrument — the survey questions, the interview guide, the observation protocol — is designed to answer a particular research question. Unlike secondary data, which you adapt to your question after the fact, primary data is shaped around the question from the start.
Current. Primary data reflects the conditions at the moment of collection. For questions about present-day behavior, attitudes, skills, or outcomes, this matters. A nonprofit measuring participant confidence at program exit is capturing confidence today, not confidence from a three-year-old published study.
Controlled. The researcher owns the methodology — the sampling frame, the question wording, the validation rules, the timing. That control is what makes primary data defensible when reviewers ask how a particular finding was reached. It is also what makes primary data expensive, because that level of control requires design work, piloting, and monitoring.
Proprietary. Primary data belongs to the organization that collected it, subject to research ethics and participant consent. A company's customer interviews, a hospital's patient-reported outcomes, a foundation's grantee surveys — these datasets are not available to competitors, peer organizations, or the general public unless the collector chooses to share them.
Contextually deep. Primary data carries the context of its collection. A survey response exists alongside the date, the touchpoint, the survey version, and often the respondent's own explanation of their answer. This lets the researcher interpret numbers in light of the circumstances that produced them — which is harder to do with a secondary dataset stripped of its original context.
Types of primary data
Primary data takes three broad forms based on the type of information it represents.
Quantitative primary data consists of numerical values. Survey ratings on a one-to-five scale, test scores, counts of observed behaviors, physiological measurements, A/B test conversion rates — each of these is a number that can be summed, averaged, compared across groups, and modeled statistically. Quantitative data is useful when the research question asks how many, how much, how often, or how does X compare to Y.
Qualitative primary data consists of non-numerical information — usually text, audio, video, or images. Interview transcripts, open-ended survey responses, field notes from observations, participant journals, recordings of focus groups, photographs of classroom interactions. Qualitative data is useful when the research question asks why, how, or what does this mean to the people involved. It captures reasoning, emotion, and context that numbers miss.
Mixed-method primary data combines both. A workforce training evaluation might collect pre-program confidence scores on a scale (quantitative) alongside open-ended responses about what participants hope to learn (qualitative). A customer satisfaction survey might include both a numerical rating and a follow-up text box asking why. Mixed-method designs are common in program evaluation, social research, and user experience research, because most substantive questions have both a measurable and an explanatory dimension.
The form of the data shapes the analysis approach. Quantitative data is typically analyzed with descriptive and inferential statistics. Qualitative data is analyzed through coding — the process of reading responses, tagging themes, and organizing them into patterns. Mixed-method data is analyzed through some combination of both, often in sequence: quantitative analysis first to establish patterns, qualitative analysis next to explain what drives them.
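The two-pass sequence can be sketched in a few lines of Python. Everything here (the responses, the ratings, the theme tags) is invented for illustration; in a real study the themes would come from a qualitative coding pass, not arrive pre-tagged.

```python
from collections import Counter
from statistics import mean

# Hypothetical mixed-method responses: a numeric rating plus the themes
# tagged during qualitative coding. All values are invented.
responses = [
    {"rating": 2, "themes": ["scheduling", "cost"]},
    {"rating": 4, "themes": ["mentorship"]},
    {"rating": 5, "themes": ["mentorship", "materials"]},
    {"rating": 1, "themes": ["cost"]},
]

# Step 1: quantitative pass establishes the pattern.
avg_rating = mean(r["rating"] for r in responses)

# Step 2: qualitative pass explains what drives the low scores.
low_score_themes = Counter(
    theme for r in responses if r["rating"] <= 2 for theme in r["themes"]
)

print(avg_rating)
print(low_score_themes.most_common(1))  # [('cost', 2)]
```

The order matters: the average alone says satisfaction is middling; only the coded themes say why the low scores cluster around cost.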
Sources of primary data
A source is the origin of the data — the person, setting, artifact, or material from which information flows to the researcher. Four source categories cover most primary data collection.
People. The most common source. Survey respondents, interview participants, focus group members, experiment subjects, and those who complete self-assessments all provide primary data through their responses. Collecting data from people requires informed consent, clear communication about how the data will be used, and attention to participant wellbeing. Ethical review boards govern this work in most academic and clinical contexts.
Environments and settings. Physical or digital spaces where behavior unfolds. A classroom during a lesson, a retail store during a peak shopping hour, a workplace during a team meeting, a website during a user session. The researcher systematically records what happens in the setting. The data is primary because the researcher collected it directly through structured observation rather than reading about it elsewhere.
Documents and records created during the research. Field notebooks, researcher journals, audio recordings of interviews, video of participant sessions, photographs of sites, measurement logs from instruments. These artifacts are themselves primary data when they are generated as part of the research process. They differ from pre-existing documents — an old annual report, a published journal article, a historical archive — which would be secondary sources.
Physical and biological samples. In scientific research, the source can be a material sample. Blood, urine, or tissue samples in clinical research. Water, soil, or air samples in environmental research. Product samples in quality testing. The researcher collects the sample and analyzes it directly.
Selecting the right source is a prerequisite to selecting the right method. A question about lived experience points to people as the source and interviews or open-ended surveys as the method. A question about actual behavior points to a setting as the source and observation as the method. A question about biological effect points to samples as the source and laboratory measurement as the method. Matching source to method is one of the first design decisions in any primary data study.
Methods of primary data collection
Primary data collection methods are the techniques used to gather the data. The choice depends on the research question, the type of data needed, the population being studied, and the available budget and time.
Surveys and questionnaires
A survey is a structured instrument that asks the same set of questions to each respondent. The questions may be closed-ended — scales, multiple choice, yes/no — or open-ended, asking for a written response. Surveys reach large numbers of people efficiently and produce data that can be compared across respondents.
Surveys are well-suited to research questions about attitudes, satisfaction, frequency of behaviors, knowledge, and self-reported outcomes. They work less well for questions that require deep narrative explanation or real-time behavioral detail. The most common failure modes are poor question wording, low response rates, and respondent fatigue on long instruments.
Modes include online, paper, telephone, in-person, and mobile. Online surveys are the least expensive to run at scale. Paper, phone, and in-person surveys remain important for populations with limited internet access or where response quality depends on direct contact.
Interviews
An interview is a one-on-one conversation between the researcher and the participant. Interviews come in three forms.
Structured interviews follow a fixed script. Every participant answers the same questions in the same order. This makes responses comparable across participants but gives up some depth.
Semi-structured interviews use a guide of key questions but allow the interviewer to follow up, probe, and adjust based on what the participant says. This is the most common format in qualitative research. It balances comparability with depth.
Unstructured interviews start with a topic and let the conversation unfold organically. They produce the richest data but are the hardest to compare across participants. They are typical in exploratory research, ethnography, and narrative inquiry.
Interviews are the right method when the question is about reasoning, meaning, experience, or context that a survey cannot capture. They are resource-intensive — each interview typically runs thirty to ninety minutes, plus transcription and coding time afterward.
Focus groups
A focus group is a moderated discussion with six to twelve participants on a specific topic. The moderator leads the group through a set of prompts and observes how participants interact. Focus groups are common in market research, program design, and policy consultation.
Focus groups are useful when the research question is about collective attitudes, group dynamics, or how people influence each other's thinking on a topic. They are efficient — one session produces data from multiple participants — but they come with risks. Dominant participants can crowd out quieter ones. Social desirability can push responses toward the perceived group norm. Moderator skill materially affects data quality.
Observations
Observation is the systematic recording of behavior, interactions, or events in a natural or controlled setting. In participant observation, the researcher is embedded in the environment being studied and takes part in what is happening. In non-participant observation, the researcher watches from outside without influencing events. A third variant, structured observation, uses a predefined coding protocol so multiple observers record events consistently.
Observation captures what people actually do, which often differs from what they report doing on a survey. It is well-suited to studies of classroom dynamics, workplace behavior, customer interactions, clinical encounters, and community events. The main costs are time and observer training. Without a shared protocol, observations drift into subjectivity.
Experiments
An experiment manipulates one or more variables under controlled conditions to test cause-and-effect relationships. In a randomized controlled trial, participants are randomly assigned to either a treatment condition or a control condition, and differences in outcomes are attributed to the treatment. Randomization is what distinguishes an experiment from a quasi-experiment or an observational study.
Experiments produce the strongest evidence for causal claims. They are the standard in clinical drug trials, behavioral economics, and some educational research. A/B tests in product and marketing research are a lightweight form of experiment. Experiments require significant design effort, ethical review, and enough participants to detect the effect being tested.
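The randomization step itself is simple to sketch. The following Python fragment assigns invented participant IDs to two arms; a real trial would also handle stratification, recruit to a power calculation, and document the seed in the study protocol.

```python
import random

# Twenty hypothetical participant IDs.
participants = [f"P{i:03d}" for i in range(1, 21)]

rng = random.Random(42)  # fixed seed so the assignment is auditable
arms = participants[:]
rng.shuffle(arms)

# First half of the shuffled list becomes treatment, second half control.
half = len(arms) // 2
treatment, control = arms[:half], arms[half:]

print(len(treatment), len(control))  # 10 10
```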
Case studies
A case study is an in-depth investigation of a single unit — a person, an organization, a program, a community, or an event. Case studies draw on multiple methods at once: interviews, document review, observation, sometimes survey data. The goal is a thorough understanding of the case in its context.
Case studies are appropriate when the research question is about complexity, process, or a phenomenon that cannot be isolated from its setting. They produce rich, nuanced evidence but do not support broad generalizations from a single case.
Self-assessments and diaries
Participants record their own experiences, behaviors, or progress over time. Pre-post self-assessments measure change on a consistent scale before and after an intervention — common in training programs, therapy, and clinical contexts. Diaries capture daily experience in ways a one-shot survey cannot.
Self-reported data carries the limitation that it reflects what the participant perceives or is willing to share, not necessarily objective reality. Paired with behavioral or observational data, it provides important context that other methods miss.
Practical principles
Six principles for primary data collection that holds up under analysis
The problems most researchers blame on the analysis phase were actually introduced at the collection phase. These six principles prevent that.
Principle 01
Write the research questions first
Draft three to five specific questions the data must answer before writing a single survey item. Every instrument question should trace back to one of them. Questions that do not trace back to a research question do not belong on the instrument.
Principle 02
Assign one identifier per participant
Give every participant a unique identifier that stays the same across intake, mid-program, exit, and follow-up. Without a shared identifier, analyzing change over time requires manual record-matching that loses participants.
Principle 03
Validate at the point of entry
Required fields, format checks, range limits, and conditional logic catch bad data when it is still easy to fix. Cleaning up after the fact is slower and loses records that should have been flagged during collection.
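A minimal sketch of what point-of-entry validation can look like, assuming an invented response schema (participant_id, confidence, reason); the rules and field names are illustrative, not prescriptive.

```python
import re

def validate_response(response: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is clean."""
    errors = []

    # Required fields
    for field in ("participant_id", "confidence"):
        if response.get(field) in ("", None):
            errors.append(f"missing required field: {field}")

    # Format check: identifiers look like P followed by three digits
    pid = response.get("participant_id")
    if pid and not re.fullmatch(r"P\d{3}", str(pid)):
        errors.append("participant_id format invalid (expected P###)")

    # Range limit: confidence is an integer from 1 to 5
    confidence = response.get("confidence")
    if confidence is not None and confidence not in range(1, 6):
        errors.append("confidence out of range (expected 1-5)")

    # Conditional logic: a low score requires a follow-up reason
    if confidence in (1, 2) and not response.get("reason"):
        errors.append("low confidence score requires a reason")

    return errors

print(validate_response({"participant_id": "P001", "confidence": 2, "reason": "cost"}))  # []
print(validate_response({"participant_id": "7", "confidence": 9}))
```

Run at submission time, a check like this rejects the record while the respondent is still present to correct it.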
Principle 04
Pair numbers with reasoning
When a rating scale appears on an instrument, include a follow-up prompt asking why the respondent chose that rating. Scores without context are hard to interpret; scores with short reasoning are not.
Principle 05
Keep methods connected through the identifier
A single study often uses surveys, interviews, and observations. When all of them share the participant identifier and timestamp, a cross-method question can be answered in one query. When they live in separate tools, it cannot.
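What "one query" means in practice can be sketched with plain Python dictionaries keyed on the shared identifier. All records and field names are invented.

```python
# Survey and interview records from the same hypothetical study, linked
# by the shared participant identifier.
surveys = {
    "P001": {"confidence_intake": 2},
    "P002": {"confidence_intake": 5},
    "P003": {"confidence_intake": 1},
}
interviews = {
    "P001": {"exit_barriers": ["childcare", "transport"]},
    "P003": {"exit_barriers": ["transport"]},
}

# Cross-method question, answered in one pass: what barriers did
# low-confidence participants describe at exit?
barriers = [
    barrier
    for pid, survey in surveys.items()
    if survey["confidence_intake"] <= 2 and pid in interviews
    for barrier in interviews[pid]["exit_barriers"]
]

print(sorted(barriers))  # ['childcare', 'transport', 'transport']
```

Without the shared key, each of those lookups becomes a manual matching exercise across two export files.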
Principle 06
Analyze continuously, not at the end
Data-quality problems caught on day ten can be fixed. Problems found on day one hundred cannot. Continuous analysis also surfaces patterns while there is still time to adjust the collection.
Examples of primary data across fields
The same research methods produce primary data that looks very different across fields. A few concrete examples.
Nonprofit and social impact. A workforce training program collects intake surveys measuring participant confidence and demographics, attendance records tracking session participation, mid-program check-ins assessing progress, exit surveys measuring skill gain, and six-month follow-up calls tracking employment outcomes. Every piece is primary data, collected directly from participants at known touchpoints, designed around the program's evaluation questions.
Customer and market research. A consumer brand runs post-purchase satisfaction surveys, in-store observations of shopping behavior, usability tests of a new product with recorded screens, and focus groups to test messaging concepts. Each activity produces primary data the brand owns and controls.
Education. A university runs end-of-semester course evaluations, classroom observations conducted by peer reviewers using a structured protocol, pre-post knowledge assessments, and alumni career tracking surveys sent five years after graduation. The data supports decisions about curriculum, teaching quality, and long-term outcomes.
Clinical and health research. A hospital runs patient symptom diaries, structured clinical assessments at fixed intervals, laboratory measurements of biological markers, and post-discharge interviews. Each instrument is a primary data collection activity governed by the study protocol and institutional review.
Social science and academic research. A sociologist studying workplace culture conducts semi-structured interviews with employees, observes team meetings, and administers a survey on organizational trust. An anthropologist living in a community records field notes on daily life. A political scientist conducts exit polls on election day. All of this is primary data — collected by the researcher, for the research question, at the source.
What these examples share is not the method or the field but the collection relationship: the researcher designed the instrument, engaged directly with the source, and owns the resulting dataset. What differs across the examples is how the data is used and how well the collection infrastructure holds up over time. That second point is where most primary data projects run into trouble, and it is where the practitioner angle later in this page becomes useful.
Advantages of primary data
Six advantages explain why primary data is the default choice for research questions that require specificity, currency, or depth.
Relevance to the research question. Every instrument is designed around the question the study needs to answer. There is no gap between what the researcher wants to know and what the data measures.
Current information. Primary data reflects the present moment. For questions about current conditions — present-day attitudes, recent behaviors, ongoing program outcomes — this is decisive. Secondary data sources are typically months to years old.
Full control over methodology. The researcher designs the sampling, the instruments, the validation rules, the data-collection process, and the timing. That control supports both data quality and defensibility. When a reviewer asks how a particular finding was reached, the researcher has a full answer.
Proprietary value. Primary data is owned by the collecting organization. For commercial research, this means competitors do not have access. For evaluation research, it means findings can support internal learning without exposing sensitive information.
Contextual depth. Paired open-ended responses, field notes, interview recordings, and observation data supply the context behind the numbers. Primary data lets the researcher understand why the numbers look the way they do, not just what the numbers are.
Audit trail. A well-designed primary data collection process produces a documented record of who was asked what, when, and how. That record supports external review, publication, funder reporting, and future replication.
Disadvantages of primary data
Primary data also carries real costs. Five disadvantages show up in most projects.
Cost. Designing instruments, recruiting participants, compensating respondents, training data collectors, running analysis — all of this takes staff time and budget. Secondary data is typically free or low-cost in comparison.
Time. Primary data collection takes months from design through analysis in most fields. A well-designed program evaluation study often runs six months to a year before findings are available. Secondary datasets are immediately accessible.
Required expertise. Writing a valid survey is harder than it looks. Conducting a usable interview, running an RCT, coding qualitative data — each requires trained judgment. Projects that skip the design expertise produce data that answers the wrong question or answers it unreliably.
Sample size limits. Cost and logistics constrain how many participants a study can include. Small samples produce findings that may not generalize beyond the study population, and they lack the statistical power to detect small effects.
Integration across methods is fragile. This is where primary data projects most often go wrong in practice. When surveys, interviews, observations, and records live in separate tools — SurveyMonkey for intake, Zoom for interviews, spreadsheets for field notes, email threads for follow-ups — there is typically no shared identifier linking a participant's data across methods. When the evaluation team tries to answer a cross-method question like "did participants who scored low on confidence at intake describe the same barriers in exit interviews?", the linkage has to be reconstructed by hand, and a meaningful share of records is lost in the process.
The first four disadvantages are inherent to primary data collection. The fifth is avoidable with the right collection setup — which is the practical topic covered later on this page.
Primary data vs secondary data
The difference between primary and secondary data comes down to who collected the data and why. Primary data is gathered firsthand by you for your specific research question. Secondary data already exists, having been collected by someone else for a different purpose.
Side by side
Eight dimensions shape the choice between the two.
Who collected it
- Primary: you, for your research question
- Secondary: someone else, for a different purpose

Currency
- Primary: reflects the moment of collection
- Secondary: often months to years old

Control
- Primary: methodology, sampling, timing — all yours
- Secondary: no control; the design was someone else's choice

Cost
- Primary: high — design, collection, and analysis all take investment
- Secondary: low — often free or available through databases

Time
- Primary: months from design to findings
- Secondary: immediate access

Fit to question
- Primary: designed around the question exactly
- Secondary: may require adaptation; the original questions weren't yours

Reliability
- Primary: depends on your design; verifiable at the source
- Secondary: depends on the original collector; not always fully documented

Typical examples
- Primary: surveys, interviews, observations, experiments, self-assessments
- Secondary: census data, published studies, government reports, industry databases
Use primary when
The question needs current, specific information about a particular population, program, or situation — and no existing source answers it well enough.
Use secondary when
You need context, benchmarks, or background; published research already answers the question; or the research budget and timeline do not support primary collection.
A shorter version. Primary data tells you what your specific participants are doing and saying, now. Secondary data tells you what some general population did, some time ago, in some other study. The two answer different kinds of questions and, in strong research designs, are used together. A labor-market program that combines its own participant survey data (primary) with Bureau of Labor Statistics employment figures (secondary) can say both "how our participants are doing" and "how that compares to the broader market." Neither source alone answers both questions.
A more detailed comparison of these trade-offs, with examples of how organizations combine the two in practice, is covered in Primary vs Secondary Data.
How to collect primary data in practice
The seven steps below describe a collection process that produces data usable for analysis without a long reconciliation phase at the end.
1. Define the research questions
Write three to five specific research questions the data must answer. Each question should be answerable with the methods and budget available. Vague questions ("is the program working?") produce data that cannot be cleanly analyzed. Specific questions ("did participants report higher confidence at exit than at intake, and did that pattern differ by cohort?") point directly at the measurements the instruments need to include.
2. Choose the methods
Match each research question to a method. Use surveys for questions about breadth and frequency, interviews for questions about reasoning and experience, observations for questions about actual behavior, experiments for questions about causal effect. Most substantive research questions require mixed methods — typically a survey plus either interviews or observations.
3. Design the instruments
Pilot every instrument before full deployment. Write the questions, test them with a small group that looks like the study population, and revise based on what went wrong. Ambiguous wording, loaded questions, confusing scales, and excessive length are the most common instrument-design failures. Tie every question back to a specific research question — if a question does not map to a research question, drop it.
At this stage, the practical detail that matters most for later analysis is participant identity. Assign every participant a unique identifier that will stay the same across every touchpoint of the study. Build that identifier into every instrument from the start — intake surveys, interview guides, observation forms, follow-up surveys. This sounds trivial. It is the single most common place where primary data projects lose the ability to analyze change over time.
4. Pilot test
Run the full instrument with a small group before launching. Check for technical issues, respondent confusion, unrealistic time burden, and data quality problems. Revise and retest until the instrument produces the kind of data the analysis plan needs.
5. Collect with identity integrity
Launch collection with every participant tied to their unique identifier. Use validation rules at the point of entry — required fields, format checks, range limits, conditional logic — so that bad data cannot enter the system. When the same participant completes a second or third touchpoint (mid-program, exit, six-month follow-up), the identifier connects their responses automatically, without the manual matching step that loses records.
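The automatic connection the identifier provides can be sketched as a single keyed join. The scores and IDs below are invented.

```python
# Confidence scores from two touchpoints of a hypothetical program,
# keyed by the identifier that stayed constant across both.
intake = {"P001": 2, "P002": 4, "P003": 1}
exit_survey = {"P001": 4, "P003": 3, "P004": 5}

# Because both records share the key, change over time is a single join.
change = {pid: exit_survey[pid] - intake[pid]
          for pid in intake.keys() & exit_survey.keys()}

# Attrition is visible immediately rather than discovered during matching.
lost_to_follow_up = intake.keys() - exit_survey.keys()

print(sorted(change.items()))  # [('P001', 2), ('P003', 2)]
print(lost_to_follow_up)       # {'P002'}
```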
6. Analyze continuously, not at the end
Modern collection tools let data be analyzed as it arrives rather than at the end of the collection window. Continuous analysis surfaces emerging patterns, highlights data-quality issues early, and supports mid-course adjustments while there is still time to make them. The traditional model — collect for six months, then spend three months cleaning, then analyze — delays insight until it is too late to act on.
7. Export cleanly to reporting
Export clean, structured data to whatever reporting environment the stakeholders use — a BI tool, a funder-facing dashboard, a research paper. Include a data dictionary that explains every field, the scale it uses, and how it was collected. Preserve the link between quantitative scores and the qualitative responses that explain them, so reports can show both the number and the context behind it.
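One minimal shape for a data dictionary is a companion CSV with one row per exported field. The fields below are invented examples; the point is that every field ships with its scale and collection context.

```python
import csv
import io

# Illustrative data-dictionary rows: one per exported field, stating the
# scale and how the field was collected.
dictionary = [
    {"field": "participant_id", "scale": "identifier",
     "collected": "assigned at intake; constant across touchpoints"},
    {"field": "confidence_exit", "scale": "1-5 Likert",
     "collected": "exit survey, question 4"},
    {"field": "confidence_reason", "scale": "free text",
     "collected": "exit survey, follow-up to question 4"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["field", "scale", "collected"])
writer.writeheader()
writer.writerows(dictionary)

print(buffer.getvalue())
```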
Where primary data collection breaks down
Most of the problems that show up in the analysis phase of a primary data study were introduced — or made unavoidable — at the collection phase. Three problems are the most common.
Fragmented identity. When intake responses live in one tool, exit responses in another, and interview notes in a third, there is usually no shared identifier linking the same person's data across them. Manually matching records later loses a meaningful share of participants, and it produces a dataset where apparent patterns are actually artifacts of which records happened to match. Designing a single participant identifier into every instrument from the start prevents this.
Unvalidated entry. Data that enters the system with formatting errors, impossible values, or blank required fields produces a cleanup burden later — and some of those errors are unrecoverable after the fact. Validation at the point of entry, including required fields, format checks, range limits, and conditional logic, catches these issues at the moment they are easiest to fix.
Disconnected numbers and narratives. Most survey platforms store a rating on one table and the open-ended why response on another. When a satisfaction score drops, the team has to find, download, and manually read the free-text responses to understand what is driving the change — often by a separate person, often weeks later. Collecting rated items and open-ended explanations into the same participant record, with shared identity and timestamps, lets analysts interpret a number in the context of its explanation in the same step.
These are not theoretical. They are the three most common reasons primary data projects miss their deadlines, cost more than budgeted, or produce reports that cannot fully answer the question that started the study.
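The remedy for the third problem, keeping the score and its explanation on the same record, can be sketched in a few lines. All values below are invented.

```python
# Each record keeps the score, its explanation, the identifier, and the
# timestamp together on one row.
records = [
    {"pid": "P001", "date": "2026-03-01", "satisfaction": 2,
     "why": "sessions clashed with my work shift"},
    {"pid": "P002", "date": "2026-03-01", "satisfaction": 5,
     "why": "mentor was excellent"},
    {"pid": "P003", "date": "2026-03-02", "satisfaction": 1,
     "why": "could not afford transport"},
]

# When the average drops, the explanations are already in the same rows:
low = [(r["pid"], r["why"]) for r in records if r["satisfaction"] <= 2]
print(low)
```

No second download, no separate free-text file, no weeks-later read-through: the number and its context travel together.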
Frequently asked questions
What is primary data in simple terms?
Primary data is information a researcher collects firsthand — through surveys, interviews, observations, experiments, or direct measurement — for a specific research purpose. It has not been previously published or interpreted by someone else.
What are the four main sources of primary data?
The four main sources are people (survey respondents, interview participants, focus group members, experiment subjects), environments and settings (classrooms, workplaces, retail spaces, online platforms under observation), documents created during the research (field notes, recordings, photographs), and physical or biological samples (blood, soil, water, product samples).
What are examples of primary data?
Examples include participant surveys at program intake and exit, structured interviews with open-ended narrative responses, pre- and post-skills assessments, focus group transcripts, patient symptom diaries, classroom observations, usability test recordings, and direct field measurements. In each case the data was collected directly for the study.
What is the difference between primary and secondary data?
Primary data is collected firsthand by the researcher for a specific purpose. Secondary data already exists, having been collected by someone else for a different purpose. Primary data gives specificity and currency but takes time and budget to collect. Secondary data is fast and cheap but may not match the research question exactly.
What are the five characteristics of primary data?
Primary data is purpose-specific (designed for the research question), current (reflects present-day conditions), controlled (the researcher owns the methodology), proprietary (exclusively owned by the collector), and contextually deep (carries the circumstances of collection with it).
What are the main methods of primary data collection?
The core methods are surveys and questionnaires, interviews (structured, semi-structured, or unstructured), focus groups, observations (participant or non-participant), experiments including randomized controlled trials and A/B tests, case studies, and self-assessments or diaries. Most studies use two or more methods together.
What are the main advantages of primary data?
Primary data is designed around the research question, reflects current conditions, gives full control over methodology, is owned by the collector, carries context that supports interpretation, and produces an audit trail that supports external review.
What are the main disadvantages of primary data?
Primary data collection takes time and money, requires trained researchers to design and run, is constrained in sample size by budget, and — in practice — often runs into linkage and integration problems when different methods are captured in disconnected tools.
Is an interview primary or secondary data?
An interview you conduct yourself produces primary data. An interview conducted by someone else and later read or cited by you is a secondary source.
Is survey data primary or secondary?
A survey you design and administer is primary data. A published survey dataset that someone else collected, which you download and reanalyze, is secondary data.
When should you use primary data?
Use primary data when the question requires current, specific information about a particular population, program, or situation — and when no existing source answers the question well enough. Program evaluation, customer research, clinical trials, and most applied research require primary data.
When should you use secondary data?
Use secondary data when you need context, benchmarks, or background before investing in primary collection, or when published research already answers the question. Secondary data is often a useful starting point even in studies that will eventually collect primary data.
For practitioners
Set up primary data collection that holds up under analysis
Sopact Sense applies the six principles on this page by default — unique participant IDs across every touchpoint, validation at entry, numbers paired with reasoning, continuous analysis instead of a separate cleanup phase at the end. Useful for program evaluations, longitudinal studies, and mixed-method research that needs to stay analysis-ready.