Definitions
Five questions readers ask first
The terms longitudinal data, longitudinal dataset, longitudinal tracking, and panel data are often used as if they meant the same thing. They mostly do, with small shades of difference. The five answers below cover the five question forms that send readers to this page.
What is longitudinal data?
Longitudinal data is data collected from the same units at multiple points in time, with each unit's measurements connected across waves. The unit can be a person, an organization, a location, a piece of legislation, or any entity that persists across the time window. The defining feature is the connection across waves: every row in the dataset can be traced to a specific unit, and all of that unit's measurements can be retrieved together.
Without that connection, the data is a sequence of cross-sectional snapshots, not longitudinal data. Repeated collection without a stable identifier is not enough. The same survey given to the same group at three time points produces longitudinal data only if each person's three responses can be retrieved together.
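The retrieval test described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the identifiers (P001, P002), the scores, and the function name link_by_unit are all hypothetical.

```python
from collections import defaultdict

# Hypothetical survey responses as (participant_id, wave, score) tuples.
responses = [
    ("P001", 1, 3), ("P002", 1, 4),
    ("P001", 2, 4), ("P002", 2, 4),
    ("P001", 3, 5), ("P002", 3, 2),
]

def link_by_unit(rows):
    """Group responses by unit so each unit's waves can be retrieved together."""
    linked = defaultdict(dict)
    for unit_id, wave, score in rows:
        linked[unit_id][wave] = score
    return dict(linked)

linked = link_by_unit(responses)
print(linked["P001"])  # {1: 3, 2: 4, 3: 5}
```

If the participant_id column were missing, the same eighteen values would still exist, but nothing could group them by person, which is the difference between longitudinal data and three snapshots.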
Longitudinal data definition
The standard textbook definition: data containing repeated measurements on the same units over time, where each unit's measurements are linkable across time periods. Some definitions add the requirement of a balanced design (every unit measured at every wave) but most applied work treats unbalanced data as longitudinal as long as the linking is preserved.
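As a rough sketch of the balanced versus unbalanced distinction, the check below asks whether every unit was observed at every wave. The function name is_balanced and the sample observations are assumptions made for illustration only.

```python
def is_balanced(observations, waves):
    """Return True if every unit was observed at every wave.

    observations: iterable of (unit_id, wave) pairs.
    waves: the full list of waves the design calls for.
    """
    waves_seen = {}
    for unit_id, wave in observations:
        waves_seen.setdefault(unit_id, set()).add(wave)
    expected = set(waves)
    return all(seen == expected for seen in waves_seen.values())

# Hypothetical data: unit B misses wave 2 in the second dataset.
balanced = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
unbalanced = [("A", 1), ("A", 2), ("B", 1)]

print(is_balanced(balanced, [1, 2]))    # True
print(is_balanced(unbalanced, [1, 2]))  # False
```

Both datasets are longitudinal under the applied definition, because the unit identifiers still link the waves that were collected.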
In econometrics, the term panel data is more common and is mostly synonymous. In epidemiology, cohort data serves the same purpose. In social science and program evaluation, longitudinal data is the dominant term. The thing they all describe is the same: same units, multiple times, connected.
Longitudinal data meaning
The word longitudinal comes from the Latin longitudo, meaning length. Longitudinal data is data with length in time: stretched across multiple waves rather than compressed into one. Saying data is longitudinal does not commit the dataset to a particular length, only to the structure. Two waves six weeks apart produce longitudinal data; multi-decade studies produce longitudinal data. The structural requirement is the same in both cases.
What makes the structure work is the link between waves. Two surveys six weeks apart with no way to match the same person's answers between them are not longitudinal data; they are two cross-sectional samples. The link is the longitudinal part.
What is longitudinal tracking?
Longitudinal tracking is the operational practice of keeping each unit identifiable across waves so that wave-by-wave measurements can be connected. A tracking ID is set when the unit first enters the dataset. Every later measurement of the same unit attaches to that ID. The tracking work is what turns a sequence of separate collections into longitudinal data.
Broken tracking is the most common reason longitudinal studies fail to produce clean data. Email addresses change. People get married and change last names. Organizations rebrand. Without a stable ID set at first contact, the matching has to happen at analysis time through fuzzy joins on names and contact details, and twenty to forty percent of records typically fail to match. The tracking happens at collection or it does not happen at all.
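One way to picture tracking at collection time is a small registry that issues an ID at first contact and attaches every later wave to it. The sketch below is an assumption-laden illustration: the class TrackingRegistry, the ID format, and the names and email addresses are all invented for the example.

```python
import itertools

class TrackingRegistry:
    """Hypothetical registry: issues an ID at intake, attaches later waves to it."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.records = {}  # tracking_id -> list of (wave, contact details)

    def enroll(self, details):
        """First contact: create the tracking ID and store wave 1."""
        tracking_id = f"T{next(self._counter):04d}"
        self.records[tracking_id] = [(1, details)]
        return tracking_id

    def record(self, tracking_id, wave, details):
        """Later waves attach to the existing ID, never to a name or email."""
        if tracking_id not in self.records:
            raise KeyError(f"unknown tracking ID: {tracking_id}")
        self.records[tracking_id].append((wave, details))

registry = TrackingRegistry()
maria_id = registry.enroll({"name": "Maria Lopez", "email": "maria@example.org"})

# By wave 2 both the last name and the email have changed.
registry.record(maria_id, 2, {"name": "Maria Chen", "email": "mchen@example.com"})

waves = registry.records[maria_id]
same_email = waves[0][1]["email"] == waves[1][1]["email"]
print(maria_id, len(waves), same_email)  # T0001 2 False
```

A join on name or email would have missed the wave 2 record here; the join on the tracking ID does not depend on either field staying the same.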
Longitudinal data example
The cleanest example to picture is a workforce-training cohort of 320 participants surveyed at intake, at the end of the program (six months in), twelve months after exit, and twenty-four months after exit. Each participant has a tracking ID set at intake. The dataset has 320 rows in wide format, with each measured field appearing four times (for example skill_w1, skill_w2, skill_w3, skill_w4) plus identifier columns. By the end, the dataset can answer "did Maria's wage rise" rather than only "did the group's average wage rise."
The same structural pattern appears in healthcare longitudinal records (one patient, multiple visits across years), in policy tracking databases (one regulation, multiple amendment cycles), and in state education data systems (one student, kindergarten through workforce). For the deeper walkthrough, see section five below.
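The wide layout can be sketched with three hypothetical rows standing in for the 320. The field names skill_w1 through skill_w4 follow the example; the tracking IDs, the values, and the helper functions are invented for illustration.

```python
# Three hypothetical participants in wide format: one row per unit.
wide = [
    {"tracking_id": "T0001", "skill_w1": 2, "skill_w2": 3, "skill_w3": 4, "skill_w4": 4},
    {"tracking_id": "T0002", "skill_w1": 3, "skill_w2": 3, "skill_w3": 2, "skill_w4": 2},
    {"tracking_id": "T0003", "skill_w1": 1, "skill_w2": 2, "skill_w3": 3, "skill_w4": 4},
]

def within_unit_change(row, field="skill", first=1, last=4):
    """Change for one unit between two waves of the same field."""
    return row[f"{field}_w{last}"] - row[f"{field}_w{first}"]

def to_long(rows, field="skill", waves=(1, 2, 3, 4)):
    """Reshape wide rows into one row per unit per wave."""
    return [
        {"tracking_id": row["tracking_id"], "wave": wave, field: row[f"{field}_w{wave}"]}
        for row in rows
        for wave in waves
    ]

changes = {row["tracking_id"]: within_unit_change(row) for row in wide}
mean_change = sum(changes.values()) / len(changes)

print(changes)                 # {'T0001': 2, 'T0002': -1, 'T0003': 3}
print(round(mean_change, 2))   # 1.33
print(len(to_long(wide)))      # 12
```

The group average rises while one participant's score falls. Only the linked rows reveal that; averages computed from unlinked waves would hide it.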
What it is not
Four data structures that get confused with longitudinal data
These four structures share features with longitudinal data and are often used in the same conversations. Each one differs from longitudinal data in a specific way. Knowing the difference is what tells you whether the dataset you are reading or building is what you think it is.
Cross-sectional data
Different units, one time
Cross-sectional data is collected from many different units at one moment. A national household survey run once is cross-sectional. The same survey run again next year, with new respondents, is two cross-sections, not longitudinal data. The unit must repeat across waves to make it longitudinal.
Time-series data
One unit, many times
Time-series data is one unit (or aggregate) measured at many time points. The daily stock price of one company is a time series. National GDP across decades is a time series of an aggregate. Longitudinal data is many units across many times; a time series is one unit across many times. The analysis methods differ accordingly.
Panel data
Mostly a synonym
Panel data is the econometrics term for longitudinal data. In its strictest usage, panel implies a balanced design (every unit measured at every wave at the same intervals), although econometricians also speak of unbalanced panels; longitudinal covers both cases without qualification. In applied work the two terms are interchangeable; in formal econometric writing, panel is more specific.
Repeated cross-sections
Same survey, different people
A repeated cross-sections design gives the same questionnaire to fresh samples at multiple time points. National opinion polls run quarterly are repeated cross-sections. The unit changes between waves; only the population stays roughly stable. Repeated cross-sections cannot answer within-person change questions, only how the population's averages move.
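The four distinctions above can be compressed into a rough classification rule based on how many units and time points a dataset contains and whether any unit repeats. The function below is a simplified sketch under stated assumptions: it takes (unit_id, time) pairs with trustworthy identifiers, it treats any repeat of a unit across times as longitudinal, and the name classify_structure and the sample data are hypothetical.

```python
def classify_structure(observations):
    """Label a dataset from its (unit_id, time) pairs. Simplified heuristic."""
    units = {unit for unit, _ in observations}
    times = {time for _, time in observations}

    times_per_unit = {}
    for unit, time in observations:
        times_per_unit.setdefault(unit, set()).add(time)

    if len(times) == 1:
        return "cross-sectional"          # different units, one time
    if len(units) == 1:
        return "time series"              # one unit, many times
    if any(len(seen) > 1 for seen in times_per_unit.values()):
        return "longitudinal"             # units repeat across times
    return "repeated cross-sections"      # fresh units at each time

print(classify_structure([("a", 2023), ("b", 2023)]))               # cross-sectional
print(classify_structure([("gdp", 2021), ("gdp", 2022)]))           # time series
print(classify_structure([("a", 1), ("b", 1), ("a", 2), ("b", 2)])) # longitudinal
print(classify_structure([("a", 1), ("b", 1), ("c", 2), ("d", 2)])) # repeated cross-sections
```

The rule says nothing about balance, so a balanced panel and an unbalanced one both come back as longitudinal.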