Sampling Procedure

Add Health used three sampling frames, only one of which was available prior to the study.

The first sampling frame was a stratified list of 26,666 U.S. high schools, defined as schools with an 11th grade and more than 30 students, from which high schools were selected with probability proportional to size.
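The text does not say exactly how the probability-proportional-to-size (PPS) draw was carried out. As an illustration only, the sketch below implements systematic PPS sampling, one common way to select units with probability proportional to size. It assumes enrollment is the size measure and that the frame is a plain list of school sizes. The function name `pps_systematic` and the enrollment figures are hypothetical. Under this method a school's chance of selection is n × size / total size (capped at certainty), so larger schools are more likely to be drawn.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=random):
    """Return the indices of n units drawn with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length sum(sizes),
    pick a random start in the first sampling interval, then take n points
    spaced one interval apart. Each point selects the unit it falls in.
    A unit larger than the interval is certain to be selected and may be
    hit more than once.
    """
    total = sum(sizes)
    step = total / n                      # sampling interval
    start = rng.random() * step           # random start in [0, step)
    cumulative = list(accumulate(sizes))  # right edge of each unit on the line
    chosen = []
    i = 0
    for k in range(n):
        point = start + k * step
        # Advance to the first unit whose right edge lies beyond the point;
        # the guard protects against floating-point overshoot at the end.
        while i < len(sizes) - 1 and cumulative[i] <= point:
            i += 1
        chosen.append(i)
    return chosen


# Hypothetical enrollments for six schools in one stratum; draw two of them.
enrollments = [120, 340, 800, 95, 410, 1500]
print(pps_systematic(enrollments, 2, random.Random(42)))
```

Sorting the frame by stratum before drawing (implicit stratification) spreads a systematic sample across strata; whether and how that was done here is not stated in the text.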

Eighty high schools were selected from this list. Because 60 of them had no 7th grade, a second sampling frame was then created from their feeder (middle) schools: for each of these 60 high schools, a single feeder school was selected with probability proportional to the percentage of the high school's entering class that it contributed. Four of the 60 high schools had no predominant feeder school, so 56 feeder schools were selected, of which 52 agreed to participate. The total sample comprised 132 schools (80 high schools and 52 feeder schools), with an overall participation rate of 80%.
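Choosing one feeder school with probability proportional to its share of the entering class amounts to a weighted draw among that high school's feeders. A minimal sketch, assuming the shares are available as percentages keyed by hypothetical school IDs, using Python's standard `random.choices`. The function `pick_feeder` is hypothetical; it also returns the selection probability, which a later weighting step would need.

```python
import random


def pick_feeder(feeder_shares, rng=random):
    """Pick one feeder school with probability proportional to its share
    of the high school's entering class.

    feeder_shares maps a feeder-school ID (hypothetical) to the percentage
    of the high school's entering class it contributed. Returns the chosen
    ID and its selection probability, or (None, 0.0) when no feeder with a
    positive share is listed.
    """
    schools = [s for s, share in feeder_shares.items() if share > 0]
    if not schools:
        return None, 0.0
    weights = [feeder_shares[s] for s in schools]
    choice = rng.choices(schools, weights=weights, k=1)[0]
    # Shares need not sum to 100, so normalize over the listed feeders.
    return choice, feeder_shares[choice] / sum(weights)


shares = {"middle_A": 70, "middle_B": 20, "middle_C": 10}
print(pick_feeder(shares, random.Random(7)))
```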

A third frame comprising all students at these schools was created by combining a school-provided student roster with a list of students who completed the in-school survey but did not appear on the school roster.

Add Health used stratified, two-stage element sampling. The primary sampling units were high schools, chosen from a stratified list with probability proportional to size. The strata were defined by region (Northeast, Midwest, South, West), urbanicity (urban, suburban, rural), school size (30–125, 126–350, 351–775, 776+ students), school type (public, private, parochial), percent white (0, 1–66, 67–93, 94–100), percent black (0, 1–6, 7–33, 34–100), and grade span and specialization (K–12, 7–12, 9–12, 10–12, vocational, alternative, special education). Not every combination of these characteristics occurred among the schools.
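The cut points listed above can be turned into a stratum key for each school. The sketch below is hypothetical: the record field names are invented, enrollments and percentages are assumed to be integers, and the categorical variables (region, urbanicity, school type, grade span) are passed through unchanged. Grouping schools by key also shows directly which combinations of characteristics actually occur in a frame.

```python
from bisect import bisect_left
from collections import defaultdict

# Inclusive upper bound of every bin except the last, taken from the text.
SIZE_BOUNDS = (125, 350, 775)   # 30-125, 126-350, 351-775, 776+
PCT_WHITE_BOUNDS = (0, 66, 93)  # 0, 1-66, 67-93, 94-100
PCT_BLACK_BOUNDS = (0, 6, 33)   # 0, 1-6, 7-33, 34-100


def bin_index(value, bounds):
    """0-based bin for an integer value, given inclusive upper bounds."""
    return bisect_left(bounds, value)


def stratum(school):
    """Stratum key for one school record (field names are hypothetical)."""
    return (
        school["region"],
        school["urbanicity"],
        bin_index(school["enrollment"], SIZE_BOUNDS),
        school["type"],
        bin_index(school["pct_white"], PCT_WHITE_BOUNDS),
        bin_index(school["pct_black"], PCT_BLACK_BOUNDS),
        school["grade_span"],
    )


def group_by_stratum(schools):
    """Map each stratum key that actually occurs to its list of schools."""
    groups = defaultdict(list)
    for school in schools:
        groups[stratum(school)].append(school)
    return dict(groups)
```

Because `group_by_stratum` only creates keys for schools that exist, empty combinations simply never appear, matching the observation that not all combinations were present.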

The 52 participating feeder (middle) schools were selected as described above. All students in the selected schools were given the in-school Add Health questionnaire (N = 90,118) between September 1994 and April 1995; of the 132 schools that agreed to participate, 129 chose to administer it during class time.

For the in-home questionnaire, the secondary and ultimate sampling units were students in the 134 selected schools, chosen with unequal probability. The in-home subsample was drawn from the school-provided roster plus those students who completed the in-school survey but did not appear on the school's roster. The in-home sample of 27,000 included a nationally representative core of 12,105 adolescents in grades 7 to 12 chosen with equal probability; all students in selected "saturated" schools (chosen with probability related to school size); a genetic oversample comprising twins, full siblings, half siblings, siblings of twins, and non-related adolescents residing together; and minority oversamples of disabled adolescents, Black adolescents from well-educated families, and Chinese, Cuban, and Puerto Rican adolescents. Members of these oversamples were identified from students' answers to the in-school survey.
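Because selection probabilities differ across schools and students, analyses rely on sample weights. A minimal sketch of the basic design weight in a two-stage sample, assuming PPS selection of schools and equal-probability selection of a fixed number of students within each school: the weight is the inverse of the product of the two stage probabilities. All numbers and function names here are hypothetical, and this is not Add Health's actual weighting procedure; production survey weights usually add nonresponse and other adjustments.

```python
def inclusion_probability(n_schools, school_size, frame_total, m_students):
    """Overall inclusion probability of a student in a two-stage design.

    Stage 1: school drawn with probability proportional to size,
             P(school) = n_schools * school_size / frame_total (capped at 1).
    Stage 2: m_students drawn with equal probability inside the school,
             P(student | school) = m_students / school_size.
    """
    p_school = min(1.0, n_schools * school_size / frame_total)
    p_student = m_students / school_size
    return p_school * p_student


def base_weight(p):
    """Design weight: how many population members one respondent represents."""
    if not 0 < p <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return 1.0 / p


# Hypothetical frame of 1,000,000 students, 80 schools, 20 students per school.
small = base_weight(inclusion_probability(80, 500, 1_000_000, 20))
large = base_weight(inclusion_probability(80, 2_000, 1_000_000, 20))
# Both are about 625: PPS at stage 1 and a fixed take at stage 2 cancel school size.
```

With PPS at the first stage and a fixed take at the second, school size cancels, which is one way an equal-probability core can be obtained from schools of very different sizes. Taking more students from an oversampled group raises their inclusion probability and lowers their weights; doubling `m_students`, for example, halves the weight.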

The saturated schools were intended to reveal peer effects on students, in keeping with Add Health's focus on environmental influences on adolescents' health decisions. For instance, data from the saturated schools enabled researchers to find that "virginity pledges" had a substantial effect on students in high schools where students socialize mainly with one another, but only when most other students in the school had not made such a pledge; such strategies therefore cannot work universally.

The genetic oversample allowed researchers to try to separate genetic from environmental effects by comparing the behavior of twins with that of full siblings, half siblings, and step siblings. Such comparisons are useful because the tendency to engage in certain health risks is thought to be heritable, yet there had been little study of how these traits manifest in adolescents, and none that examined adolescents in their social contexts. Because the original sample of 134 schools did not contain enough genetically related students, additional schools were selected to enlarge the genetic sample. These additional subjects lack sample weights and are intended only for analyses that do not require weights, such as comparisons between siblings and twins.

The minority oversamples allow researchers to study minority groups that would otherwise be too small to analyze, and they provide information for shaping health programs for minority adolescents.

Wave I in-home interviews were conducted with 20,745 adolescents and 17,700 parents between April and December 1995.

Wave II followed a year later: between April and August 1996, in-home interviews were conducted with 14,738 of the Wave I adolescents who could be located.

Wave III, conducted from August 2001 to April 2002 when respondents were aged 18–26, took place about six years after Wave I. It was designed to examine the transition from adolescence to adulthood and the relationships between adolescent behavior and later health, educational attainment, labor market participation, family status, and community involvement.

The successive phases of the study allow researchers to attempt to make causal inferences by examining the effects of earlier factors on later health decisions.

Wave I used several modes of data collection. Students completed an in-school written questionnaire, and administrators completed a written questionnaire about the characteristics of their schools. The in-school questionnaire served as a screening instrument and as a means of supplementing school rosters when sampling for the in-home interview; it also allowed researchers to obtain information from entire peer groups.

More detailed information was obtained through in-person interviews with both students and parents. For sensitive topics such as sex and drug use, students listened to tape-recorded questions through earphones and entered their answers into a laptop (Boonstra 2001). The recorded questions were used to avoid interviewer or parental response bias, and the laptop to assure students that their answers were confidential.

In Wave II, school information was updated through a telephone interview with school administrators. Because administrators were used only to provide contextual information, a telephone interview, which costs less than an in-person visit, was sufficient.

In-home interviews were conducted in the same way in Waves II and III. Despite the survey's size and the high cost of in-person interviews, the resulting data quality was judged worth the expense.

