Analysis of Complex Survey Data (from NCES)

The Center for Integrated Latent Variable Research (CILVR)



December 10-11, 2020 (Thursday-Friday)

taught by

Laura Stapleton, University of Maryland



A wealth of publicly available national and international data exists for use by researchers in education and other social science disciplines. Of particular interest for this workshop are the data supplied by the National Center for Education Statistics (NCES), although the topics presented generalize to other large-scale data collections. The goal of this workshop is to help researchers who are new to national and international data become more comfortable with accessing and appropriately analyzing them. A particular concern in analyzing public-release data from the NCES is that a complex probability sampling technique was used to obtain responses from participants. Such a sampling design requires the use of specialized statistics to obtain unbiased point estimates (e.g., means, regression coefficients, structural path estimates) and sampling variance estimates (e.g., standard errors). The aim of this workshop is to instruct researchers on best practices for working with these data.


This short course introduces participants to the issues in working with national and international complex probability sample data sets, including conceptual issues in measurement and model setup as well as the specialized statistical procedures required to conduct appropriate analyses. Participants will be presented with structured examples of downloading data and addressing the analytic challenges, and will be given an opportunity to explore their own analyses with feedback from the instructor. At the end of the short course, participants should be able to:

  • Learn about and download public-release data from the National Center for Education Statistics
  • Acknowledge the limitations in using these data for analyses, given constraints in measurement and the observational nature of survey data
  • Describe the differences between the many types of weights available on the data set (e.g., sampling weights, panel weights, replicate weights)
  • Undertake basic statistical analysis (descriptive analysis, t-tests, multiple regression), obtaining appropriate unbiased estimates and standard errors by using sampling weights and specialized variance estimation techniques (supported software for basic analysis will include SPSS, SAS, R, and Mplus)
  • Recognize advanced issues that may need to be addressed, such as imputation methods for missing data and domain analysis for studies of subpopulations
  • Identify advantages and disadvantages in utilizing multilevel models with these types of data
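
As a rough illustration of the weighting and variance estimation ideas listed above (not taken from the course materials, and in Python rather than one of the supported packages), the sketch below computes a weighted mean and a replicate-weight standard error. The data values, the weights, the helper names, and the delete-one jackknife replicates are all hypothetical. Real NCES files ship their own replicate weights, and the appropriate scale factor depends on the replication method documented for each data set.

```python
import math

def weighted_mean(values, weights):
    """Point estimate of a mean using sampling weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jk1_replicate_weights(weights):
    """Build hypothetical delete-one jackknife replicate weights.

    Replicate r zeroes out unit r and inflates the remaining weights by n/(n-1).
    Public-release files normally provide replicate weights directly.
    """
    n = len(weights)
    return [
        [0.0 if i == r else w * n / (n - 1) for i, w in enumerate(weights)]
        for r in range(n)
    ]

def replicate_se(values, full_weights, replicate_weights, scale):
    """Standard error = sqrt(scale * sum over replicates of (theta_r - theta)^2)."""
    full_estimate = weighted_mean(values, full_weights)
    squared_devs = [
        (weighted_mean(values, rw) - full_estimate) ** 2
        for rw in replicate_weights
    ]
    return math.sqrt(scale * sum(squared_devs))

# Toy data (assumed for illustration only): four respondents with unequal weights.
values = [10, 20, 30, 40]
weights = [1, 1, 2, 4]
n = len(values)

unweighted = sum(values) / n                 # 25.0, ignores the sampling design
weighted = weighted_mean(values, weights)    # 31.25, respondents with larger weights count more

reps = jk1_replicate_weights(weights)
se = replicate_se(values, weights, reps, scale=(n - 1) / n)  # scale used for delete-one jackknife

print(unweighted, weighted, se)
```

The gap between the unweighted and weighted means in this toy example shows why ignoring sampling weights can bias point estimates. The replicate-based standard error comes from re-estimating the statistic under each set of replicate weights.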

The target audience for this course is any individual with an intermediate knowledge of statistical analyses who seeks to conduct or understand analysis of complex survey data. This population includes all levels of graduate students, assuming a basic knowledge of research design and statistical analysis. Researchers working within academic institutions or research agencies are the ideal audience. Other individuals may benefit from the course as well, especially if their work focuses on using extant data collected by the U.S. Education Department.



  • Intermediate proficiency in a statistical programming language (e.g., SPSS, SAS, R)
  • Intermediate proficiency in inferential statistics

Not required but advantageous:

  • Experience working with large datasets
  • Knowledge of advanced statistical modeling (e.g., HLM, SEM)

No level of proficiency beyond basic awareness is assumed for skills related to:

  • NCES
  • Survey design and measurement

Examples and support will be provided for SPSS, R, SAS, and Mplus software packages.


December 10-11, 2020 (Thursday-Friday)

10am-5pm Eastern Standard Time (UTC-5)

Instructor will determine timing of lunch break, as well as morning and afternoon breaks.


Professional: $345

Full-time student*: $195

*Full-time students must submit student status proof at for prompt processing of the registration.

Free for registered HDQM Department faculty and degree-seeking students, although you must register through the internal link. 

REFUND POLICY: Full refund if cancellation occurs at least 10 business days prior to the workshop date; 50% refund if within 10 days of the first day of the course.


One-time Registration:

- For professional and full-time student participants, please register using this link:

- Full-time students must also submit the student status proof at for prompt processing of the registration. Note that it may take 2-3 business days for your registration to be processed.


Bundle Registration: 

- Participants who wish to register for multiple CILVR short courses in 2020-2021 as a bundle and obtain ONE receipt for the bundle registrations can submit the request at


HDQM Registration:

- HDQM department registrants can register using the following registration form:


This workshop will be delivered entirely online via the video conferencing software Zoom.

For a limited time, the video recordings of the short course will be available on Vimeo for both synchronous and asynchronous participants.


The two-day workshop will entail a mix of synchronous and asynchronous work. Each day the schedule will be (all times EST): 

10am-12pm synchronous lecture

12-1pm asynchronous problem sets

1-1:30pm synchronous work through problem sets

1:30-3:30pm synchronous lecture with short break

3:30-4:30pm asynchronous problem sets

4:30-5pm synchronous work through problem sets


Support for students from Underrepresented Groups to attend methodological workshops (from the Society of Multivariate Experimental Psychology):


Format: Participants will receive a personalized login code to use on their own computer to access a reliable live-stream of the short course over Zoom, showing the instructor as well as the handouts.

Materials: Participants will receive electronic copies of the short course materials, as well as any other relevant materials or information.

Timing/access: Participants may choose to watch the stream synchronously, or may elect to watch a recording of the short course asynchronously, or both. Recordings will be available to participants for two weeks following the end of the short course. This is especially useful for on-line participants in different time zones who may choose to watch at some later time than (but within two weeks of) the actual short course time. (Asynchronous participation does not include real-time chat with other on-line participants, although a visual record of prior chats will be viewable).

Technical support: Participants are responsible for installing the conferencing software Zoom on their own electronic devices and for obtaining a Zoom account that allows the participant to join Zoom meetings and webinars hosted by external organizations. Participants are assumed to be able to secure a reliable computer, internet browser, and Wi-Fi connection. Challenges at the user end must be resolved by the user. Fortunately, because the short course is recorded, users experiencing technical challenges can still “catch up” by watching the recordings to which they have access.

Content support: During the lecture, real-time content support for on-line participants is mostly limited to real-time chat with the on-line (Zoom) participant community and any quantitative methodology doctoral students who might also be participating. Participants may have direct interactions with the instructor in some format during the practice sessions. On-line participants may e-mail the instructor for further content support that cannot be addressed in real-time.


For any questions, please contact Ms. Chen Tian at


Dr. Laura M. Stapleton is a Professor in the Department of Human Development and Quantitative Methodology at the University of Maryland, College Park and Associate Dean for Research in the College of Education. She has taught courses on multilevel modeling and causal inference at the University of Maryland and formerly taught structural equation modeling and survey research methods at the University of Texas at Austin. Dr. Stapleton has published several book chapters on analysis of data from complex sampling designs and has published methodological work in this area in professional journals, including Structural Equation Modeling and Multivariate Behavioral Research. She has been funded by an Institute of Education Sciences methodology grant to evaluate strategies for analyzing NCES data. She holds a B.A. in Economics from the University of Michigan, an M.Ed. in Curriculum and Instruction from George Mason University, and a Ph.D. in Measurement, Statistics and Evaluation from the University of Maryland. She may be reached at