15: Missing Data

Content for Wednesday, May 20, 2026

Before class

📖 Reading:

R4DS Ch 18: Missing values

Assignment 6 was due Sunday

Assignment 6: Data Types & Wrangling — due Sunday, May 17 at 11:59 PM.

Final Project Draft is due today

Submit your working code and preliminary visualizations before class.

During class

We’ll cover:

Why missing data matters in psychology — attrition, non-response, skip patterns
Explicit vs. implicit missing values
Counting and exploring missingness patterns
drop_na() — complete case analysis (listwise deletion)
replace_na() — filling in known values
fill() — carrying values forward/backward
complete() — making implicit missing values explicit
When not to fill — don’t make up data!
Brief mention: multiple imputation exists (beyond this course)
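
To make the explicit/implicit distinction concrete, here is a minimal sketch using a made-up longitudinal dataset. The tibble `scores` and its columns (`id`, `wave`, `condition`, `score`) are hypothetical, invented only for illustration. Participant 2 has no wave-2 row at all (implicit missing), while participant 3 has a wave-2 row containing `NA` (explicit missing).

```r
library(tidyverse)

# Hypothetical data: names and values are invented for illustration
scores <- tribble(
  ~id, ~wave, ~condition,  ~score,
    1,     1, "control",       10,
    1,     2, NA,              12,
    2,     1, "treatment",      8,
    3,     1, "control",       11,
    3,     2, NA,              NA
)

# complete() adds every id-by-wave combination that is absent,
# so participant 2 gains a wave-2 row filled with NA
scores_complete <- scores |>
  complete(id, wave)

# fill() carries the last observed condition down within each participant.
# This assumes condition never changes between waves, so the value is known.
scores_filled <- scores_complete |>
  group_by(id) |>
  fill(condition, .direction = "down") |>
  ungroup()

scores_filled
```

Note that `score` is deliberately left alone: the condition can be recovered from an earlier wave, but a missing score cannot, so filling it would be making up data.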

After class

✅ Practice:

Count the number of NAs in each column of a dataset using summarize(across(everything(), ~sum(is.na(.x))))
Use drop_na() on a specific column vs. the whole dataset — what’s the difference in rows lost?
Try replace_na() to fill missing values with a sensible default (e.g., 0 for “no response”)
Use complete() to make implicit missing values explicit in a longitudinal dataset
Think critically: when is it okay to drop missing data? When is it dangerous?
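
As a starting point for the `drop_na()` and `replace_na()` exercises, the sketch below uses a small made-up survey. The tibble `survey` and its columns are hypothetical, and the example assumes (for illustration only) that a blank in `sessions` is known to mean the participant attended zero sessions.

```r
library(tidyverse)

# Hypothetical survey data: names and values are invented for illustration
survey <- tibble(
  id       = 1:5,
  age      = c(19, NA, 22, 20, NA),
  anxiety  = c(3, 4, NA, 2, 5),
  sessions = c(1, NA, 0, 2, 1)
)

# Dropping on one column removes only rows where that column is NA
rows_one_column <- nrow(drop_na(survey, anxiety))

# Dropping with no columns named removes rows with an NA anywhere
rows_all_columns <- nrow(drop_na(survey))

c(original = nrow(survey), one_column = rows_one_column, all_columns = rows_all_columns)

# replace_na() on a data frame takes a named list of replacement values.
# Assumption: a missing `sessions` value is known to mean zero sessions.
survey_filled <- survey |>
  replace_na(list(sessions = 0))
```

Comparing the three row counts shows how quickly listwise deletion shrinks a dataset once several columns each have a few missing values.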

Missing data is information

Missing values aren’t just annoying — they can tell you something. Before dropping or filling NAs, ask:

Why is this value missing? (Didn’t answer? Wasn’t asked? Data entry error?)
Is the missingness random? If participants who dropped out differ systematically from those who stayed, analyzing only the remaining cases biases your results.
How much is missing? A few missing values and 40% of a column call for different strategies.

Document your decisions about missing data — your future self (and reviewers) will thank you.
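
One rough way to probe whether missingness looks random is to compare the proportion of missing values across groups. The sketch below assumes a hypothetical tibble `study` with made-up columns `condition` and `followup_score`. It is a descriptive check only, not a formal test of the missingness mechanism.

```r
library(tidyverse)

# Hypothetical follow-up data: names and values are invented for illustration
study <- tibble(
  condition      = c("control", "control", "control",
                     "treatment", "treatment", "treatment"),
  followup_score = c(12, 15, NA, NA, NA, 9)
)

# Proportion of missing follow-up scores within each condition
missing_by_condition <- study |>
  group_by(condition) |>
  summarize(
    n = n(),
    prop_missing = mean(is.na(followup_score))
  )

missing_by_condition
```

If the proportions differ noticeably between groups, dropout may be related to the condition, which is a reason to be cautious about simply dropping incomplete cases.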

Counting missing values

# Quick summary of missingness across all columns
# (`data` is a placeholder for your own data frame)
library(tidyverse)

data |>
  summarize(
    across(everything(), ~sum(is.na(.x)))
  ) |>
  pivot_longer(everything(),
    names_to = "variable",
    values_to = "n_missing"
  ) |>
  arrange(desc(n_missing))
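
To try the same pattern on real data, the sketch below swaps in R's built-in `airquality` dataset and reports the proportion missing rather than the count, which speaks to the "how much is missing?" question. Using `mean(is.na(.x))` instead of `sum(is.na(.x))` is a variation on the code above, not a replacement for it.

```r
library(tidyverse)

# airquality ships with base R and has NAs in some columns
missing_summary <- airquality |>
  summarize(
    across(everything(), ~mean(is.na(.x)))   # proportion of NAs per column
  ) |>
  pivot_longer(everything(),
    names_to = "variable",
    values_to = "prop_missing"
  ) |>
  arrange(desc(prop_missing))

missing_summary
```

Columns with the largest share of missing values sort to the top, so the ones that need a decision are the first thing you see.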