15: Missing Data
Content for Wednesday, May 20, 2026
Before class
📖 Reading:
Important: Assignment 6 is due today
Assignment 6: Data Types & Wrangling — due Sunday, May 17 at 11:59 PM.
Important: Final Project Draft is due today
Submit your working code and preliminary visualizations before class.
During class
We’ll cover:
- Why missing data matters in psychology — attrition, non-response, skip patterns
- Explicit vs. implicit missing values
- Counting and exploring missingness patterns
- drop_na(): complete case analysis (listwise deletion)
- replace_na(): filling in known values
- fill(): carrying values forward/backward
- complete(): making implicit missing values explicit
- When not to fill: don’t make up data!
- Brief mention: multiple imputation exists (beyond this course)
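The four tidyr verbs listed above can be seen together on one small example. This is a minimal sketch on a made-up longitudinal dataset; the column names (id, wave, group, sessions, mood) and the rule that a recorded NA in sessions means “attended zero sessions” are assumptions for illustration, not part of any real study.

```r
library(dplyr)
library(tidyr)

# Hypothetical longitudinal data: 3 participants, waves 1-3.
# - Participant 2 has no row at all for wave 2 (implicit missing).
# - Participant 3's wave-3 mood is NA (explicit missing).
# - group was only recorded at wave 1.
# - Assumed coding rule: NA in sessions means "attended 0 sessions".
waves <- tibble(
  id       = c(1, 1, 1, 2, 2, 3, 3, 3),
  wave     = c(1, 2, 3, 1, 3, 1, 2, 3),
  group    = c("ctrl", NA, NA, "tx", NA, "ctrl", NA, NA),
  sessions = c(2, NA, 1, NA, 3, 1, 1, 2),
  mood     = c(4, 5, 3, 2, 6, 5, 4, NA)
)

cleaned <- waves |>
  # replace_na(): used only because NA has a known meaning here (0 sessions)
  replace_na(list(sessions = 0)) |>
  # complete(): add every absent id x wave row, filled with explicit NAs
  complete(id, wave) |>
  arrange(id, wave) |>
  # fill(): group is constant within a person, so carry it down
  group_by(id) |>
  fill(group, .direction = "down") |>
  ungroup()

# mood stays NA: filling an outcome would be making up data.
# drop_na(mood): complete case analysis on mood only
mood_cases <- cleaned |>
  drop_na(mood)

nrow(cleaned)     # 9 rows after complete()
nrow(mood_cases)  # 7 rows with an observed mood
```

Note the order: replace_na() runs before complete(), so only the NAs with a known meaning become 0, while the row that complete() adds for participant 2 keeps an honest NA.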
After class
✅ Practice:
- Count the number of NAs in each column of a dataset using summarize(across(everything(), ~sum(is.na(.x))))
- Use drop_na() on a specific column vs. the whole dataset: what’s the difference in rows lost?
- Try replace_na() to fill missing values with a sensible default (e.g., 0 for “no response”)
- Use complete() to make implicit missing values explicit in a longitudinal dataset
- Think critically: when is it okay to drop missing data? When is it dangerous?
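For the drop_na() practice item, here is a sketch on a hypothetical six-row survey (the variable names are made up). Each column is missing only one or two values, yet listwise deletion keeps just one row.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey: each column is missing only 1-2 values
survey <- tibble(
  id      = 1:6,
  anxiety = c(3, NA, 4, 2, 5, 1),
  sleep   = c(7, 6, NA, NA, 8, 5),
  income  = c(NA, 40, 55, 30, NA, 62)
)

n_all      <- nrow(survey)
n_anxiety  <- nrow(drop_na(survey, anxiety))  # drop rows missing anxiety only
n_complete <- nrow(drop_na(survey))           # listwise: drop rows with any NA

rows_lost <- c(
  anxiety_only = n_all - n_anxiety,
  listwise     = n_all - n_complete
)
rows_lost  # anxiety_only: 1, listwise: 5
```

Naming the column in drop_na() restricts deletion to the variable your analysis actually uses, which is usually what you want.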
Note: Missing data is information
Missing values aren’t just annoying — they can tell you something. Before dropping or filling NAs, ask:
- Why is this value missing? (Didn’t answer? Wasn’t asked? Data entry error?)
- Is the missingness random? If participants who dropped out were systematically different from those who stayed, analyzing only the completers can bias your results
- How much is missing? A few values vs. 40% of a column require different strategies
Document your decisions about missing data — your future self (and reviewers) will thank you.
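The “how much” and “is it random” questions in the note can be checked with a few lines of dplyr. This sketch uses a made-up two-wave study where NA in follow_up means the participant dropped out; comparing baseline means is a quick descriptive check, not a formal test of whether data are missing at random.

```r
library(dplyr)

# Hypothetical two-wave study: follow_up is NA for participants who dropped out
study <- tibble(
  id        = 1:8,
  baseline  = c(12, 15, 22, 25, 11, 27, 14, 24),
  follow_up = c(10, 13, NA, NA, 9, NA, 12, 20)
)

# How much is missing? Proportion of NAs in each column
prop_missing <- study |>
  summarize(across(everything(), ~mean(is.na(.x))))

# Is it random? Compare baseline scores of dropouts vs. completers.
# A large difference suggests a complete case analysis of follow_up
# would describe a non-representative subset of participants.
dropout_check <- study |>
  mutate(dropped_out = is.na(follow_up)) |>
  group_by(dropped_out) |>
  summarize(n = n(), mean_baseline = mean(baseline))

prop_missing   # follow_up: 0.375 (3 of 8)
dropout_check  # completers: n = 5, mean 15.2; dropouts: n = 3, mean about 24.7
```

In this toy data the dropouts started with much higher baseline scores than the completers, which is the kind of pattern that makes simply dropping them risky.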
Counting missing values
library(dplyr)  # summarize(), across(), arrange()
library(tidyr)  # pivot_longer()

# Quick summary of missingness across all columns of `data`
data |>
  summarize(
    across(everything(), ~sum(is.na(.x)))
  ) |>
  pivot_longer(everything(),
    names_to = "variable",
    values_to = "n_missing"
  ) |>
  arrange(desc(n_missing))
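To go beyond per-column totals and look at missingness patterns, one option is to turn every column into TRUE/FALSE with is.na() and count how many rows share each combination. The sketch below runs both the per-column count and the pattern count on a small hypothetical questionnaire (q1 to q3 are made-up items).

```r
library(dplyr)
library(tidyr)

# Hypothetical questionnaire responses
responses <- tibble(
  q1 = c(1, NA, 3, 4, NA, 2),
  q2 = c(2, NA, 3, NA, NA, 1),
  q3 = c(5, 4, NA, 2, 1, 3)
)

# Per-column NA counts, most-missing variable first
missing_by_col <- responses |>
  summarize(across(everything(), ~sum(is.na(.x)))) |>
  pivot_longer(everything(), names_to = "variable", values_to = "n_missing") |>
  arrange(desc(n_missing))

# Missingness patterns: TRUE marks a missing value, and count()
# tallies how many rows share each TRUE/FALSE combination
patterns <- responses |>
  mutate(across(everything(), is.na)) |>
  count(q1, q2, q3, sort = TRUE)

missing_by_col  # q2: 3, q1: 2, q3: 1
patterns        # 4 distinct patterns across 6 rows
```

Here q1 and q2 tend to be missing together (2 of 6 rows), something the per-column totals alone would not reveal.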