5: Data Tidying
Content for Monday, April 13, 2026
Before class
📖 Reading:
Important: Assignment 2 is due today
Assignment 2: Data Transformation — due Sunday, April 12 at 11:59 PM.
During class
We’ll cover:
- What is tidy data? The three rules
- Wide vs. long format
- pivot_longer(): wide to long
- pivot_wider(): long to wide
- Extracting information from column names
- separate_*() and unite() for splitting/combining columns
Slides
After class
✅ Practice:
- Take this wide dataset and make it tidy:
  wide_data <- tibble(
    participant = c("P1", "P2", "P3"),
    pre_test = c(45, 52, 48),
    post_test = c(62, 58, 71)
  )
- Practice with survey data: pivot questionnaire items from columns to rows
- Create a summary table and use pivot_wider() to format it nicely
- Use separate_wider_delim() to split a column like “age_gender” into separate columns
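If the last two practice items feel abstract, here is a minimal sketch of both functions on made-up data. The tibbles `demo` and `summary_long`, their column names, and all values are hypothetical and only serve as illustration; they are not the course datasets.

```r
library(tidyverse)

# Hypothetical data: one column packs two variables (age and gender)
demo <- tibble(
  participant = c("P1", "P2", "P3"),
  age_gender  = c("21_F", "34_M", "28_F")
)

# Split "age_gender" at the underscore into two new columns
demo_split <- demo |>
  separate_wider_delim(
    age_gender,
    delim = "_",
    names = c("age", "gender")
  ) |>
  mutate(age = as.integer(age))  # the split pieces start out as character

# Hypothetical long summary: one row per condition and time point
summary_long <- tibble(
  condition  = c("control", "control", "treatment", "treatment"),
  time       = c("pre", "post", "pre", "post"),
  mean_score = c(48, 50, 47, 63)
)

# Spread the time points into columns for a compact table
summary_wide <- summary_long |>
  pivot_wider(names_from = time, values_from = mean_score)
```

Here `demo_split` has the columns participant, age, and gender, and `summary_wide` has one row per condition with a `pre` and a `post` column. The wide layout is usually easier to read in a report, while the long layout is the one to keep for analysis.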
Important: The three rules of tidy data
- Each variable is a column
- Each observation is a row
- Each value is a cell
When your data follow these rules, tidyverse tools such as ggplot2 and dplyr work on them directly, because their functions expect variables in columns and observations in rows!
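To see what the three rules buy you, here is a small sketch on an already tidy table. The `scores` tibble and its values are invented for illustration. Because `time` and `score` are each a single column, they can be passed straight to `group_by()` and `aes()` without any reshaping.

```r
library(tidyverse)

# Hypothetical tidy data: one row per participant per time point
scores <- tibble(
  id    = c(1, 1, 2, 2, 3, 3),
  time  = c("pre", "post", "pre", "post", "pre", "post"),
  score = c(10, 14, 12, 15, 9, 13)
)

# dplyr: one grouped summary, no reshaping needed
time_means <- scores |>
  group_by(time) |>
  summarize(mean_score = mean(score))

# ggplot2: variables map directly onto aesthetics
# (the plot object is built here; print(p) would draw it)
p <- ggplot(scores, aes(x = time, y = score, group = id)) +
  geom_line()
```

With the same numbers stored in wide format (one column per time point), neither call would work as written, because there would be no single `time` or `score` column to refer to.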
Common patterns in psychology data
Repeated measures:
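library(tidyverse)  # provides tibble() and pivot_longer()

# Wide (common in SPSS): one column per time point
data_wide <- tibble(
  id = 1:3,
  time1 = c(50, 55, 48),
  time2 = c(55, 58, 52)
)

# Tidy version: one row per participant per time point
data_wide |>
  pivot_longer(
    cols = starts_with("time"),  # the columns to stack
    names_to = "time",           # old column names ("time1", "time2") go here
    values_to = "score"          # cell values go here
  )
Questionnaire items: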
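# Calculate scale scores from long format
# Assumes `survey` has a participant column plus item columns named bdi_*
# (pivot_longer() defaults to output columns called "name" and "value")
survey |>
  pivot_longer(cols = starts_with("bdi_")) |>
  group_by(participant) |>
  summarize(bdi_total = sum(value))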
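The questionnaire pattern relies on a `survey` object that is not defined on this page, so here is a self-contained sketch with a made-up one. The three participants, the three items, and all responses are hypothetical. It also shows one way to extract information from column names: `names_prefix` drops the shared "bdi_" prefix and `names_transform` converts what remains into an integer item number.

```r
library(tidyverse)

# Hypothetical survey: three participants, three items (bdi_1 to bdi_3)
survey <- tibble(
  participant = c("P1", "P2", "P3"),
  bdi_1 = c(0, 2, 1),
  bdi_2 = c(1, 3, 0),
  bdi_3 = c(0, 2, 2)
)

# Long format: one row per participant per item
survey_long <- survey |>
  pivot_longer(
    cols = starts_with("bdi_"),
    names_to = "item",
    names_prefix = "bdi_",                      # "bdi_1" becomes "1"
    names_transform = list(item = as.integer),  # "1" becomes 1L
    values_to = "response"
  )

# Scale score: sum of item responses per participant
bdi_scores <- survey_long |>
  group_by(participant) |>
  summarize(bdi_total = sum(response))
```

With these made-up responses the totals are 1, 7, and 3 for P1, P2, and P3. Note that `sum()` returns `NA` for a participant with any missing item; whether to pass `na.rm = TRUE` is a scoring decision that depends on the scale, not something this sketch settles.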