5: Data Tidying

Content for Monday, April 13, 2026

Before class

📖 Reading:

R4DS Ch 5: Data tidying

Assignment 2 is due before class

Assignment 2: Data Transformation — due Sunday, April 12 at 11:59 PM.

During class

We’ll cover:

What is tidy data? The three rules
Wide vs. long format
pivot_longer() — wide to long
pivot_wider() — long to wide
Extracting information from column names
separate_*() and unite() for splitting/combining columns

After class

✅ Practice:

Take this wide dataset and make it tidy:

library(tidyverse)

wide_data <- tibble(
  participant = c("P1", "P2", "P3"),
  pre_test = c(45, 52, 48),
  post_test = c(62, 58, 71)
)

Practice with survey data — pivot questionnaire items from columns to rows
Create a summary table and use pivot_wider() to format it nicely
Use separate_wider_delim() to split a column like “age_gender” into separate columns
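For the last practice item, here is a minimal sketch of one way to split a combined column. The `demo` tibble and its values are made up for illustration, and the underscore delimiter is an assumption about how such a column might be coded. It also shows `unite()` undoing the split.

```r
library(tidyverse)

# Hypothetical data: age and gender packed into one column
demo <- tibble(
  participant = c("P1", "P2", "P3"),
  age_gender = c("23_F", "31_M", "27_F")
)

# Split on the underscore into two new columns
demo_split <- demo |>
  separate_wider_delim(
    age_gender,
    delim = "_",
    names = c("age", "gender")
  ) |>
  # The pieces come back as character, so convert age to a number
  mutate(age = as.integer(age))

# unite() reverses the split, pasting the columns back together
demo_united <- demo_split |>
  unite("age_gender", age, gender, sep = "_")
```

Note that `separate_wider_delim()` removes the original column by default, so `demo_split` has `participant`, `age`, and `gender` only.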

The three rules of tidy data

Each variable is a column
Each observation is a row
Each value is a cell

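When your data follow these rules, tidyverse tools such as ggplot2 and dplyr can use them directly: each function expects one column per variable, so you can group, summarize, and plot without reshaping first.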
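To make the payoff concrete, here is a small sketch using made-up scores that are already tidy (one row per participant per test). The `scores` tibble and its column names are assumptions for illustration only.

```r
library(tidyverse)

# Hypothetical scores, already tidy: one row per participant per test
scores <- tibble(
  participant = c("P1", "P1", "P2", "P2"),
  test = c("pre", "post", "pre", "post"),
  score = c(45, 62, 52, 58)
)

# Each variable is a column, so grouping by one and summarizing another is direct
test_means <- scores |>
  group_by(test) |>
  summarize(mean_score = mean(score))

# The same columns map straight onto plot aesthetics
score_plot <- ggplot(scores, aes(x = test, y = score, group = participant)) +
  geom_line() +
  geom_point()
```

With the wide layout (separate `pre_test` and `post_test` columns), neither step would work without reshaping first, because "test" would not exist as a column to group by or map to an axis.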

Common patterns in psychology data

Repeated measures:

# Wide (common in SPSS)
data_wide <- tibble(
  id = 1:3,
  time1 = c(50, 55, 48),
  time2 = c(55, 58, 52)
)

# Tidy version: one row per participant per time point
data_wide |>
  pivot_longer(
    cols = starts_with("time"),
    names_to = "time",
    values_to = "score"
  )
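In the result above, the `time` column holds the strings "time1" and "time2". A possible refinement, sketched below, strips the prefix and converts the remainder to an integer while pivoting, which is one way of extracting information from column names. The second step uses `pivot_wider()` to go back to the wide layout. The object names `data_long` and `data_back` are just illustrative.

```r
library(tidyverse)

data_wide <- tibble(
  id = 1:3,
  time1 = c(50, 55, 48),
  time2 = c(55, 58, 52)
)

# Wide to long, turning "time1"/"time2" into the integers 1 and 2
data_long <- data_wide |>
  pivot_longer(
    cols = starts_with("time"),
    names_to = "time",
    names_prefix = "time",
    names_transform = list(time = as.integer),
    values_to = "score"
  )

# Long to wide, restoring the "time" prefix on the new column names
data_back <- data_long |>
  pivot_wider(
    names_from = time,
    names_prefix = "time",
    values_from = score
  )
```

Storing `time` as a number rather than a label makes it easier to sort, filter, or use on a continuous axis later.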

Questionnaire items:

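# Calculate scale scores from long format
# (assumes survey has a participant column plus item columns bdi_1, bdi_2, ...;
# pivot_longer() puts the item responses in a column called value by default)
survey |>
  pivot_longer(cols = starts_with("bdi_")) |>
  group_by(participant) |>
  summarize(bdi_total = sum(value))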
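The snippet above does not define `survey`, so here is a self-contained sketch with a made-up three-item version. The item names and responses are hypothetical, and the explicit `names_to` and `values_to` arguments are a stylistic choice to make the long format easier to read, not a requirement.

```r
library(tidyverse)

# Hypothetical survey: three participants, three made-up items
survey <- tibble(
  participant = c("P1", "P2", "P3"),
  bdi_1 = c(0, 2, 1),
  bdi_2 = c(1, 3, 0),
  bdi_3 = c(0, 1, 2)
)

bdi_scores <- survey |>
  # One row per participant per item
  pivot_longer(
    cols = starts_with("bdi_"),
    names_to = "item",
    values_to = "response"
  ) |>
  group_by(participant) |>
  # Count the items and add up the responses for each participant
  summarize(
    n_items = n(),
    bdi_total = sum(response)
  )
```

Keeping an `n_items` column alongside the total is a cheap check that every participant contributed the expected number of items before you interpret the scale score.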