5: Data Tidying

Content for Monday, April 13, 2026

Before class

📖 Reading:

Important: Assignment 2 is due today

Assignment 2: Data Transformation — due Sunday, April 12 at 11:59 PM.

During class

We’ll cover:

  • What is tidy data? The three rules
  • Wide vs. long format
  • pivot_longer() — wide to long
  • pivot_wider() — long to wide
  • Extracting information from column names
  • separate_*() and unite() for splitting/combining columns

Slides

View slides in new tab Download PDF

After class

Practice:

  1. Take this wide dataset and make it tidy:
wide_data <- tibble(
  participant = c("P1", "P2", "P3"),
  pre_test = c(45, 52, 48),
  post_test = c(62, 58, 71)
)
  2. Practice with survey data: pivot questionnaire items from columns to rows
  3. Create a summary table and use pivot_wider() to format it nicely
  4. Use separate_wider_delim() to split a column like “age_gender” into separate columns
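
The last exercise relies on separate_wider_delim(), and the class outline pairs it with unite(). The following is a minimal sketch of both on a different, made-up column, so the exercise itself is left open. The tibble `trials` and its column `cond_block` are hypothetical and exist only for illustration.

```r
library(tidyverse)

# Hypothetical data: one column packs two variables, joined by "_"
trials <- tibble(
  id = 1:3,
  cond_block = c("control_1", "treatment_1", "treatment_2")
)

# Split on the delimiter into two named columns (both come back as character)
split_trials <- trials |>
  separate_wider_delim(cond_block, delim = "_", names = c("condition", "block"))

# unite() goes the other way: paste the columns back together
rejoined <- split_trials |>
  unite("cond_block", condition, block, sep = "_")
```

Note that `block` is still text after the split; if it should be numeric, convert it afterwards, for example with mutate(block = as.integer(block)).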

Important: The three rules of tidy data
  1. Each variable is a column
  2. Each observation is a row
  3. Each value is a cell

When your data follow these rules, tidyverse tools such as ggplot2 and dplyr can work on them directly, with no reshaping step first.
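
To make the payoff concrete, here is a small sketch of dplyr and ggplot2 operating on data that already follow the three rules. The tibble `tidy_scores` and its values are invented for illustration and are not course data.

```r
library(tidyverse)

# Hypothetical tidy data: one row per participant per condition
tidy_scores <- tibble(
  participant = c("P1", "P1", "P2", "P2"),
  condition = c("control", "treatment", "control", "treatment"),
  score = c(10, 14, 12, 18)
)

# dplyr: each variable is a column, so grouping and summarizing is direct
condition_means <- tidy_scores |>
  group_by(condition) |>
  summarize(mean_score = mean(score))

# ggplot2: columns map straight onto aesthetics
p <- ggplot(tidy_scores, aes(x = condition, y = score)) +
  geom_point()
```

Had the scores been stored as separate `control` and `treatment` columns, neither the group_by() call nor the aes() mapping could be written this way without pivoting first.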

Common patterns in psychology data

Repeated measures:

library(tidyverse)

# Wide (common in SPSS): one column per time point
data_wide <- tibble(
  id = 1:3,
  time1 = c(50, 55, 48),
  time2 = c(55, 58, 52)
)

# Tidy version: one row per participant per time point
data_wide |>
  pivot_longer(
    cols = starts_with("time"),
    names_to = "time",
    values_to = "score"
  )
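
In the tidy version above, the new `time` column holds the old column names as text ("time1", "time2"). One way to extract the number from those names, sketched here under the assumption that a reasonably recent tidyr is installed, is to combine the names_prefix and names_transform arguments of pivot_longer(). The object name `data_long` is introduced only for this example.

```r
library(tidyverse)

# Same wide data as in the repeated-measures example
data_wide <- tibble(
  id = 1:3,
  time1 = c(50, 55, 48),
  time2 = c(55, 58, 52)
)

data_long <- data_wide |>
  pivot_longer(
    cols = starts_with("time"),
    names_to = "time",
    names_prefix = "time",                     # strip the leading "time"
    names_transform = list(time = as.integer), # "1" -> 1L
    values_to = "score"
  )
```

With `time` stored as an integer, it can be used directly as a numeric axis in a plot or as a numeric predictor in a model.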

Questionnaire items:

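# Calculate scale scores from long format
# (assumes `survey` has a participant column and item columns named bdi_*)
survey |>
  pivot_longer(cols = starts_with("bdi_")) |>  # default output columns: name, value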
  group_by(participant) |>
  summarize(bdi_total = sum(value))
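
The snippet above assumes a `survey` data frame that is not defined on this page. The following self-contained sketch uses a made-up three-item version to show the same scale-score calculation, followed by pivot_wider() returning the data to wide format. The item names, responses, and the helper names `survey_long`, `bdi_totals`, and `survey_wide` are all hypothetical.

```r
library(tidyverse)

# Hypothetical survey data; item names and values are invented for illustration
survey <- tibble(
  participant = c("P1", "P2", "P3"),
  bdi_1 = c(0, 1, 2),
  bdi_2 = c(1, 1, 3),
  bdi_3 = c(0, 2, 1)
)

# Wide to long: one row per participant per item
survey_long <- survey |>
  pivot_longer(
    cols = starts_with("bdi_"),
    names_to = "item",
    values_to = "response"
  )

# Scale score: sum of item responses within each participant
bdi_totals <- survey_long |>
  group_by(participant) |>
  summarize(bdi_total = sum(response))

# Long to wide: pivot_wider() reverses the reshape
survey_wide <- survey_long |>
  pivot_wider(names_from = item, values_from = response)
```

Naming the output columns explicitly with names_to and values_to is optional, but it tends to make the later summarize() call easier to read than the default `name` and `value`.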