4: Data Transformation II
Content for Wednesday, April 8, 2026
Before class
📖 Readings:
Tip: Assignment 2 is assigned today
Assignment 2: Data Transformation — due Sunday, April 12 at 11:59 PM.
During class
We’ll cover:
- group_by(): define groups for operations
- summarize(): calculate summary statistics
- Combining group_by() + summarize() for grouped statistics
- count(): a quick shortcut for counting
- Code style best practices
- Building complete analysis pipelines
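As a preview of why count() is described as a shortcut, here is a minimal sketch comparing it with the longer group_by() + summarize() form. The trips table and its column names are hypothetical, made up only for illustration.

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative only
trips <- tibble(
  carrier = c("AA", "AA", "UA", "DL", "UA", "AA"),
  delay   = c(5, -2, 30, 0, 12, NA)
)

# Long form: define groups, then count the rows in each group
by_hand <- trips |>
  group_by(carrier) |>
  summarize(n = n())

# Shortcut: count() groups and counts in one step
shortcut <- trips |>
  count(carrier)

# sort = TRUE puts the largest groups first
sorted <- trips |>
  count(carrier, sort = TRUE)
```

Both forms should give one row per carrier with the same counts, so count() is usually the tidier choice when all you need is the number of rows per group.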
Slides
View slides in new tab | Download PDF
After class
✅ Practice:
Using the flights dataset:
- Calculate the average departure delay by carrier
- Find which carrier has the most flights
- Calculate the percentage of flights that were delayed (arr_delay > 0) by month
- Find the top 5 destinations with the highest average arrival delay
- Create a complete pipeline that filters, groups, summarizes, and arranges
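If you are unsure how the pieces fit together, the sketch below shows the filter, group, summarize, arrange pattern on a small made-up table rather than on flights itself. toy_flights is a hypothetical stand-in that only borrows the carrier and arr_delay column names, and the values are invented. The one trick worth noticing is that the mean of a TRUE/FALSE vector is the proportion of TRUE values.

```r
library(dplyr)

# Hypothetical stand-in for flights: borrowed column names, made-up values
toy_flights <- tibble(
  carrier   = c("AA", "AA", "UA", "UA", "DL", "DL", "DL"),
  arr_delay = c(12, NA, -8, 30, -1, 35, 8)
)

delay_summary <- toy_flights |>
  filter(!is.na(arr_delay)) |>               # keep rows with a recorded arrival delay
  group_by(carrier) |>                       # one group per carrier
  summarize(
    n_flights   = n(),                       # rows left in each group
    mean_arr    = mean(arr_delay),           # average arrival delay
    pct_delayed = mean(arr_delay > 0) * 100  # share of flights arriving late
  ) |>
  arrange(desc(mean_arr))                    # worst average delay first

delay_summary
```

The same shape should carry over to the exercises above once you swap in the real dataset and the columns each question asks about.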
Note: The power of group_by()
group_by() doesn’t change how your data looks — it changes how other verbs work on it. Think of it as setting up the “behind the scenes” grouping structure.
data |>
  group_by(condition) |>
  summarize(mean = mean(score))
Psychology application
# Calculate descriptive statistics by condition
experiment_data |>
  group_by(condition) |>
  summarize(
    M = mean(score, na.rm = TRUE),
    SD = sd(score, na.rm = TRUE),
    n = n()
  )
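To see the point of the note on group_by() in action, here is a small sketch that runs the same summarize() call with and without grouping. The experiment_data values below are invented for illustration; only the condition and score column names come from the example above.

```r
library(dplyr)

# Hypothetical data: two conditions, one missing score
experiment_data <- tibble(
  condition = c("control", "control", "control",
                "treatment", "treatment", "treatment"),
  score     = c(10, 12, NA, 15, 17, 19)
)

grouped <- experiment_data |>
  group_by(condition)

# Same rows and columns as before; only the grouping metadata is new
same_rows <- nrow(grouped) == nrow(experiment_data)
groups    <- group_vars(grouped)

# Ungrouped: summarize() returns one row for the whole table
overall <- experiment_data |>
  summarize(M = mean(score, na.rm = TRUE), n = n())

# Grouped: the same call returns one row per condition
by_condition <- grouped |>
  summarize(
    M       = mean(score, na.rm = TRUE),
    n       = n(),                 # counts every row, including missing scores
    n_valid = sum(!is.na(score))   # counts only rows with a score
  )
```

Note that n() counts rows rather than non-missing scores, so when scores can be missing it may be worth reporting something like n_valid alongside it.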