3: Data Transformation I

Content for Monday, April 6, 2026

Before class

📖 Reading:

Important: Assignment 1 is due today

Assignment 1: Getting Started — due Sunday, April 5 at 11:59 PM.

During class

We’ll cover:

  • Introduction to dplyr
  • filter() — pick rows by their values
  • arrange() — reorder rows
  • select() — pick columns by name
  • mutate() — create new columns
  • The pipe operator (|>)
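
As a rough preview of these verbs, the sketch below applies each one to a tiny made-up table. The `trips` data and its column names are invented for illustration and are not part of the course materials. It assumes dplyr is installed and R is version 4.1 or later, which the native pipe requires.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
trips <- tibble(
  id       = c("a", "b", "c", "d"),
  month    = c(1, 12, 12, 6),
  distance = c(100, 250, 80, 400),  # miles
  minutes  = c(60, 100, 40, 120)    # travel time
)

# filter(): keep only the rows where the condition is TRUE
december <- trips |> filter(month == 12)

# arrange(): reorder rows; desc() sorts from largest to smallest
by_distance <- trips |> arrange(desc(distance))

# select(): keep only the named columns
two_cols <- trips |> select(id, distance)

# mutate(): add a new column computed from existing ones
with_speed <- trips |> mutate(mph = distance / minutes * 60)
```

Each verb takes a data frame as its first argument and returns a new data frame, leaving `trips` itself unchanged.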

After class

Practice:

Using the flights dataset from the nycflights13 package:

  1. Filter to flights departing in December
  2. Find all flights to Los Angeles (LAX)
  3. Create a new variable for flight speed (distance / air_time * 60)
  4. Select only the carrier, origin, destination, and your new speed variable
  5. Arrange by speed to find the fastest flights
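
If you get stuck, here is one possible way to start, offered as a sketch rather than the official solution. It assumes nycflights13 and dplyr are installed; the object names `december` and `lax_speeds` and the new column name `speed` are arbitrary choices.

```r
library(nycflights13)
library(dplyr)

# Step 1: flights that departed in December
december <- flights |>
  filter(month == 12)

# Steps 2 to 5 chained together for flights to LAX
lax_speeds <- flights |>
  filter(dest == "LAX") |>                      # step 2: keep LAX arrivals
  mutate(speed = distance / air_time * 60) |>   # step 3: miles per hour
  select(carrier, origin, dest, speed) |>       # step 4: keep four columns
  arrange(desc(speed))                          # step 5: fastest first
```

Because `air_time` is recorded in minutes and `distance` in miles, multiplying by 60 gives speed in miles per hour. Flights with a missing `air_time` get a missing `speed`, and `arrange()` places those rows at the end.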

Tip: Keyboard shortcut

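The pipe (|>) is so common that RStudio has a keyboard shortcut for it. By default the shortcut inserts %>%; to insert |> instead, turn on "Use native pipe operator" under Tools > Global Options > Code.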

  • Windows/Linux: Ctrl + Shift + M
  • Mac: Cmd + Shift + M
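
To see what the pipe actually does, the sketch below writes the same two-step operation twice, once with nested calls and once with the pipe. The `scores` table is made up for illustration, and the code assumes dplyr is installed and R is version 4.1 or later.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
scores <- tibble(
  name   = c("x", "y", "z"),
  points = c(3, 9, 5)
)

# Nested calls: read from the inside out
nested <- arrange(filter(scores, points > 4), desc(points))

# Piped calls: read top to bottom; x |> f(y) is the same as f(x, y)
piped <- scores |>
  filter(points > 4) |>
  arrange(desc(points))
```

Both versions keep the rows with more than 4 points and sort them from highest to lowest; the piped form simply lists the steps in the order they happen.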

Installing the flights dataset

install.packages("nycflights13")  # run once to install the package
library(nycflights13)             # run in each new R session to load it
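
A quick way to confirm the dataset loaded is to look at its size and columns. This is a minimal check, assuming dplyr is also installed (it provides `glimpse()`).

```r
library(nycflights13)
library(dplyr)

# flights holds the flights that departed New York City airports in 2013
dim(flights)      # number of rows and number of columns
glimpse(flights)  # one line per column: name, type, and first few values
```

If both calls run without an error, the columns used in the practice exercises (`month`, `dest`, `distance`, `air_time`, `carrier`, `origin`) are available.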