14: Joins
Content for Wednesday, May 13, 2026
Before class
📖 Reading:
During class
We’ll cover:
- Why join data? Psychology data often lives in multiple tables
- Keys — primary keys and foreign keys
- Mutating joins (add columns from another table):
left_join()— keep all rows from the left tableinner_join()— keep only matching rowsright_join()andfull_join()
- Filtering joins (use one table to filter another):
semi_join()— keep rows that have a matchanti_join()— keep rows that don’t have a match
- Common issues: many-to-many, missing keys, duplicates
TipAssignment 7 is assigned today
Assignment 7: Joins & Missing Data — due Sunday, May 24 at 11:59 PM.
Slides
View slides in new tab Download PDFEmbedded slides
After class
✅ Practice:
- Create two small tibbles with a shared key column. Practice
left_join(),inner_join(), andfull_join()— what’s different about the output? - Use
anti_join()to find participants who are in one dataset but not another (a common quality check) - Try joining by multiple keys using
by = c("id", "timepoint") - What happens when a key isn’t unique? Try it and see
- Think about your final project: do you need to combine multiple tables?
NoteWhich join do you need?
Most of the time, you want left_join() — it keeps everything in your main table and adds columns from the second table where there’s a match.
| Join | Keeps |
|---|---|
left_join(x, y) |
All rows from x, matching rows from y |
inner_join(x, y) |
Only rows that match in both |
full_join(x, y) |
All rows from both tables |
anti_join(x, y) |
Rows in x with no match in y |
Psychology application: combining data sources
# Participant demographics + survey responses
demographics <- tibble(
participant_id = c("P01", "P02", "P03"),
age = c(19, 22, 20),
gender = c("Woman", "Man", "Woman")
)
survey_scores <- tibble(
participant_id = c("P01", "P02", "P04"),
anxiety_score = c(34, 28, 41)
)
# Keep all participants, add scores where available
demographics |>
left_join(survey_scores, by = "participant_id")
# Find who's missing survey data
demographics |>
anti_join(survey_scores, by = "participant_id")