13: Strings, Factors & Text
Content for Monday, May 11, 2026
Before class
📖 Readings:
- R4DS Ch 14: Strings (sections 14.1–14.3 only)
- R4DS Ch 16: Factors
Important: Assignment 5 is due today
Assignment 5: Exploratory Data Analysis — due Sunday, May 10 at 11:59 PM.
During class
We’ll cover:
- Strings: creating, combining, and basic cleaning
  - str_to_lower(), str_to_upper(), str_trim() for cleanup
  - str_detect() and str_replace() for simple pattern matching
- Factors: categorical data with a fixed set of levels
- Why factor order matters for plots and tables
- Reordering factors: fct_relevel(), fct_reorder(), fct_infreq()
- Recoding factors: fct_recode(), fct_collapse()
- Practical application: cleaning demographic variables
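Two of the functions listed above, fct_reorder() and fct_collapse(), can be sketched on a small made-up example. The vectors `condition` and `rt` are hypothetical, invented only for illustration, and this is one possible way to try the functions rather than the in-class demo.

```r
library(forcats)

# Hypothetical data: reaction times (ms) in three made-up conditions
condition <- factor(c("Low", "High", "Control", "High", "Low", "Control"))
rt <- c(520, 610, 450, 630, 540, 470)

# factor() sorts levels alphabetically by default
levels(condition)
# "Control" "High" "Low"

# fct_reorder(): order levels by a summary of another variable
# (the median of rt within each level, unless you pass a different .fun)
by_rt <- fct_reorder(condition, rt)
levels(by_rt)
# "Control" "Low" "High"

# fct_collapse(): merge several existing levels into one new level
treated <- fct_collapse(condition, Treatment = c("Low", "High"))
levels(treated)
# "Control" "Treatment"
```

fct_reorder() is most useful when the order should follow the data (for example, conditions sorted by median response), while fct_relevel() fits cases where the order is known in advance.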
Tip: Assignment 6 is assigned today
Assignment 6: Data Types & Wrangling — due Sunday, May 17 at 11:59 PM.
Slides
View slides in new tab | Download PDF
After class
✅ Practice:
- Clean up a messy text column using str_trim() and str_to_lower(). Try it on free-response demographic data.
- Use str_detect() to filter rows where a column contains a specific word
- Convert a character column to a factor with factor(). What happens to the levels?
- Reorder bars in a bar chart using fct_infreq() so the most common category comes first
- Use fct_recode() to combine similar categories (e.g., “Male”, “male”, and “M”)
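As a starting point for the first three practice items, here is a minimal sketch on a made-up data frame. The `survey` tibble and its values are hypothetical, so swap in your own data. It also shows one thing to watch for: a plain pattern matches anywhere in the string.

```r
library(dplyr)
library(stringr)

# Hypothetical free-response data with stray spaces and mixed case
survey <- tibble(
  id = 1:5,
  gender = c("  Male", "female ", "FEMALE", " Non-binary ", "male")
)

# Trim whitespace first, then standardize case
cleaned <- survey |>
  mutate(gender = str_to_lower(str_trim(gender)))

# str_detect() returns TRUE/FALSE per row, so it works inside filter()
women <- cleaned |> filter(str_detect(gender, "female"))

# Careful: "male" is also found inside "female", so this keeps four rows
loose <- cleaned |> filter(str_detect(gender, "male"))

# Anchoring the pattern with ^ and $ keeps only exact matches
men <- cleaned |> filter(str_detect(gender, "^male$"))

# factor() stores the unique values as levels, sorted alphabetically
levels(factor(cleaned$gender))
# "female" "male" "non-binary"
```

The anchored pattern is a regular expression; for exact matching on a whole value, `gender == "male"` inside filter() does the same job.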
Note: Why factors matter for plots
By default, R sorts the levels of a factor alphabetically, and ggplot2 orders a character column the same way. That is rarely the order you want in a plot. Factors let you control the order:
library(tidyverse)

# `data` stands for any data frame with a `condition` column

# Alphabetical (default): not great
ggplot(data, aes(x = condition)) + geom_bar()

# Ordered by frequency: much better
ggplot(data, aes(x = fct_infreq(condition))) + geom_bar()

# Custom order: you decide
data |>
  mutate(condition = fct_relevel(condition, "Control", "Low", "High")) |>
  ggplot(aes(x = condition)) + geom_bar()

Psychology application: cleaning demographics
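# Common cleanup pipeline for survey demographics
# (uses stringr and forcats, both loaded by library(tidyverse))
survey |>
  mutate(
    # Trim whitespace and lowercase first so variants like " Male" and "male" match
    gender = str_to_lower(str_trim(gender)),
    # New level name on the left, existing value on the right
    gender = fct_recode(as_factor(gender),
      "Man" = "male",
      "Man" = "m",
      "Woman" = "female",
      "Woman" = "f"
    ),
    # Put education levels in their natural order instead of alphabetical
    education = fct_relevel(education,
      "High school", "Some college", "Bachelor's", "Graduate"
    )
  )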
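To check that a cleanup like this did what you expect, it can help to run it on a few rows and count the results. The sketch below uses a small made-up `survey` tibble (hypothetical values, not course data) and the same recoding as above.

```r
library(dplyr)
library(stringr)
library(forcats)

# Hypothetical survey responses, made up for illustration
survey <- tibble(
  gender = c(" Male", "F", "female ", "m", "Nonbinary"),
  education = c("Graduate", "High school", "Bachelor's",
                "Some college", "High school")
)

cleaned <- survey |>
  mutate(
    gender = str_to_lower(str_trim(gender)),
    gender = fct_recode(as_factor(gender),
      "Man" = "male", "Man" = "m",
      "Woman" = "female", "Woman" = "f"
    ),
    education = fct_relevel(education,
      "High school", "Some college", "Bachelor's", "Graduate"
    )
  )

# Values that were not recoded (here "nonbinary") keep their own row,
# which makes leftover categories easy to spot
gender_counts <- cleaned |> count(gender)
gender_counts

# Education levels now follow the order given to fct_relevel()
levels(cleaned$education)
```

With this toy data the count shows two "Man" rows, two "Woman" rows, and one "nonbinary" row that the recode left untouched.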