# The basic pattern
my_data <- read_csv("path/to/file.csv")

PSY 410: Data Science for Psychology
2026-04-15
So far we’ve used datasets that were already loaded in R — mpg, flights, bfi.
But your thesis data isn’t sitting inside a package. It’s a CSV on your desktop, an Excel file from your advisor, or a Qualtrics export with 200 columns named Q1, Q2, Q3…
Today we learn how to get it into R — and what to check when things don’t import cleanly.
read_csv() comes from the readr package (loaded with tidyverse). It is not the same function as base R’s read.csv():

| | read_csv() | read.csv() |
|---|---|---|
| Package | readr (tidyverse) | base R |
| Returns | tibble | data.frame |
| Speed | Much faster | Slower |
| Type guessing | Better | Converted strings to factors by default before R 4.0 |
| Consistent | Yes | Depends on options |
Always use read_csv() in this class.
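To see the difference for yourself, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it both ways. The file contents and the object names (csv_path, tidy_read, base_read) are made up for illustration.

```r
library(readr)

# Write a small, made-up CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,group,score", "1,control,4.5", "2,treatment,3.9"),
  csv_path
)

# Same file, two readers
tidy_read <- read_csv(csv_path, show_col_types = FALSE)  # readr
base_read <- read.csv(csv_path)                          # base R

# Compare what each function hands back
class(tidy_read)
class(base_read)
```

Printing tidy_read also shows each column's type under its name, which is the quickest way to spot a bad guess.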
Types it recognizes: dbl (number), chr (text), lgl (TRUE/FALSE), date, dttm (date-time)
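A quick way to see which type readr picked for each column is spec(). The sketch below uses invented data and assumes readr 2.0 or later, where I() lets you pass CSV text directly instead of a file path.

```r
library(readr)

# Made-up data with one column of each common type
csv_text <- paste(
  "name,score,passed,tested_on",
  "Ana,4.5,TRUE,2026-01-15",
  "Ben,3.9,FALSE,2026-01-16",
  sep = "\n"
)

guessed <- read_csv(I(csv_text), show_col_types = FALSE)

# Show the column specification readr guessed
spec(guessed)
```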
Sometimes R guesses wrong — especially with:
| Shortcut | Type |
|---|---|
| "c" | character |
| "d" | double |
| "i" | integer |
| "l" | logical |
| "D" | date |
| "?" | let R guess |
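The shortcuts in the table go into the col_types argument. Here is a minimal sketch with invented data (the column names are hypothetical, and I() assumes readr 2.0 or later). A participant ID like 001 is a typical case where you want to state the type yourself so the leading zeros are kept as text.

```r
library(readr)

# Made-up data: IDs with leading zeros, whole-number ages, a TRUE/FALSE flag
csv_text <- paste(
  "id,age,enrolled",
  "001,23,TRUE",
  "002,31,FALSE",
  sep = "\n"
)

# Name each column and give it a shortcut from the table
typed <- read_csv(
  I(csv_text),
  col_types = cols(id = "c", age = "i", enrolled = "l")
)

# The compact form lists one shortcut per column, in order
typed_compact <- read_csv(I(csv_text), col_types = "cil")

typed$id
```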
Real data often has ugly column names:
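As one illustration, here is a sketch with made-up column names that contain spaces and parentheses. read_csv() keeps such names as they are, so you refer to them with backticks and can rename them with dplyr. The data and the new names are hypothetical.

```r
library(readr)
library(dplyr)

# Made-up export with awkward column names
csv_text <- paste(
  "Participant ID,Age (years),Reaction Time",
  "1,23,512",
  "2,31,498",
  sep = "\n"
)

raw <- read_csv(I(csv_text), show_col_types = FALSE)

# Names with spaces or symbols need backticks
tidy <- raw |>
  rename(
    id  = `Participant ID`,
    age = `Age (years)`,
    rt  = `Reaction Time`
  )

names(tidy)
```

If you have many columns, janitor::clean_names() is another option that converts every name to snake_case in one step.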
By default, read_csv() treats only two strings as NA: "" (empty) and "NA".
Other markers such as N/A, ".", or NULL are read as ordinary text unless you list them in the na argument.
Psychology datasets often use -999 or 99 as missing value codes!
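Here is a minimal sketch of recoding such values at import time with the na argument. The data are invented, and I() assumes readr 2.0 or later. Only list a number like 99 as missing when it cannot be a real value in any column, because na applies to the whole file.

```r
library(readr)

# Made-up data where -999 and "N/A" both mean "missing"
csv_text <- paste(
  "id,score,mood",
  "1,12,happy",
  "2,-999,N/A",
  "3,15,sad",
  sep = "\n"
)

# Default settings: -999 stays a number, "N/A" stays text
default_read <- read_csv(I(csv_text), show_col_types = FALSE)

# Tell readr which strings mean "missing"
recoded <- read_csv(
  I(csv_text),
  na = c("", "NA", "N/A", "-999"),
  show_col_types = FALSE
)

# Count missing values per column after recoding
colSums(is.na(recoded))
```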
Save intermediate cleaned files — don’t re-clean every time you start R.
I’ve created a messy CSV file for you. Download it here. With a partner:
- Import it with read_csv(). What types does R guess?
- Recode the missing values: -999 and "N/A" should be NA
- What does problems() show you?

📤 Upload your code to Canvas for participation credit. Paste what you have into today’s in-class submission. It doesn’t need to work perfectly.
Excel is everywhere in psychology. The readxl package handles it:
library(readxl) # Or just use readxl:: directly
# Read the first sheet
my_data <- read_excel("survey_results.xlsx")
# Read a specific sheet
my_data <- read_excel("survey_results.xlsx", sheet = "Demographics")
# Read a specific range
my_data <- read_excel("survey_results.xlsx",
  sheet = "Scores",
  range = "A1:F50"
)

Things that look fine in Excel but break in R:
range to isolate them

Tip
The best fix is usually to clean the Excel file first, or use range to read just the clean part.
Useful when you don’t know the file structure yet.
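The snippet this line referred to is not preserved here. One readxl function that fits the description is excel_sheets(), which lists the sheet names in a workbook before you read anything. The sketch below is an assumption about what was meant, and it uses the example workbook that ships with readxl rather than a course file.

```r
library(readxl)

# Example workbook bundled with the readxl package
example_path <- readxl_example("datasets.xlsx")

# Step 1: find out which sheets exist
sheets <- excel_sheets(example_path)
sheets

# Step 2: read only a small block from one sheet once you know its name
subset_data <- read_excel(example_path, range = "mtcars!B1:D5")
subset_data
```

Putting the sheet name inside range, as in "mtcars!B1:D5", is equivalent to passing sheet and range separately.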
Psychology labs often have data in SPSS (.sav) or SAS formats:
These return tibbles, just like read_csv().
SPSS files often have variable labels and value labels:
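The original code for these slides is not preserved here, so the following is a sketch under assumptions rather than the lecture's own example. It uses the haven package and builds a tiny made-up SPSS file in a temporary location, so that it runs without any lab data. The column names and labels are hypothetical.

```r
library(haven)

# Made-up data with a labelled column, as SPSS would store it
survey <- tibble::tibble(
  id = c(1, 2, 3),
  condition = labelled(
    c(1, 2, 1),
    labels = c(control = 1, treatment = 2),
    label = "Experimental condition"
  )
)

# Write it out as .sav, then read it back the way you would a real file
sav_path <- tempfile(fileext = ".sav")
write_sav(survey, sav_path)
from_spss <- read_sav(sav_path)

# Variable label: the description attached to the column
attr(from_spss$condition, "label")

# Value labels: convert the codes 1/2 into a factor with readable levels
condition_factor <- as_factor(from_spss$condition)
levels(condition_factor)
```

haven also provides read_sas() for SAS files, which is used the same way as read_sav().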
The most common workflow in psychology:
Note
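The first two rows after the header contain Qualtrics’ own descriptions. skip = 2 on its own would throw away the header row as well, so read the column names first, then skip all three rows and pass the names back in with col_names. Always check with glimpse() after importing.

# 1. Read the file
# Row 1 holds the column names; rows 2-3 are Qualtrics' descriptions
qualtrics_names <- names(read_csv("qualtrics_export.csv", n_max = 0))
raw_data <- read_csv("qualtrics_export.csv",
  col_names = qualtrics_names,
  skip = 3,
  na = c("", "NA", "N/A", "-999")
)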
# 2. Check what you got
glimpse(raw_data)
problems(raw_data)
# 3. Rename columns
clean_data <- raw_data |>
  rename(
    id = ResponseId,
    age = Q1,
    condition = Q2
  )
# 4. Fix types if needed
clean_data <- clean_data |>
  mutate(
    id = as.character(id),
    age = as.numeric(age)
  )
# 5. Save the cleaned version
write_csv(clean_data, "data/clean_survey.csv")

Try this mini-challenge before next class:
# 1. Import the messy CSV from the pair exercise
messy <- read_csv("data/messy_survey.csv", na = c("", "NA", "-999", "N/A"))
# 2. Check for problems
problems(messy)
# 3. Fix at least one column type issue with col_types or
#    readr::parse_number(), and store the result as `clean`
# 4. Save your cleaned version
write_csv(clean, "data/clean_survey.csv")

Assignment 3 will have you do this end to end with real messy data.
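For step 3 of the mini-challenge, here is a short sketch of what readr::parse_number() does. It pulls the first number out of each string and ignores surrounding text, currency symbols, and thousands separators. The values below are invented and are not taken from the exercise file.

```r
library(readr)

# Made-up column that was imported as text because of stray symbols
raw_income <- c("$1,200", "950", "about 300", "N/A")

# Extract the numbers; treat "N/A" as missing instead of a parsing failure
income <- parse_number(raw_income, na = c("", "NA", "N/A"))

income
```

Inside a pipeline you would typically wrap this in mutate(), for example mutate(income = parse_number(income)).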
| Function | What it does |
|---|---|
| read_csv() | Read CSV files |
| write_csv() | Save CSV files |
| read_excel() | Read Excel files |
| read_sav() | Read SPSS files |
| problems() | Check for import issues |
| col_types | Control column types |
📖 Read:
✅ Practice:
problems(): does it flag anything?

- read_csv() is your default. It’s faster and smarter than read.csv()
- glimpse() and problems() are your friends
- Watch for missing-value codes like -999
- For Excel, use readxl and be suspicious of formatting

Your analysis is only as good as your import. Check it before you trust it.
Next time: Layers & Aesthetics
PSY 410 | Session 6