PSY 410: Data Science for Psychology
2026-03-30
In 2015, a team of 270 researchers tried to replicate 100 published psychology studies.
Only 36% produced the same results.
This wasn’t fraud. These were real labs, following published methods, using real data.
So what went wrong?
The replication crisis isn’t just a statistics problem. It’s a workflow problem.
Every technical skill we learn this quarter serves the same goal:
Start with raw data. End with a clear, honest, reproducible story.
That means learning to:
What we’ll learn:
Why it matters:

Dr. Sara Weston

Amala Someshwar
| Component | Weight | What it is |
|---|---|---|
| Assignments | 35% | 8 weekly coding exercises |
| Reading quizzes | 15% | 10 weekly quizzes — unlimited retakes, best score counts |
| Participation | 15% | In-class pair coding submissions |
| Final project | 35% | Analyze a dataset of your choosing |
8 assignments total. Late submissions accepted up to 48 hours with a 10% daily penalty. Plus one free “life happens” extension — email me within a week of the due date to use it.
Each week, a brief Canvas quiz on that week’s readings. Due Sunday at 11:59 PM.
Each quiz pulls 5 random questions from a larger bank. You can retake as many times as you want — your best score counts.
10 quizzes total (one per week). Same late policy as assignments.
Every class, you’ll work with a partner on a coding exercise.
A capstone project where you apply everything to a dataset you choose.
| Milestone | When |
|---|---|
| Proposal | Week 5 — dataset and research questions |
| Draft | Week 8 — working code and preliminary results |
| Final report | Week 10 — polished Quarto document |
| Presentation | Finals week — 5-minute share-out |
At the start of the term, you’ll be placed on a team of 5–6 students.
Throughout the quarter, your team earns points:
The team challenge does not affect your grade.
The winning team at the end of the term earns a celebration.
We’ll form teams based on a short survey you’ll take today. Teams will be announced on Wednesday.
Point-and-click (SPSS, Excel)
Code-based (R)
| What you need | R delivers |
|---|---|
| Free | Open source — no licenses, ever |
| Built for data | Created by statisticians, not software engineers |
| Psychology packages | psych, lavaan, lme4, brms |
| Publication-quality figures | ggplot2 — industry-leading |
| Reproducibility | Quarto integration (we’ll learn this) |
Important
Install R first, then RStudio. RStudio needs R to work!
Think of it like this:
You could drive with just an engine… but why would you?
| Pane | What it does |
|---|---|
| Source (top-left) | Write and edit your code files |
| Console (bottom-left) | Run code interactively, see output |
| Environment (top-right) | See your data and objects |
| Files/Plots/Help (bottom-right) | Navigate files, view plots, get help |
Type in the Console and press Enter:
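For instance, a minimal sketch using simple arithmetic (the specific expressions are illustrative, not a required exercise):

```r
# Simple arithmetic typed at the Console prompt
2 + 2      # addition
10 / 4     # division
sqrt(16)   # a built-in function: square root
```

Each line prints its result immediately, which is what makes the Console handy for quick checks.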
In R, we use <- to assign values to objects:
Think of it as an arrow pointing left: “put 10 into x”
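A short sketch of assignment in action. The object `x` and the value 10 come from the sentence above; `y` is a hypothetical second object added for illustration:

```r
x <- 10      # put 10 into x
x            # typing the name prints the stored value

y <- x * 2   # objects can be used to build new objects
y
```

Nothing is printed when you assign; you have to ask for the object by name to see it.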
Tip
Keyboard shortcut: Alt + - (Windows) or Option + - (Mac)


Every piece of paper has a place where it lives.
You navigate to that place to find it.
Documents/
├── psy410/
│   ├── data/
│   └── scripts/
└── thesis/
    ├── data/
    └── drafts/
Everything is in one pile.
You search to find it.
Google, Spotlight, Finder search, “Recent files”…
When you write:
You’re giving directions: “Start here. Go into data. Then into raw. Find survey_responses.csv.”
If the file isn’t exactly there, the code breaks. No fuzzy matching. No “did you mean…?”
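A rough sketch of what "giving directions" looks like in code. Everything here is an assumption made so the example can run anywhere: the throwaway folder under `tempdir()`, the tiny made-up data, and the use of base R's `read.csv()` (the course itself may use `read_csv()` from readr instead).

```r
# Hypothetical setup: a throwaway project folder with one small CSV,
# so the example does not touch any real files.
project_root <- file.path(tempdir(), "psy410_demo")
dir.create(file.path(project_root, "data", "raw"),
           recursive = TRUE, showWarnings = FALSE)

good_path <- file.path(project_root, "data", "raw", "survey_responses.csv")
write.csv(data.frame(id = 1:3, anxiety = c(2, 5, 3)),
          good_path, row.names = FALSE)

# The exact path works
survey <- read.csv(good_path)
nrow(survey)

# A path that is off by one letter does not exist: no fuzzy matching
bad_path <- file.path(project_root, "data", "raw", "survey_response.csv")
file.exists(good_path)
file.exists(bad_path)
```

Checking `file.exists()` before reading is a quick way to find out whether the directions or the file are the problem.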
Important
This is why we need to learn directory structure — even if it feels foreign.
Without projects:
setwd() nightmares
With projects:
Tip
The golden rule: Someone else should be able to run your code and get the same results. That “someone else” includes Future You — who has forgotten everything.
You’ll see a .Rproj file appear — this is your project file. From now on, double-click it to open your project.
psy410/
├── psy410.Rproj
├── data/
│   ├── raw/              # Original data (READ ONLY)
│   └── clean/            # Processed data
├── scripts/
│   ├── 01_clean.R        # Data cleaning
│   ├── 02_analyze.R      # Main analysis
│   └── 03_visualize.R    # Figures
└── output/
    └── figures/          # Saved plots
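You can create these folders by hand, but they can also be built from R. The sketch below is one possible approach, and it uses a temporary directory as the project root (an assumption so the example is safe to run anywhere); inside a real project you would use paths relative to the project folder.

```r
# Stand-in for your project folder (assumption for this sketch)
project_root <- file.path(tempdir(), "psy410")

# The folders shown in the structure above
folders <- c("data/raw", "data/clean", "scripts", "output/figures")

for (folder in folders) {
  # recursive = TRUE also creates parent folders such as data/ and output/
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.dirs(project_root, full.names = FALSE, recursive = TRUE)
```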
Two principles:
Good names make code self-documenting. Bad names create confusion and bugs.
Do this:
- reaction_time
- mean_anxiety
- survey_clean.csv
- 01_clean_data.R

Not this:
- x1, temp, foo
- AvgAnx (hard to read)
- data_final_v2_REAL.csv
- stuff.R, untitled3.R

The test: Can someone unfamiliar with your project understand what a variable contains or what a file does?
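To see the difference in practice, here is a small sketch with made-up anxiety scores (the data values are hypothetical). Both versions compute the same number, but only one tells you what it is:

```r
# Hypothetical anxiety scores for five participants
anxiety_scores <- c(12, 18, 9, 15, 21)

# Hard to read: what is x1? what is temp?
x1 <- anxiety_scores
temp <- mean(x1)

# Self-documenting: the name says what the object holds
mean_anxiety <- mean(anxiety_scores)
mean_anxiety
```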
R scripts (.R files) are where you write and save your code.
To create one:
Always save your scripts! The console history disappears.
Tip
You’ll use Ctrl/Cmd + Enter constantly. Memorize it!
R’s power comes from packages — bundles of functions others have written.
The tidyverse is actually a collection of packages:
| Package | Purpose |
|---|---|
| ggplot2 | Data visualization |
| dplyr | Data manipulation |
| tidyr | Data tidying |
| readr | Reading data files |
| tibble | Modern data frames |
| stringr | String manipulation |
| forcats | Working with factors |
| purrr | Functional programming |
We’ll use most of these throughout the course.
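A minimal sketch of the usual install-once, load-every-session pattern. The `requireNamespace()` guard is an addition for this example so the code does not stop with an error if the package is missing; in everyday scripts a plain `library(tidyverse)` is typical.

```r
# Install once per computer (needs an internet connection), so it is
# left commented out here:
# install.packages("tidyverse")

# Load once per R session
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)
if (has_tidyverse) {
  library(tidyverse)
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\") first.")
}
```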
After loading tidyverse, you have access to built-in datasets:
# A tibble: 234 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
# ℹ 224 more rows
- Create folders: data, scripts, output
- Create a new script in scripts/ and save it as 01_practice.R
- Run glimpse(mpg) and glimpse(diamonds)

Use ? to get help, e.g., ?mean opens the help page.
Help pages include:
The examples section is gold — run them!
Errors will happen. A lot. That’s normal.
Error in maen(c(1, 2, 3)): could not find function "maen"
Read the error message carefully — R is trying to help you.
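Here is one way to reproduce that error and then fix it. `tryCatch()` is used only so this sketch can capture the message and keep running; in the Console you would simply see the error printed.

```r
# Typo: "maen" is not a function, so R raises an error
error_text <- tryCatch(
  maen(c(1, 2, 3)),
  error = function(e) conditionMessage(e)  # keep just the message text
)
error_text

# The message names the problem: check the spelling, then rerun
mean(c(1, 2, 3))
```

The message points straight at the misspelled function name, which is usually the fastest clue to follow.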
Steve Jobs called the computer a “bicycle for the mind” — it amplifies your pedaling, but you provide the balance and direction.
Generative AI is more like a motorcycle. It provides the engine. But if you don’t know how to ride, you crash faster and harder.
Read Cat Hicks: “Cognitive helmets for AI bicycles”
AI gives you a working answer immediately. It feels like you solved it.
But you didn’t build the mental model of why it works.
When the AI hallucinates (which it will), you cannot debug it. You are stranded.
Using AI now = skipping the gym but expecting to get strong.
Before we get on the motorcycle, we need a cognitive helmet:
That’s what the many short assignments in this course build. The goal: you are the pilot, not the passenger.
Learning R isn’t just about R. It’s about a set of habits that transfer everywhere:
You may never write another line of R after June. That’s okay. The thinking you build here will show up everywhere: in a lab, in a dissertation, in any job that asks you to solve a messy problem.
Many data science and research jobs include a technical interview — you’re given a dataset and asked to write code live, on the spot.
No AI. No Stack Overflow. No notes. Just you and a blank script.
You don’t need to be ready for that by June. But every time you practice writing code from memory — instead of copying or prompting — you’re building the confidence and fluency that will matter when it counts.
Start now, so you’re not starting from zero later.
Read:
Do:
data/, scripts/, and output/ folders
Every skill you learn in this course is a brick in a wall between you and the replication crisis.
See you Wednesday for your first visualization!
PSY 410 | Session 1
Comments
Use # to write comments (notes for yourself and others):
Comments are essential. Your future self will thank you.
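A small sketch of commented code. The reaction-time values and object names are hypothetical, chosen only to illustrate the habit:

```r
# Reaction times in milliseconds (made-up values for illustration)
reaction_time <- c(350, 420, 390)

# Convert to seconds so the units match the rest of the analysis
reaction_time_sec <- reaction_time / 1000

mean(reaction_time_sec)  # a comment can also follow code on the same line
```

R ignores everything after `#` on a line, so comments never change what the code does.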