Assignment 5: Exploratory Data Analysis
Due by 11:59 PM on Sunday, May 10, 2026
Assigned: Monday, May 4 (Session 11)
Due: Sunday, May 10 at 11:59 PM
Submit: Quarto document (.qmd) AND rendered HTML on Canvas
See the guides: Setting Up an R Project | Using Quarto Documents
Overview
This assignment practices exploratory data analysis (EDA). You’ll explore distributions, identify outliers, investigate relationships, and generate questions from data. This is also good practice for your final project!
Setup
# Assignment 5: Exploratory Data Analysis
# Your Name
# Date
library(tidyverse)
library(psych) # For the bfi dataset
# If you haven't installed psych:
# install.packages("psych")
The Dataset
We’ll use the Big Five Inventory (BFI) dataset from the psych package:
data(bfi)
bfi <- as_tibble(bfi)
glimpse(bfi)
This contains responses from 2,800 participants on 25 personality items (5 per Big Five trait: Agreeableness, Conscientiousness, Extraversion, Neuroticism, Openness), plus age, gender, and education.
Part 1: Exploring Distributions (25 points)
Task 1.1
Create histograms for age and at least one personality item. What do you notice about the distributions?
Task 1.2
Are there any outliers or unusual values in age? How would you identify them programmatically? (Hint: consider values beyond 3 standard deviations, or use filter() with reasonable bounds)
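One way to read the hint, sketched on made-up ages rather than the bfi data. The `toy` tibble and the 10 to 100 bounds are illustrative assumptions, not values prescribed by the assignment; choose bounds you can justify for your own data.

```r
library(tidyverse)

# Hypothetical ages: 20 plausible values plus two suspicious ones (150 and 3)
toy <- tibble(age = c(rep(c(20, 25, 30, 35, 40), times = 4), 150, 3))

# Rule 1: flag values more than 3 standard deviations from the mean
flagged_sd <- toy |>
  mutate(z_age = (age - mean(age, na.rm = TRUE)) / sd(age, na.rm = TRUE)) |>
  filter(abs(z_age) > 3)

# Rule 2: flag values outside bounds chosen on substantive grounds
# (the 10 and 100 cutoffs here are assumptions for illustration)
flagged_bounds <- toy |>
  filter(age < 10 | age > 100)

flagged_sd
flagged_bounds
```

The two rules can disagree. In this toy example the 3-SD rule flags only 150, because the extreme value inflates the standard deviation, while the bounds rule flags both 150 and 3.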
Task 1.3
How much missing data is there? Create a summary showing the percentage of NA values for each variable.
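A minimal sketch of a per-variable missingness summary, shown on a small made-up tibble (`toy` and its columns are hypothetical). The same pattern should carry over to any data frame, but treat it as one possible approach rather than the expected answer.

```r
library(tidyverse)

# Hypothetical data with some missing values
toy <- tibble(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 1, 2),
  c = c(1, 2, 3, 4)
)

# mean(is.na(x)) is the proportion of NAs; multiply by 100 for a percentage
missing_summary <- toy |>
  summarise(across(everything(), ~ mean(is.na(.x)) * 100)) |>
  pivot_longer(everything(), names_to = "variable", values_to = "pct_missing") |>
  arrange(desc(pct_missing))

missing_summary
```

Reshaping to long format gives one row per variable, which is easier to read and to plot than a single wide row.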
Part 2: Exploring Relationships (30 points)
Task 2.1
Create a scatterplot of two personality items that you hypothesize might be related. Add a trend line. Was your hypothesis supported?
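Because responses on a rating scale take only a few integer values, points in a plain scatterplot stack on top of each other. The sketch below, on simulated responses (`item_x` and `item_y` are hypothetical names, not bfi items), shows one way to handle that with jittering plus a linear trend line.

```r
library(tidyverse)

set.seed(42)

# Simulated 1-6 responses: item_y tracks item_x with some noise, clipped to 1-6
toy <- tibble(
  item_x = sample(1:6, 200, replace = TRUE),
  item_y = pmin(6L, pmax(1L, item_x + sample(-1:1, 200, replace = TRUE)))
)

p <- ggplot(toy, aes(x = item_x, y = item_y)) +
  # jitter spreads overlapping points; alpha shows where they pile up
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.4) +
  # linear trend line fitted to the unjittered values
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Item X (1-6)", y = "Item Y (1-6)")

p
```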
Task 2.2
Create boxplots showing how one personality item differs by gender. Do you see a difference?
Task 2.3
Create a correlation matrix visualization for the 5 Extraversion items (E1-E5).
Hint:
bfi |>
  select(E1:E5) |>
  cor(use = "complete.obs") |>
  # then visualize...
What do you notice? (Some items may be reverse-coded!)
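To illustrate what a reverse-coded item looks like in a correlation matrix, here is a sketch on simulated items (`item1` to `item3`, the helper `to_likert()`, and the 1 to 6 response scale are all assumptions made for this example). It is one possible way to draw the matrix, not the required one.

```r
library(tidyverse)

set.seed(1)
trait <- rnorm(200)

# Hypothetical helper: bin a continuous score into a 1-6 response
to_likert <- function(x) {
  as.integer(cut(x, breaks = c(-Inf, -1, -0.5, 0, 0.5, 1, Inf)))
}

toy_items <- tibble(
  item1 = to_likert(trait + rnorm(200, sd = 0.5)),
  item2 = to_likert(trait + rnorm(200, sd = 0.5)),
  item3 = to_likert(-trait + rnorm(200, sd = 0.5)) # reverse-keyed item
)

cor_mat <- cor(toy_items, use = "complete.obs")

# Long format: one row per pair of items (columns Var1, Var2, Freq)
cor_long <- as.data.frame(as.table(cor_mat))

p <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = round(Freq, 2))) +
  scale_fill_gradient2(limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "r")

p

# On a 1-6 scale, reverse-scoring maps 1 -> 6, 2 -> 5, ..., 6 -> 1
recoded <- toy_items |>
  mutate(item3 = 7 - item3)
```

In the toy data the reverse-keyed item correlates negatively with the others until it is recoded, which is the kind of pattern the parenthetical above is pointing at.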
Part 3: Generating Questions (20 points)
Task 3.1
Based on your exploration, generate 3 interesting questions that could be investigated with this data. Write them as comments.
Task 3.2
For one of your questions, create a visualization that helps answer it. Explain your findings in a comment.
Part 4: EDA Report (15 points)
Write a 1-page (approximately 300-400 words) EDA report summarizing:
- The data: Brief description of the dataset
- Key findings: 2-3 interesting patterns you discovered
- Data quality issues: Missing data, outliers, or concerns
- Next steps: What would you investigate further?
Write this as narrative text (not code comments) in your Quarto document.
Grading Rubric
| Component | Points |
|---|---|
| Part 1: Exploring distributions | 25 |
| Part 2: Exploring relationships | 30 |
| Part 3: Generating questions | 20 |
| Part 4: EDA report | 15 |
| Code runs without errors | 10 |
| Total | 100 |
Submission
Submit your Quarto document (.qmd) AND the rendered HTML on Canvas.
Graduate Extension: Preliminary Analysis Write-Up
Students enrolled in PSY 510 must complete the following extension in place of the 1-page EDA report in Part 4.
Instead of the 1-page EDA summary, write your findings as if they were the preliminary analysis section of a manuscript — combining your code output and interpretations in a form that would make sense to a reader of a journal article.
Your write-up should include the following, in order:
- Dataset description (2–3 sentences): What are the data? How many participants? What variables? How were they collected?
- Data quality and exclusions (1–2 sentences): Were any observations flagged or removed? What did you do about missing data? Write this as you would in a Methods section: specific and justified.
- Descriptive statistics: Report means, SDs, and ranges for at least 5 key variables. Format these as a table using knitr::kable() in a Quarto document, or present them clearly in prose if submitting a plain PDF.
- Key patterns (1–2 paragraphs): Describe 2–3 findings from your EDA, referring to your figures by number (Figure 1 shows…). Write this as you would in a Results section: precise, direct, no over-interpretation.
- Analysis plan (1 paragraph): Based on what you found, what would you examine next in a formal analysis? Why does the EDA support that direction?
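For the descriptive statistics item above, one possible shape for the table is sketched below on made-up data (the `toy` tibble, its column names, and the caption text are placeholders). It assumes the knitr package is installed, which is normally the case when rendering Quarto documents with R.

```r
library(tidyverse)

# Hypothetical data with a few missing values
toy <- tibble(
  age   = c(19, 23, 31, 45, NA),
  item1 = c(2, 4, 5, 3, 6),
  item2 = c(1, 1, 2, NA, 3)
)

# One row per variable: n, mean, SD, and range (min, max), ignoring NAs
desc_table <- toy |>
  pivot_longer(everything(), names_to = "variable", values_to = "value") |>
  group_by(variable) |>
  summarise(
    n    = sum(!is.na(value)),
    mean = mean(value, na.rm = TRUE),
    sd   = sd(value, na.rm = TRUE),
    min  = min(value, na.rm = TRUE),
    max  = max(value, na.rm = TRUE)
  )

knitr::kable(
  desc_table,
  digits = 2,
  caption = "Descriptive statistics (toy data)"
)
```

Reporting n alongside the mean and SD makes it visible when variables are based on different numbers of complete responses.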
Length: 500–700 words. This should read as a coherent piece of writing, not a list of answers.
Submission: Include your write-up as narrative text in your .qmd file under a clearly marked ## Graduate Extension section.