Assignment 5: Exploratory Data Analysis

Due by 11:59 PM on Sunday, May 10, 2026

Note: Assignment Details

Assigned: Monday, May 4 (Session 11)
Due: Sunday, May 10 at 11:59 PM
Submit: Quarto document (.qmd) AND rendered HTML on Canvas

Overview

This assignment gives you practice with exploratory data analysis (EDA). You’ll explore distributions, identify outliers, investigate relationships, and generate questions from data. This is also good practice for your final project!

Setup

# Assignment 5: Exploratory Data Analysis
# Your Name
# Date

library(tidyverse)
library(psych)  # For the bfi dataset

# If you haven't installed psych:
# install.packages("psych")

The Dataset

We’ll use the Big Five Inventory (BFI) dataset from the psych package:

data(bfi)
bfi <- as_tibble(bfi)
glimpse(bfi)

This contains responses from 2,800 participants on 25 personality items (5 per Big Five trait: Agreeableness, Conscientiousness, Extraversion, Neuroticism, Openness), plus age, gender, and education.

Part 1: Exploring Distributions (25 points)

Task 1.1

Create histograms for age and at least one personality item. What do you notice about the distributions?
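As a starting point, here is a minimal sketch of the two histogram cases on made-up data. The `toy` tibble and its columns `age` and `item` are hypothetical stand-ins, not the bfi variables, and the bin widths are assumptions you should adjust for your own plots.

```r
library(tidyverse)

# Hypothetical toy data (NOT the bfi dataset)
set.seed(1)
toy <- tibble(
  age  = round(rnorm(200, mean = 30, sd = 10)),
  item = sample(1:6, size = 200, replace = TRUE)  # a 1-6 rating-style item
)

# Roughly continuous variable: pick a bin width that suits the scale
p_age <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 5)

# Discrete rating item: one bin per response option
p_item <- ggplot(toy, aes(x = item)) +
  geom_histogram(binwidth = 1)

p_age
p_item
```

Trying a few different `binwidth` values is worthwhile, since the apparent shape of a distribution can change with the binning.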

Task 1.2

Are there any outliers or unusual values in age? How would you identify them programmatically? (Hint: consider values more than 3 standard deviations from the mean, or use filter() with reasonable bounds.)
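The two approaches in the hint can be sketched side by side on made-up data. The `toy` tibble is hypothetical, and the bounds of 18 and 90 are an assumption chosen only for illustration; decide for yourself what counts as plausible in the real data.

```r
library(tidyverse)

# Hypothetical toy data with two implausible ages (3 and 110)
toy <- tibble(
  age = c(19, 22, 25, 31, 28, 24, 35, 40, 27, 3,
          21, 26, 29, 33, 23, 30, 20, 38, 34, 110)
)

flagged <- toy |>
  mutate(
    z             = (age - mean(age, na.rm = TRUE)) / sd(age, na.rm = TRUE),
    beyond_3sd    = abs(z) > 3,
    out_of_bounds = age < 18 | age > 90   # assumed plausibility bounds
  )

# Rows flagged by either rule
flagged |>
  filter(beyond_3sd | out_of_bounds)
```

In this toy example the 3 SD rule flags only the age of 110, while the bounds check catches both 3 and 110. Extreme values inflate the standard deviation, so the two rules do not always agree.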

Task 1.3

How much missing data is there? Create a summary showing the percentage of NA values for each variable.
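One possible pattern for a per-variable missingness summary, shown on a small made-up tibble (the `toy` data and the output names `variable` and `pct_missing` are hypothetical):

```r
library(tidyverse)

# Hypothetical toy data with some missing values
toy <- tibble(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 2, 5),
  c = c(1, 2, 3, 4)
)

missing_summary <- toy |>
  # mean(is.na(x)) is the proportion of NA values in a column
  summarise(across(everything(), ~ mean(is.na(.x)) * 100)) |>
  # reshape to one row per variable
  pivot_longer(everything(), names_to = "variable", values_to = "pct_missing") |>
  arrange(desc(pct_missing))

missing_summary
```

The long format (one row per variable) is easier to read and to plot than one very wide row.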

Part 2: Exploring Relationships (30 points)

Task 2.1

Create a scatterplot of two personality items that you hypothesize might be related. Add a trend line. Was your hypothesis supported?
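Rating items take only a few distinct values, so points in a plain scatterplot tend to stack on top of each other. The sketch below shows one way to handle that with jitter and a linear trend line. The `toy` data and its columns `item_x` and `item_y` are made up, and a linear fit is an assumption; other smoothers may suit your items better.

```r
library(tidyverse)

# Hypothetical toy data: two positively related 1-6 rating items
set.seed(42)
toy <- tibble(item_x = sample(1:6, 300, replace = TRUE)) |>
  mutate(item_y = pmin(6, pmax(1, item_x + sample(-2:2, 300, replace = TRUE))))

p <- ggplot(toy, aes(x = item_x, y = item_y)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.4) +  # spread overlapping points
  geom_smooth(method = "lm", formula = y ~ x)            # linear trend line

p

# A correlation gives a numeric check on what the plot suggests
r_xy <- cor(toy$item_x, toy$item_y, use = "complete.obs")
r_xy
```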

Task 2.2

Create boxplots showing how one personality item differs by gender. Do you see a difference?
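If your grouping variable is stored as numeric codes (check with glimpse()), ggplot2 will typically treat it as continuous and draw a single box. A minimal sketch of converting the codes to a factor first, using made-up data; the `toy` tibble, its columns, and the labels "Group 1" and "Group 2" are hypothetical placeholders:

```r
library(tidyverse)

# Hypothetical toy data: a grouping variable stored as numeric codes 1/2
set.seed(7)
toy <- tibble(
  group = sample(1:2, 100, replace = TRUE),
  item  = sample(1:6, 100, replace = TRUE)
)

p <- toy |>
  # convert numeric codes to a labelled factor so each group gets its own box
  mutate(group = factor(group, levels = c(1, 2),
                        labels = c("Group 1", "Group 2"))) |>
  ggplot(aes(x = group, y = item)) +
  geom_boxplot()

p
```

Look up what each code means in the dataset documentation before you choose labels.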

Task 2.3

Create a correlation matrix visualization for the 5 Extraversion items (E1-E5).

Hint:

# Store the correlation matrix so you can pass it to a plot
extraversion_cor <- bfi |>
  select(E1:E5) |>
  cor(use = "complete.obs")

# then visualize extraversion_cor...

What do you notice? (Some items may be reverse-coded!)
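One way to turn a correlation matrix into a picture is a tile heatmap built with ggplot2. This is a sketch on made-up data, not the only option: the `toy` tibble and the names `item_a`, `item_b`, and `r` are hypothetical, and `x3` is constructed to behave like a reverse-coded item.

```r
library(tidyverse)

# Hypothetical toy data: x2 tracks x1, x3 runs opposite to x1
set.seed(3)
toy <- tibble(x1 = rnorm(50)) |>
  mutate(
    x2 = x1 + rnorm(50),
    x3 = -x1 + rnorm(50)
  )

# Correlation matrix reshaped to long format: one row per pair of items
cor_long <- toy |>
  cor(use = "complete.obs") |>
  as.data.frame() |>
  rownames_to_column("item_a") |>
  pivot_longer(-item_a, names_to = "item_b", values_to = "r")

p <- ggplot(cor_long, aes(x = item_a, y = item_b, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(limits = c(-1, 1))  # diverging scale centered at 0

p
```

A negative correlation with the other items is a common sign of reverse coding. For an item rated on a scale from 1 to 6, reverse-scoring is `7 - x`.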

Part 3: Generating Questions (20 points)

Task 3.1

Based on your exploration, generate 3 interesting questions that could be investigated with this data. Write them as comments.

Task 3.2

For one of your questions, create a visualization that helps answer it. Explain your findings in a comment.

Part 4: EDA Report (15 points)

Write a 1-page (approximately 300-400 words) EDA report summarizing:

  1. The data: Brief description of the dataset
  2. Key findings: 2-3 interesting patterns you discovered
  3. Data quality issues: Missing data, outliers, or concerns
  4. Next steps: What would you investigate further?

Write this as narrative text (not code comments) in your Quarto document.

Grading Rubric

Component                          Points
Part 1: Exploring distributions    25
Part 2: Exploring relationships    30
Part 3: Generating questions       20
Part 4: EDA report                 15
Code runs without errors           10
Total                              100

Submission

Submit both your Quarto document (.qmd) and the rendered HTML on Canvas.


Note: PSY 510 (Graduate Students)

Students enrolled in PSY 510 must complete the following extension in place of the 1-page EDA report in Part 4.

Graduate Extension: Preliminary Analysis Write-Up

Instead of the 1-page EDA summary, write your findings as if they were the preliminary analysis section of a manuscript — combining your code output and interpretations in a form that would make sense to a reader of a journal article.

Your write-up should include the following, in order:

  1. Dataset description (2–3 sentences): What are the data? How many participants? What variables? How were the data collected?

  2. Data quality and exclusions (1–2 sentences): Were any observations flagged or removed? What did you do about missing data? Write this as you would in a Methods section — specific and justified.

  3. Descriptive statistics: Report means, SDs, and ranges for at least 5 key variables. Format these as a table using knitr::kable() in a Quarto document, or present them clearly in prose if submitting a plain PDF.

  4. Key patterns (1–2 paragraphs): Describe 2–3 findings from your EDA, referring to your figures by number (Figure 1 shows…). Write this as you would in a Results section — precise, direct, no over-interpretation.

  5. Analysis plan (1 paragraph): Based on what you found, what would you examine next in a formal analysis? Why does the EDA support that direction?
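For the descriptive statistics table in item 3, one possible pattern is to compute the summaries in long format and pass the result to knitr::kable(). This is a sketch on made-up data; the `toy` tibble, its columns, and the caption text are hypothetical.

```r
library(tidyverse)
library(knitr)

# Hypothetical toy data with a few missing values
toy <- tibble(
  age  = c(21, 25, NA, 30, 24),
  item = c(2, 4, 5, NA, 3)
)

desc_table <- toy |>
  pivot_longer(everything(), names_to = "variable", values_to = "value") |>
  group_by(variable) |>
  summarise(
    n    = sum(!is.na(value)),          # non-missing observations
    mean = mean(value, na.rm = TRUE),
    sd   = sd(value, na.rm = TRUE),
    min  = min(value, na.rm = TRUE),
    max  = max(value, na.rm = TRUE)
  )

kable(desc_table, digits = 2, caption = "Descriptive statistics (toy data)")
```

Reporting `n` alongside each mean makes it clear how many observations each statistic rests on once missing values are dropped.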

Length: 500–700 words. This should read as a coherent piece of writing, not a list of answers.

Submission: Include your write-up as narrative text in your .qmd file under a clearly marked ## Graduate Extension section.