This repository contains all of the files necessary for an investigation of happiness and altruism in the United States using data from NORC's General Social Survey (GSS). The aim of the study was to determine whether happiness leads to altruistic behavior.
Note: The research paper associated with this study is available here.
This project requires both the R programming language and Quarto. If you do not have these tools in your development environment, please install them now. You will also need an integrated development environment (IDE) capable of running R scripts. I recommend RStudio (local) or Posit Cloud (cloud-based).
Once your environment is set up, you must install several packages that handle various tasks, like graphing data, creating tables, and general organization and processing. You will find a complete list of these packages in the file scripts/00-install_dependencies.r. You only need to run this file once to install the required dependencies.
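As a rough illustration of a one-time install step, the sketch below finds which packages are not yet installed and leaves the actual installation commented out. The package vector simply mirrors the packages named at the end of this README, and the helper name `missing_packages` is hypothetical; neither is taken from scripts/00-install_dependencies.r, which remains the authoritative list.

```r
# Packages named in this README; the real list lives in
# scripts/00-install_dependencies.r and may differ.
required <- c(
  "dplyr", "ggplot2", "here", "janitor", "kableExtra", "knitr",
  "lubridate", "opendatatoronto", "readr", "RColorBrewer", "scales",
  "tidyverse"
)

# Hypothetical helper: return the packages that are not installed yet.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

# Install whatever is missing (uncomment to actually install).
# to_install <- missing_packages(required)
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first keeps repeated runs cheap, although running the project's own install script once is all the README asks for.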
Note: A step-by-step guide for how to download this data is available here.
The first step in working with this project is to download the following three data sets from the General Social Survey.
Once you download the data from GSS, place the GSS.dat and GSS.dct files in inputs/data/raw and run scripts/01-data_covert.r to convert the data to a .csv file.
Before moving to data analysis, we must clean the generated .csv files to help us filter, use, and understand the relevant data points. The scripts/02-data_cleaning.r file handles all of the data cleaning, including fixing column names (many contain characters that cannot be used or are insufficient descriptors), selecting the appropriate columns, and filtering out any rows that contain null data.
Run the file to fetch the raw data sets, clean them, and then create new .csv files with the clean data. At the end of this process, you should have six new files in inputs/data/clean:
directions_negative_data.csv
directions_positive_data.csv
happy_negative_data.csv
happy_positive_data.csv
homeless_negative_data.csv
homeless_positive_data.csv
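The three cleaning steps described above (fixing column names, selecting columns, dropping rows with missing values) can be sketched in base R as follows. This is only an illustration on invented toy data: the column names, the `clean_names` helper, and the output path are assumptions, and the actual logic in scripts/02-data_cleaning.r may differ.

```r
# Toy stand-in for one converted GSS file; names and values are invented.
raw <- data.frame(
  "YEAR " = c(2002, 2004, 2012),
  "HAPPY?" = c("Very happy", NA, "Pretty happy"),
  "ID #" = c(1, 2, 3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Fix column names: lower-case, replace runs of non-alphanumeric
#    characters with "_", then trim leading/trailing underscores.
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}
names(raw) <- clean_names(names(raw))

# 2. Keep only the columns of interest.
selected <- raw[, c("year", "happy")]

# 3. Drop rows that contain missing values.
cleaned <- selected[complete.cases(selected), ]

# 4. Write the result (hypothetical file name, left commented out).
# write.csv(cleaned, "inputs/data/clean/example_clean.csv", row.names = FALSE)
```

Base R is used here so the sketch runs without extra packages; the same steps are commonly written with janitor and dplyr, both of which this project lists as dependencies.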
The core data analysis of this project occurs in the outputs/paper/paper.qmd file, a Quarto document. Once you render paper.qmd, Quarto will generate a paper.pdf file in the same folder. The raw references used in paper.qmd are available in the same folder in the references.bib file.
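One way to trigger the render from an R session is to call the Quarto command-line tool, as in the sketch below. It assumes the working directory is the repository root and that `quarto` is on your PATH; rendering from your IDE works just as well.

```r
# Path to the paper, as given in this README.
paper <- file.path("outputs", "paper", "paper.qmd")

# Only attempt the render when both the file and the Quarto CLI are available.
if (file.exists(paper) && nzchar(Sys.which("quarto"))) {
  system2("quarto", c("render", paper))
} else {
  message("Skipping render: paper.qmd or the quarto CLI was not found.")
}
```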
If you're experiencing problems with the data, I've compiled a set of tests that check the data against several parameters, like data types, number ranges, and data ranges. These tests are in the scripts/03-data_testing.r file and run against the six .csv files.
Before running these tests, you must first download the data following the steps outlined above. All of these tests should return true. If they do not, feel free to create an issue.
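To give a sense of what such checks can look like, here is a small sketch that tests data types, a number range, and missing values on toy data. The column names, the year bounds, and the toy values are assumptions for illustration and are not the contents of scripts/03-data_testing.r.

```r
# Toy data standing in for one cleaned file; columns and values are invented.
clean <- data.frame(
  year = c(2002L, 2004L, 2012L),
  happy = c("Very happy", "Pretty happy", "Not too happy"),
  stringsAsFactors = FALSE
)

# Each check evaluates to a single TRUE or FALSE.
checks <- c(
  year_is_integer    = is.integer(clean$year),
  happy_is_character = is.character(clean$happy),
  # Assumed bounds: the GSS began in 1972; 2023 is this project's year.
  year_in_range      = all(clean$year >= 1972 & clean$year <= 2023),
  no_missing_values  = !any(is.na(clean))
)

print(checks)
all(checks)
```

Naming each check makes it easy to see which one failed when reporting an issue.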
If you'd like to debug the problem yourself, or if you'd like to use a service like Stack Overflow for help, it's important to have some simulated data to reproduce the problem. The scripts/04-data_simulation.r file generates random, fake data based on the information initially downloaded from GSS.
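A minimal sketch of this kind of simulation is shown below. The seed, sample size, survey years, and response labels are assumptions chosen for illustration; the project's own script may generate different columns and values.

```r
set.seed(853)  # any fixed seed makes the fake data reproducible

n <- 100  # number of simulated respondents (arbitrary)

# Assumed survey years and happiness labels, used only for illustration.
years <- c(2002L, 2004L, 2012L, 2014L)
happy_levels <- c("Very happy", "Pretty happy", "Not too happy")

simulated <- data.frame(
  year = sample(years, size = n, replace = TRUE),
  happy = sample(happy_levels, size = n, replace = TRUE),
  stringsAsFactors = FALSE
)

head(simulated)
```

Because the data is random and contains nothing from GSS itself, it is safe to paste into a public question as a reproducible example.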
Created by Sebastian Rodriguez, Laura Lee-Chu, and Iz Leitch © 2023, licensed under the BSD 3-Clause License. Contains information from General Social Survey (GSS), a project of the independent research organization NORC at the University of Chicago, with principal funding from the National Science Foundation. Created using R, an open-source statistical programming language.
This project uses a number of R packages, including: dplyr, ggplot2, here, janitor, kableExtra, knitr, lubridate, opendatatoronto, readr, RColorBrewer, scales, and tidyverse.
Much of this project's development was informed by Rohan Alexander's book Telling Stories with Data.