Insights into health and behavior using data from the CDC (Centers for Disease Control and Prevention, a U.S. federal agency)
The CDC's Behavioral Risk Factor Surveillance System (BRFSS) provides a wealth of information about health and health-related behaviors in the United States. It is the largest and longest-running health survey system in the world, and in its current form it comprises more than 400,000 adult interviews from all 50 states, the District of Columbia, and three territories. For more information about the survey itself, see the CDC BRFSS site. The BRFSS is a rich source of information on how demographics, behaviors, and other risk factors correlate with health, and many important population health studies and measures use it as a key data source.
- Reviewing the survey data and documentation for 2014, and understanding the variables and their naming conventions
- Data preparation: generating an analytic dataset
- Data Exploration
- How to define a hypothesis
- Selecting hypotheses, then preparing, developing, and finalizing both a linear regression model and a logistic regression model to test those hypotheses
- Interpreting diagnostic plots
- The notes, methodology and images are from this course: healthcare-analytics-regression-in-R
- EDA.ipynb
- HypothesisTesting - LinearRegression.ipynb
- HypothesisTesting - LogisticRegression.ipynb
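
As a rough illustration of the two modeling steps listed above, the sketch below fits a one-predictor linear regression and a one-predictor logistic regression in plain Python. It is not taken from the notebooks or the course (which uses R). The data are synthetic, and the names `exposed`, `sleep_hours`, and `has_condition` are hypothetical placeholders rather than BRFSS variables.

```python
import math
import random


def fit_simple_linear(x, y):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def fit_simple_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


random.seed(0)  # fixed seed so the synthetic data are reproducible

# Synthetic stand-in for an analytic dataset (NOT real BRFSS records).
n = 2000
exposed = [random.randint(0, 1) for _ in range(n)]           # hypothetical binary exposure
sleep_hours = [7.0 - 0.5 * e + random.gauss(0, 1) for e in exposed]  # continuous outcome
has_condition = [                                            # binary outcome
    1 if random.random() < 1 / (1 + math.exp(-(-2.0 + 0.7 * e))) else 0
    for e in exposed
]

lin_b0, lin_b1 = fit_simple_linear(exposed, sleep_hours)
log_b0, log_b1 = fit_simple_logistic(exposed, has_condition)

print("linear slope (difference in mean outcome):", round(lin_b1, 3))
print("logistic slope (log odds ratio):", round(log_b1, 3))
print("odds ratio:", round(math.exp(log_b1), 3))
```

With a single binary predictor, the linear slope equals the difference in group means and `exp(log_b1)` equals the odds ratio from the 2x2 table, which makes both models easy to sanity-check before adding covariates.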