Required Text: Wickham, H., & Grolemund, G. (2016). R for Data Science
Required Software: R, RStudio Desktop, Git (Windows | Mac)
Sign up for the Slack channel
This course used to be an online web development class, which the ISDS department converted into an introduction to R. Because changing a course name is a rather laborious process, we kept the old name. If you are looking for the old web development class, please register for the online section of ISDS3105.
Since its first release in 1995, R's functionality has been extended well beyond that of statistical software, making it one of the most popular software environments for data analysis. This course will prepare students to manage a data analysis project using R and its Integrated Development Environment (IDE), RStudio. Students will gain familiarity with the most popular R packages for streamlining the data analysis workflow: data gathering and retrieval, dataset wrangling and manipulation, and effective presentation of the results. In particular, we will focus on developing applications for online interactive reporting (e.g., dashboards and interactive reports).
Upon successful completion of this course, you will be able to:
- Manage projects effectively using IDE resources (RStudio projects) and version control (Git/GitHub)
- Understand the fundamentals of R programming
- Scrape data from the web (using rvest)
- Wrangle and manipulate datasets (using dplyr/tidyr)
- Design and create charts for data visualization (using ggplot2)
- Present results efficiently and effectively using dynamic and interactive reporting techniques (R Markdown)
- Query remote relational databases (MySQL) and other source systems (using dplyr)
- Perform t-tests and regression analysis (using ggplot2, plotly, infer)
The course is designed for beginners, and we assume no prior knowledge of R or of any other programming language. However, each class builds incrementally on the content of previous classes, so we recommend that you come "ready to play" from day 1. Prior knowledge of databases, web programming, basic statistics, or other programming languages is a plus but is not critical to succeeding in the course.
Learning R is like a contact sport: the more you practice, the better you become at it. Attendance is a good way to push yourself to code every week, and I strongly encourage it, although it is not compulsory. Apart from a few glorious exceptions, there is generally a strong correlation between low attendance and low performance.
We adopt the standard LSU +/- grading scale without any forced curve. The breakdown of the final grade is:
Midterm 30%: The midterm exam will focus on RMarkdown, data visualization (ggplot2), dataset normalization (tidyr), and data manipulation using dplyr (calculating descriptive statistics).
Group project 20%: The group project is a data analysis project to assess your ability to import, transform, manipulate, visualize, and analyze data. The final output will be a report (interactive or static) to enhance the understanding of a research question.
Final Exam 25%: The final exam will be comprehensive and will focus on both practical skills and theoretical aspects.
Assignments 20%: Students are required to submit several assignments (no fewer than four). Assignments will strictly cover topics discussed in class and are crucial for internalizing the material. Assignments must be uploaded to your private GitHub repository, and a link to the file must be submitted via Moodle.
Professionalism 5%: We will maintain a high standard of professionalism at all times. Beyond the obvious (not disrupting the class by arriving late, browsing the Web on your computer during class, talking to the people nearby, etc.), professional conduct includes the ability to be a value-adding contributor to our learning community.
Date | Topic | Assignment Due | Readings |
---|---|---|---|
Tuesday, August 21 | Introduction and set-up | ||
Thursday, August 23 | Git/GitHub | Assignment1 (install RStudio, git/github) | Do you have a moment to talk about version control? |
Tuesday, August 28 | Base R - data structures 1 | ||
Thursday, August 30 | Base R - functions | ||
Tuesday, September 4 | Base R - data structures 2 | Assignment2 | |
Thursday, September 6 | DataViz ggplot2 | ||
Tuesday, September 11 | DataViz ggplot2 | ||
Thursday, September 13 | Tidy data | ||
Tuesday, September 18 | dplyr | ||
Thursday, September 20 | dplyr - connecting and querying DB | Assignment3 | |
Tuesday, September 25 | dplyr - connecting and querying DB | ||
Thursday, September 27 | Mid-term | ||
Tuesday, October 2 | Geospatial Viz | ||
Fall Holiday | |||
Tuesday, October 9 | Manipulating Dates | ||
Thursday, October 11 | Open Data API | Assignment4 (GeoViz) | |
Tuesday, October 16 | OpenData | ||
Thursday, October 18 | Iteration with purrr | ||
Tuesday, October 23 | Web scraping | ||
Thursday, October 25 | Regression | Assignment5 (OpenData map) | |
Tuesday, October 30 | Parametrized Reports | ||
Thursday, November 1 | Dashboards with flexdashboard | ||
Thursday, November 8 | Dashboards with flexdashboard | ||
Tuesday, November 13 | Supervised lab - group project | ||
Thursday, November 15 | Supervised lab - group project | ||
Tuesday, November 20 | Case-study - online reviews | Assignment6 | |
Thanksgiving | |||
Tuesday, November 27 | Presentations | ||
Thursday, November 29 | Presentations | ||
Wednesday, December 5 | Final exam, 5:30-7:30 PM | ||
ModernDive: An Introduction to Statistical and Data Sciences via R