# Reproducible Research: Peer Assessment 1
## Loading and preprocessing the data
```{r}
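#activity.csv is assumed to be in the working directory (e.g. the project folder)
activity=read.csv("activity.csv")
#Drop rows with missing step counts for the first part of the analysis
activity2=na.omit(activity)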
```
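The `date` column can also be parsed at load time by passing `colClasses` to `read.csv()`. A minimal sketch on a hypothetical three-row sample read from an inline string (a stand-in for `activity.csv`, assuming the same `steps`, `date`, `interval` column order):

```r
# Hypothetical sample in the same layout as activity.csv (steps, date, interval)
csv_text <- 'steps,date,interval
NA,2012-10-01,0
0,2012-10-02,5
12,2012-10-02,10'

# colClasses = "Date" makes read.csv() parse the date column directly
sample_activity <- read.csv(text = csv_text,
                            colClasses = c("integer", "Date", "integer"))
str(sample_activity)

# na.omit() drops the row whose steps value is NA
sample_complete <- na.omit(sample_activity)
nrow(sample_complete)
```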
## What is the mean total number of steps taken per day?
Histogram of the total number of steps taken each day (rows with missing values removed):
```{r}
library(plyr)
#Sum the number of steps per day
stepsday=ddply(activity2, "date", summarize, steps=sum(steps))
#Plot the histogram
hist(stepsday$steps, xlab="Steps/Day", main="")
```
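The same daily totals can be computed in base R without plyr. A minimal sketch using the formula method of `aggregate()`, on a hypothetical toy data frame in the same layout as `activity`:

```r
# Hypothetical toy data: two days, three intervals each, one missing value
toy <- data.frame(
  steps    = c(0, 10, NA, 5, 5, 20),
  date     = rep(c("2012-10-02", "2012-10-03"), each = 3),
  interval = rep(c(0, 5, 10), times = 2)
)

# Sum steps within each date; the formula method of aggregate()
# drops rows containing NA by default (na.action = na.omit)
steps_per_day <- aggregate(steps ~ date, data = toy, FUN = sum)
steps_per_day  # totals: 10 for 2012-10-02, 30 for 2012-10-03
```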
Mean and median of the total number of steps taken per day:
```{r}
mean(stepsday$steps)
median(stepsday$steps)
```
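Because `activity2` was built with `na.omit()`, a day whose readings are all missing drops out of the daily totals instead of counting as a zero-step day, and this affects the mean. A toy illustration on hypothetical data:

```r
# Hypothetical toy data: day "A" has readings, day "B" is entirely missing
toy <- data.frame(
  steps = c(4, 6, NA, NA),
  date  = c("A", "A", "B", "B")
)

# Dropping NA rows first: day B disappears from the daily totals
kept <- na.omit(toy)
totals_omit <- tapply(kept$steps, kept$date, sum)

# Summing with na.rm = TRUE instead: day B is kept with a total of 0
totals_narm <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

mean(totals_omit)  # 10 (one day, total 10)
mean(totals_narm)  # 5  (two days, totals 10 and 0)
```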
## What is the average daily activity pattern?
Time series plot of the 5-minute intervals against the average number of steps taken in each interval, averaged across all days.
```{r}
#Make a data frame with the mean steps for each interval, averaged across all days
stepsday=ddply(activity2, "interval", summarize, steps=mean(steps))
#Time series plot of intervals and mean steps per interval
plot(stepsday$interval, stepsday$steps, type="l", xlab="Interval", ylab="Mean Steps/Interval", main="")
```
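The interval identifiers in this dataset appear to encode clock time as HHMM (e.g. 835 for 08:35), so a numeric x-axis jumps from 55 to 100 at each hour. A minimal sketch, assuming that encoding, of converting identifiers to minutes since midnight and to readable labels (`interval_to_minutes` is a hypothetical helper):

```r
# Convert an HHMM-coded interval identifier to minutes since midnight.
# Assumes interval = hours * 100 + minutes (e.g. 835 means 08:35).
interval_to_minutes <- function(interval) {
  (interval %/% 100) * 60 + interval %% 100
}

ids <- c(0L, 55L, 100L, 835L, 2355L)
interval_to_minutes(ids)  # 0 55 60 515 1435

# Clock-time labels such as "08:35" for axis ticks or reporting
labels <- sprintf("%02d:%02d", ids %/% 100L, ids %% 100L)
labels
```

Plotting against `interval_to_minutes(stepsday$interval)` would give evenly spaced points across each hour.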
Which 5-minute interval, on average across all days, contains the maximum number of steps?
```{r}
#which.max() returns the row position of the largest mean, not the value itself
stepsday$interval[which.max(stepsday$steps)]
```
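Using `max()` as an index, rather than `which.max()`, is a common slip: `max()` returns the largest value, which R then treats as a row position. A toy illustration with hypothetical interval means:

```r
# Hypothetical per-interval means
means <- data.frame(interval = c(0, 5, 10, 15), steps = c(1.5, 8.0, 3.0, 2.0))

# which.max() returns the position of the largest value
peak_row <- which.max(means$steps)   # 2
means$interval[peak_row]             # 5

# max() returns the value itself (8); used as an index it asks for
# row 8, which does not exist here, so the result is NA
means$interval[max(means$steps)]
```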
## Imputing missing values
Total number of rows that contain missing values (NA):
```{r}
nrow(activity[!complete.cases(activity),])
```
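Counting NAs per column shows which variables contribute to that row count. A minimal sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data with missing values only in the steps column
toy <- data.frame(
  steps    = c(NA, 3, NA),
  date     = c("d1", "d1", "d2"),
  interval = c(0, 5, 0)
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs
colSums(is.na(toy))       # steps 2, date 0, interval 0

# Rows with at least one NA, equivalent to nrow(toy[!complete.cases(toy), ])
sum(!complete.cases(toy)) # 2
```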
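Create a new dataset with the missing values filled in, using multiple imputation from the mice package. The analysis below uses the first completed dataset, which is what `complete()` returns by default.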
```{r}
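library(mice)
#Impute the missing values with the mice package (m = 5 imputations by default);
#the seed is an arbitrary fixed value so that the imputations are reproducible
activity2=mice(activity, printFlag=FALSE, seed=123)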
```
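For comparison, a simpler deterministic strategy (a swapped-in technique, not the mice approach used in this report) fills each missing value with the mean for the same 5-minute interval across the days that have data. A sketch on hypothetical toy data, using base R's `ave()`:

```r
# Hypothetical toy data: three days, two intervals, day d2 fully missing
toy <- data.frame(
  steps    = c(10, 20, NA, NA, 30, 40),
  date     = rep(c("d1", "d2", "d3"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# Mean of each interval over its non-missing values, aligned to every row
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Replace only the missing entries
imputed <- toy
imputed$steps <- ifelse(is.na(toy$steps), interval_mean, toy$steps)
imputed$steps  # 10 20 20 30 30 40
```

Unlike mice, this gives the same result on every run and keeps each interval's mean unchanged, at the cost of understating day-to-day variability.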
Histogram of the total number of steps taken each day after imputation, followed by the mean and median.
```{r}
#Make a data frame with the sum of all steps for each day
stepsday=ddply(complete(activity2), "date", summarize, steps=sum(steps))
#Make the histogram
hist(stepsday$steps, xlab="Steps per Day", main="")
mean(stepsday$steps)
median(stepsday$steps)
```
## Are there differences in activity patterns between weekdays and weekends?
Create a new factor variable in the dataset with two levels, "Weekday" and "Weekend".
```{r}
activity2=complete(activity2)
#Day name for each date (weekdays() returns names in the current locale)
day=weekdays(as.Date(activity2$date))
#Two-level factor: Saturday and Sunday are weekend days, the rest weekdays
activity2$weekend=factor(ifelse(day %in% c("Saturday","Sunday"),"Weekend","Weekday"))
```
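`weekdays()` returns day names in the current locale, so comparing against "Saturday" and "Sunday" only works in an English locale. A locale-independent sketch, assuming the dates are already of class `Date`, uses the numeric weekday from `as.POSIXlt()` (0 = Sunday, 6 = Saturday):

```r
# 2012-10-05 to 2012-10-08: Friday, Saturday, Sunday, Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# POSIXlt$wday: 0 = Sunday, ..., 6 = Saturday, regardless of locale
wday <- as.POSIXlt(dates)$wday

day_type <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"),
                   levels = c("Weekday", "Weekend"))
day_type  # Weekday Weekend Weekend Weekday
```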
Panel time series plot of the 5-minute interval against the average number of steps taken, averaged separately across weekdays and weekend days.
```{r}
#Make a data frame with the average steps per interval for weekdays and weekends
stepsday=aggregate(activity2$steps, by=list(activity2$interval, activity2$weekend), FUN=mean)
library(ggplot2)
#Plot the average steps per interval, one panel for weekdays and one for weekends
ggplot(stepsday, aes(x=Group.1, y=x)) +
  geom_line() +
  geom_smooth(method="loess", formula=y ~ x, se=FALSE) +
  facet_grid(. ~ Group.2) +
  labs(x="Interval", y="Mean Steps/Interval")
```