---
title: 'Reproducible Research: Peer Assessment 1'
output:
  html_document:
    keep_md: yes
  pdf_document: default
---
```{r load-libraries, echo=FALSE, message=FALSE, results='hide'}
library(ggplot2)
library(lattice)
```

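## Loading and preprocessing the data

```{r}
# Download the zipped data set to a temporary file and extract activity.csv
tmp <- tempfile(fileext = ".zip")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip",
              tmp, mode = "wb")  # binary mode keeps the zip intact on Windows
unzip(tmp)
unlink(tmp)
# Read the data and convert the date column from character to Date
data <- read.csv("activity.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date)
```
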
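
The chunk above downloads the archive every time the report is knitted. A minimal sketch of a variant that reuses an existing `activity.csv` and performs the same `Date` conversion; the function name `load_activity` and its `dir` argument are hypothetical, and the column names (`steps`, `date`, `interval`) are assumed from the code in this report.

```r
# Hypothetical helper: download and unzip only when activity.csv is missing,
# then read it and convert the date column from character to Date.
load_activity <- function(dir = ".",
                          url = "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip") {
  csv <- file.path(dir, "activity.csv")
  if (!file.exists(csv)) {
    zip_file <- tempfile(fileext = ".zip")
    on.exit(unlink(zip_file), add = TRUE)
    # mode = "wb" keeps the zip intact on Windows
    download.file(url, zip_file, mode = "wb")
    unzip(zip_file, exdir = dir)
  }
  activity <- read.csv(csv, stringsAsFactors = FALSE)
  activity$date <- as.Date(activity$date)
  activity
}
```

Calling `load_activity()` a second time skips the network entirely, which keeps repeated knits fast and works offline once the file is present.
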
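## What is the mean total number of steps taken per day?

The histogram below shows the total number of steps taken each day.

```{r}
# Total steps per day (rows with NA steps are dropped by aggregate()'s default na.action)
sbd <- aggregate(steps ~ date, data, sum)
ggplot(sbd, aes(x = steps)) +
  geom_histogram(binwidth = 1000) +
  labs(x = "Total number of steps per day", y = "Number of days")
```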
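
The formula interface of `aggregate()` uses `na.action = na.omit` by default, so rows with missing `steps` are removed before summing, and a day whose values are all missing disappears from `sbd` instead of appearing as a total of 0. The sketch below uses a hypothetical toy data frame to contrast this with `tapply(..., na.rm = TRUE)`, which would keep such a day as 0 and pull the mean and median down.

```r
# Hypothetical toy data: day "d2" has only missing step counts.
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(10, 20, NA, NA, 5, NA)
)

# Formula interface: NA rows are removed first, so "d2" disappears
# and "d3" sums only its non-missing value.
by_formula <- aggregate(steps ~ date, data = toy, FUN = sum)

# tapply with na.rm = TRUE: "d2" is kept with a total of 0.
by_tapply <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

print(by_formula)
print(by_tapply)
```

Here `by_formula` lists only `d1` (30) and `d3` (5), while `by_tapply` also reports `d2` as 0.
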
```{r mean-and-median, echo=FALSE}
smean <- mean(sbd$steps)
smedian <- median(sbd$steps)
```

The mean total number of steps taken per day is `r format(smean, nsmall = 2)` and the median is `r format(smedian, nsmall = 2)`.

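## What is the average daily activity pattern?

```{r}
# Mean number of steps in each 5-minute interval, averaged across all days
sbi <- aggregate(steps ~ interval, data, mean)
ggplot(sbi, aes(x = interval, y = steps)) +
  geom_line() +
  labs(x = "5-minute interval", y = "Average number of steps")
```

```{r max, echo=FALSE}
# Identifier of the interval with the highest average step count
imax <- sbi$interval[which.max(sbi$steps)]
```

The 5-minute interval that contains the maximum number of steps, averaged across all days, is `r imax`.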
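
In this dataset the `interval` identifier appears to encode clock time as hour × 100 + minute (for example, 835 would mean 08:35), so consecutive identifiers jump from 55 to 100 at each hour and the x-axis of the line plot is not evenly spaced in minutes. A minimal sketch, assuming that encoding, converting identifiers to minutes since midnight and to `HH:MM` labels; `interval_to_minutes` and `interval_to_label` are hypothetical helper names.

```r
# Assumes interval = hour * 100 + minute (e.g. 835 means 08:35).
interval_to_minutes <- function(interval) {
  (interval %/% 100) * 60 + interval %% 100
}

interval_to_label <- function(interval) {
  sprintf("%02d:%02d", interval %/% 100, interval %% 100)
}

# Hypothetical identifiers around an hour boundary:
ids <- c(50L, 55L, 100L, 105L, 835L)
interval_to_minutes(ids)
interval_to_label(ids)
```

Under the same assumption, `interval_to_label(imax)` would report the peak interval as a clock time, and plotting against `interval_to_minutes(sbi$interval)` gives an evenly spaced time axis.
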
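## Imputing missing values

```{r}
# Number of missing step counts
num_nas <- sum(is.na(data$steps))
# Replace each missing step count with the mean for the same 5-minute interval (from sbi)
imputed <- transform(data,
                     steps = ifelse(is.na(steps),
                                    sbi$steps[match(interval, sbi$interval)],
                                    steps))
```

The number of missing `steps` values in the dataset is `r num_nas`. These values were imputed by replacing each `NA` with the mean number of steps for the same 5-minute interval, averaged across all days.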
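
The imputation relies on `match()` to find, for every row, the position of its `interval` in the table of interval means, and on `ifelse()` to use that mean only where `steps` is `NA`. A minimal sketch of the same technique on a hypothetical three-interval table (the names `interval_means` and `raw` are illustrative only):

```r
# Hypothetical per-interval means (playing the role of sbi).
interval_means <- data.frame(interval = c(0L, 5L, 10L),
                             steps    = c(1.5, 4, 0.25))

# Hypothetical raw rows with some missing step counts.
raw <- data.frame(interval = c(0L, 5L, 10L, 0L, 5L, 10L),
                  steps    = c(2L, NA, 0L, NA, NA, 1L))

# For each row, look up the mean of its interval ...
lookup <- interval_means$steps[match(raw$interval, interval_means$interval)]
# ... and use it only where the observed value is missing.
filled <- transform(raw, steps = ifelse(is.na(steps), lookup, steps))
filled$steps
```

Because the interval means are fractional, the imputed `steps` column becomes numeric (double) rather than integer.
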
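
```{r}
# Daily totals after imputation
sbdi <- aggregate(steps ~ date, imputed, sum)
ggplot(sbdi, aes(x = steps)) +
  geom_histogram(binwidth = 1000) +
  labs(x = "Total number of steps per day (imputed)", y = "Number of days")
imean <- mean(sbdi$steps)
imedian <- median(sbdi$steps)
# Absolute change relative to the estimates from the original data
diff_mean <- abs(smean - imean)
diff_median <- abs(smedian - imedian)
```
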
The mean of the imputed data is `r format(imean, nsmall = 2)` and the median is `r format(imedian, nsmall = 2)`.
The difference between the original data and the imputed data is `r format(diff_mean, nsmall = 2)` for the mean and `r format(diff_median, nsmall = 2)` for the median.
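
When the missing values cover entire days and every observed day has all of its intervals recorded, each imputed day receives a total equal to the sum of the interval means, which equals the mean daily total of the observed days (summing over intervals and averaging over days commute in that case). Adding days at exactly the mean leaves the mean unchanged and pulls the median toward it. A small worked sketch with hypothetical data; whether the NAs in `activity.csv` really form whole days is an assumption worth checking, for example with `table(data$date[is.na(data$steps)])`.

```r
# Hypothetical data: 2 intervals per day; day "d3" is entirely missing.
toy_days <- data.frame(
  date     = rep(c("d1", "d2", "d3", "d4"), each = 2),
  interval = rep(c(0L, 5L), times = 4),
  steps    = c(10, 30, 0, 20, NA, NA, 40, 50)
)

totals_obs <- aggregate(steps ~ date, toy_days, sum)       # d3 is dropped
means_int  <- aggregate(steps ~ interval, toy_days, mean)  # per-interval means

# Impute by interval mean, as in the report.
fill <- means_int$steps[match(toy_days$interval, means_int$interval)]
toy_imp <- transform(toy_days, steps = ifelse(is.na(steps), fill, steps))
totals_imp <- aggregate(steps ~ date, toy_imp, sum)

c(mean_obs = mean(totals_obs$steps), mean_imp = mean(totals_imp$steps))
c(median_obs = median(totals_obs$steps), median_imp = median(totals_imp$steps))
```

In this toy case the observed totals are 40, 20 and 90, the imputed day gets 50, the mean stays at 50, and the median moves from 40 to 45.
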

## Are there differences in activity patterns between weekdays and weekends?

```{r}
# Classify each date as weekday or weekend. POSIXlt$wday (0 = Sunday, ..., 6 = Saturday)
# is locale-independent, unlike the day names returned by weekdays().
wday <- as.POSIXlt(as.Date(imputed$date))$wday
imputed$day_type <- factor(ifelse(wday %in% 1:5, "weekday", "weekend"),
                           levels = c("weekday", "weekend"))
# Mean steps per interval, separately for weekdays and weekends
sbii <- aggregate(steps ~ interval + day_type, imputed, mean)
xyplot(steps ~ interval | day_type, data = sbii, layout = c(1, 2), type = "l",
       xlab = "Interval", ylab = "Number of steps")
```