This repository has been archived by the owner on Feb 13, 2023. It is now read-only.
forked from rdpeng/RepData_PeerAssessment1
PA1_template.Rmd
---
title: 'Reproducible Research: Peer Assessment 1'
output:
  html_document:
    keep_md: yes
  pdf_document: default
---
## Loading and preprocessing the data
Extract the activity.csv file from activity.zip,
create a data.frame from the csv file, and wrap the data frame in a data table.
```{r loaddata}
options(scipen = 1, digits = 2)
library(data.table)
library(dplyr)
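activity <- data.table(read.csv(unz("activity.zip", "activity.csv"))) %>%
  # interval encodes clock time as HHMM; date-time arithmetic is in seconds,
  # so the minutes since midnight are multiplied by 60
  mutate(fulldate = as.POSIXct(strptime(date, format = "%Y-%m-%d", tz = "UTC")) +
           ((interval %/% 100) * 60 + interval %% 100) * 60)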
```
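The integer division and modulo above rely on the `interval` column encoding clock time as `HHMM` without leading zeros (for example `1355` for 13:55); that reading is an assumption based on the code, not something the data file states. A minimal base R sketch of the decoding, using a hypothetical helper `interval_to_minutes` and an arbitrary example date:

```r
# Hypothetical helper: convert an HHMM-style interval code to minutes since midnight.
interval_to_minutes <- function(interval) {
  hours <- interval %/% 100    # e.g. 1355 -> 13
  minutes <- interval %% 100   # e.g. 1355 -> 55
  hours * 60 + minutes
}

codes <- c(0, 5, 55, 100, 1355, 2355)
mins <- interval_to_minutes(codes)

# POSIXct arithmetic works in seconds, so minutes are multiplied by 60.
# The date is an arbitrary example; UTC avoids daylight-saving shifts.
stamps <- as.POSIXct("2012-10-01", tz = "UTC") + mins * 60
print(format(stamps, "%H:%M"))
```

Note that the codes jump from `55` to `100`, so they cannot be used directly as a continuous time axis without this conversion.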
## What is mean total number of steps taken per day?
Group the data by date and sum the steps for each day.
For dates with no recorded steps (all values missing), the total is set to zero.
```{r results="asis"}
library(xtable)
# Total steps per day; na.rm = TRUE makes an all-NA day sum to 0
number.of.steps.by.day <- activity %>%
  group_by(date) %>%
  summarise(steps = sum(steps, na.rm = TRUE))
```
The bar plot below shows the total number of steps per day.
```{r}
barplot( number.of.steps.by.day$steps, names.arg=number.of.steps.by.day$date, ylab="Number of steps per day")
```
```{r}
mean.steps <- mean(number.of.steps.by.day$steps)
median.steps <- median(number.of.steps.by.day$steps)
```
The mean number of steps per day is `r mean.steps`.
The median is `r median.steps` steps per day.
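Because `sum(..., na.rm = TRUE)` returns 0 for a day whose values are all `NA`, such days are counted as zero-step days and pull the mean down. A small base R sketch with invented numbers (not taken from the dataset; `toy` and `totals` are hypothetical names) illustrates the effect:

```r
# Toy data, invented for illustration: day "b" has only missing values.
toy <- data.frame(
  date  = c("a", "a", "b", "b", "c", "c"),
  steps = c(10, 20, NA, NA, 30, 40)
)

# Sum per day, dropping NAs; the all-NA day becomes 0.
totals <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)
print(totals)

# Mean including the zero for day "b", then mean over days that have data.
print(mean(totals))
print(mean(totals[totals > 0]))
```

Whether zero-step days should be kept or dropped depends on the question being asked; the report above keeps them.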
## What is the average daily activity pattern?
```{r}
library(ggplot2)
library(scales)
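# Mean steps per 5-minute interval; a separate name keeps the per-day mean.steps intact
mean.steps.per.interval <- mean(activity$steps, na.rm = TRUE)
p <- ggplot(activity, aes(x = fulldate, y = steps)) + geom_line() +
  scale_x_datetime(breaks = date_breaks("10 days"))
# Horizontal reference line at the overall interval mean
p <- p + geom_hline(yintercept = mean.steps.per.interval, colour = "red")
print(p)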
```
```{r}
max.step = filter(activity, steps == max(activity$steps, na.rm = TRUE))
```
The maximum number of steps occurs in the five-minute interval on date `r max.step$date`, interval `r max.step$interval`.
The number of steps in that interval is `r max.step$steps`.
## Imputing missing values
Missing values are imputed by replacing each missing interval with the mean of that interval over all days.
```{r}
# Get the missing value count
activity <- mutate(activity, missing = is.na(steps))
missing.values.count <- sum(activity$missing)
# Calculate the mean of each interval
intervaltable <- activity %>%
  group_by(interval) %>%
  summarise(steps = mean(steps, na.rm = TRUE))
# Impute the values into the activity data: match() looks up each row's interval mean
getImputedValue <- function(s) { intervaltable$steps[match(s, intervaltable$interval)] }
activity.imputed <- mutate(activity, steps = ifelse(missing, getImputedValue(interval), steps))
```
The number of missing values is `r missing.values.count`.
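The per-interval imputation can be sketched on a toy data frame in base R. The numbers are invented, and `toy`, `interval.means`, `lookup`, and `filled` are hypothetical names; `match()` aligns each row with the mean of its own interval.

```r
# Toy data, invented for illustration: two intervals observed on three days.
toy <- data.frame(
  interval = c(0, 5, 0, 5, 0, 5),
  steps    = c(2, NA, 4, 10, NA, 20)
)

# Mean per interval; the formula interface drops rows with NA by default.
interval.means <- aggregate(steps ~ interval, data = toy, FUN = mean)

# For every row, look up the mean of that row's interval.
lookup <- interval.means$steps[match(toy$interval, interval.means$interval)]

# Keep observed values, fill the gaps with the interval mean.
filled <- ifelse(is.na(toy$steps), lookup, toy$steps)
print(filled)
```

A vectorised lookup like this avoids filtering the table once per row.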
```{r results="asis"}
# Total steps per day after imputation
number.of.steps.by.day.imputed <- activity.imputed %>%
  group_by(date) %>%
  summarise(steps = sum(steps, na.rm = TRUE))
```
The bar plot below shows the total number of steps per day with imputed data.
```{r}
barplot( number.of.steps.by.day.imputed$steps, names.arg=number.of.steps.by.day.imputed$date, ylab="Number of steps per day")
```
```{r}
mean.steps.imputed <- mean(number.of.steps.by.day.imputed$steps)
median.steps.imputed <- median(number.of.steps.by.day.imputed$steps)
```
The mean number of steps per day with imputed data is `r mean.steps.imputed`.
The median with imputed data is `r median.steps.imputed` steps per day.
The mean number of steps per day without imputed data is `r mean.steps`,
and the median without imputed data is `r median.steps` steps per day.
So imputation has a large impact on the result.
## Are there differences in activity patterns between weekdays and weekends?
```{r}
# Classify each observation; %u is the ISO weekday number (1 = Monday), independent of locale
activity.imputed <- activity.imputed %>%
  mutate(daytype = format(fulldate, "%u") %in% c("6", "7")) %>%
  mutate(daytype = factor(daytype, levels = c(FALSE, TRUE), labels = c("weekday", "weekend")))
# Mean steps per interval on weekdays
activity.imputed.weekdays.mean <- activity.imputed %>%
  filter(daytype == "weekday") %>%
  group_by(interval) %>%
  summarise(steps = mean(steps)) %>%
  mutate(daytype = "weekday")
# Mean steps per interval on weekends
activity.imputed.weekend.mean <- activity.imputed %>%
  filter(daytype == "weekend") %>%
  group_by(interval) %>%
  summarise(steps = mean(steps)) %>%
  mutate(daytype = "weekend")
# Two stacked panels: weekdays on top, weekend below
par(mfrow = c(2, 1))
plot(activity.imputed.weekdays.mean$interval, activity.imputed.weekdays.mean$steps, type = "l", main = "Weekdays", xlab = "Interval", ylab = "Number of steps")
plot(activity.imputed.weekend.mean$interval, activity.imputed.weekend.mean$steps, type = "l", main = "Weekend", xlab = "Interval", ylab = "Number of steps")
```
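Day names returned by `weekdays()` depend on the session locale, so a comparison against English names can silently classify every day as a weekday on a non-English system. A minimal base R sketch of a locale-independent classification, using a few example dates and the hypothetical names `dates`, `is.weekend`, and `daytype`:

```r
# Example dates: a Friday, a Saturday, a Sunday, and a Monday.
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# "%u" formats the ISO weekday number as text: "1" = Monday ... "7" = Sunday.
is.weekend <- format(dates, "%u") %in% c("6", "7")

# Explicit levels make the FALSE/TRUE to label mapping unambiguous.
daytype <- factor(is.weekend, levels = c(FALSE, TRUE),
                  labels = c("weekday", "weekend"))
print(table(daytype))
```

Setting `levels` explicitly also keeps the labels correct if the data happen to contain only weekdays or only weekend days.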