This repository has been archived by the owner on Dec 23, 2020. It is now read-only.
forked from rdpeng/RepData_PeerAssessment1
report.Rmd
# Reproducible Research: Peer Assessment 1
Before starting the analysis, set the global options for all code chunks:
```{r}
library(knitr)  # provides opts_chunk
opts_chunk$set(fig.width=10, fig.height=6)
```
## Loading and preprocessing the data
```{r}
data <- read.csv('activity.csv')
```
## What is mean total number of steps taken per day?
Sum the steps for each date on which they were recorded:
```{r}
per_day <- aggregate(steps ~ date, data, sum)
barplot(
  per_day$steps,
  main="Number of steps per day",
  xlab="Day",
  ylab="Total Steps Per Day",
  names.arg=per_day$date,
  col="blue"
)
```
For additional perspective, here is a histogram of the non-zero step counts
recorded per interval:
```{r}
hist(
  subset(data, steps > 0)$steps,
  breaks=25,
  xlab='Number of Steps',
  main="Distribution of Number of Steps")
```
To summarize the total number of steps per day, calculate the mean and median
of the daily totals:
```{r}
mean_steps <- mean(per_day$steps)
median_steps <- median(per_day$steps)
```
This gives the following results:
- **Mean**: `r mean_steps`
- **Median**: `r median_steps`
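
One detail worth keeping in mind when reading the daily totals: the formula interface of `aggregate` removes rows with missing values before grouping (its default `na.action` is `na.omit`), so a day with no recorded steps is left out entirely rather than counted as zero. The sketch below illustrates this on made-up toy data; the `toy_*` names are illustrative only and not part of the analysis.

```{r}
# Toy data (made up for illustration): day "b" has only missing values
toy_steps <- data.frame(
  date  = c("a", "a", "b", "b", "c", "c"),
  steps = c(10, 20, NA, NA, 5, NA)
)

# Formula interface: rows with NA are dropped first, so day "b" disappears
toy_formula <- aggregate(steps ~ date, toy_steps, sum)

# tapply with na.rm = TRUE keeps every day; an all-NA day sums to 0
toy_tapply <- tapply(toy_steps$steps, toy_steps$date, sum, na.rm = TRUE)

toy_formula
toy_tapply
```

Which behaviour is preferable depends on the question: dropping such days avoids pulling the mean and median towards zero, while keeping them makes the gaps visible.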
## What is the average daily activity pattern?
Let's look at a line plot of the average number of steps in each interval, averaged across all dates:
```{r}
per_interval <- aggregate(steps ~ interval, data, mean)
x <- per_interval$interval
y <- per_interval$steps
plot(
  x,
  y,
  type="l",
  xlab="Interval",
  ylab="Average Steps",
  main="Averaged Number of Steps per Interval")
```
The interval with the highest average number of steps is:
```{r}
subset(per_interval, steps==max(y))$interval
```
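
An alternative way to locate the peak is `which.max`, which returns the index of the first maximum and therefore always selects a single row, whereas comparing with `==` returns every tied row. A minimal sketch on made-up values (the `toy_*` names are illustrative only):

```{r}
# Toy per-interval averages (values made up for illustration)
toy_interval <- data.frame(
  interval = c(0, 5, 10, 15),
  steps    = c(1.5, 0.3, 7.2, 2.0)
)

# which.max() gives the row index of the first maximum
toy_peak <- toy_interval[which.max(toy_interval$steps), ]
toy_peak$interval
```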
## Imputing missing values
```{r}
number_of_nas <- sum(is.na(data$steps))
```
Total number of NAs is `r number_of_nas`.
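
To check whether other columns also contain missing values, or to express the count as a share, one could count NAs per column. This is a sketch on a made-up data frame; the `toy_*` names are illustrative only.

```{r}
# Toy data frame (made up for illustration)
toy_na <- data.frame(
  steps    = c(NA, 3, NA, 7),
  interval = c(0, 5, 10, 15)
)

toy_na_counts <- colSums(is.na(toy_na))     # NA count for every column
toy_na_share  <- mean(is.na(toy_na$steps))  # fraction of missing step values

toy_na_counts
toy_na_share
```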
I will fill the NAs in the **steps** column with the overall mean of the
observed step counts.
```{r}
library(zoo)
new_data <- data
# With its default grouping, na.aggregate() replaces each NA with the mean
# of all non-missing values
new_data$steps <- na.aggregate(data$steps, FUN=mean)
```
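
For comparison only, a grouped imputation can be sketched in base R with `ave`, which applies a function within each group of a grouping vector. The example fills each NA with the mean of its own interval on made-up toy data; the helper `fill_with_mean` and the `toy_*` names are hypothetical, and the analysis itself continues with the `new_data` values computed above.

```{r}
# Toy data (made up): two intervals observed on three days
toy_imp <- data.frame(
  interval = c(0, 5, 0, 5, 0, 5),
  steps    = c(2, 10, NA, 14, 4, NA)
)

# Hypothetical helper: replace NAs with the mean of the non-missing values
fill_with_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave() applies the helper separately within each interval group
toy_imp$steps_filled <- ave(toy_imp$steps, toy_imp$interval, FUN = fill_with_mean)
toy_imp$steps_filled
```

Note that a group made up entirely of NAs has no observed mean, so this approach would leave `NaN` there; the grouping variable has to be chosen with that in mind.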
Let's check whether this strategy for filling in the NAs has any serious impact on the
data. First, let's look at the total number of steps for each day:
```{r}
new_per_day <- aggregate(steps ~ date, new_data, sum)
barplot(
  new_per_day$steps,
  main="Number of steps per day",
  xlab="Day",
  ylab="Total Steps Per Day",
  names.arg=new_per_day$date,
  col="blue"
)
```
As before, let's also calculate the mean and median for the new data set:
```{r}
new_mean_steps <- mean(new_per_day$steps)
new_median_steps <- median(new_per_day$steps)
```
This gives the following results:
- **Mean**: `r new_mean_steps`
- **Median**: `r new_median_steps`
The differences from the original estimates are small:
- **Mean Diff**: `r new_mean_steps - mean_steps`
- **Median Diff**: `r new_median_steps - median_steps`
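
A general property helps when interpreting such differences: replacing missing values with the mean of the observed values leaves the mean of the series unchanged, while the median can shift towards that mean. The sketch below shows this on a made-up vector; the report's figures are daily totals, so it illustrates the effect only and does not reproduce those numbers.

```{r}
# Toy series (made up for illustration)
toy_x <- c(1, NA, 3, 8, NA)

# Fill the NAs with the mean of the observed values (here 4)
toy_filled <- replace(toy_x, is.na(toy_x), mean(toy_x, na.rm = TRUE))

c(mean(toy_x, na.rm = TRUE), mean(toy_filled))      # 4 and 4
c(median(toy_x, na.rm = TRUE), median(toy_filled))  # 3 and 4
```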
## Are there differences in activity patterns between weekdays and weekends?
Before answering the question of whether there's any difference between weekend
and weekday behaviour, one needs to create a new factor variable:
```{r}
# weekdays() returns locale-dependent names; the labels below assume an English locale
day_names <- weekdays(as.POSIXlt(new_data$date, format="%Y-%m-%d"))
weekend <- c("Saturday", "Sunday")
# Classify each row without shadowing the weekdays() function
new_data$day_type <- factor(
  ifelse(day_names %in% weekend, "weekend", "weekday"),
  levels = c("weekday", "weekend"))
```
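
Because `weekdays()` returns day names in the session's locale, matching against English names only works in an English locale. A locale-independent variant can be sketched with the numeric day of week that `POSIXlt` stores in its `wday` component (0 is Sunday, 6 is Saturday). The dates and `toy_*` names below are chosen for illustration only.

```{r}
# Example dates: a Friday, Saturday, Sunday and Monday
toy_dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# Numeric day of week: 0 = Sunday, ..., 6 = Saturday
toy_wday <- as.POSIXlt(toy_dates)$wday

toy_day_type <- factor(
  ifelse(toy_wday %in% c(0, 6), "weekend", "weekday"),
  levels = c("weekday", "weekend")
)
toy_day_type
```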
Finally, let's have a look at the difference between weekend and weekday behaviour:
```{r}
library(lattice)
# Average steps per interval on weekends
new_per_interval_weekend <- aggregate(
  steps ~ interval, subset(new_data, day_type == 'weekend'), mean)
new_per_interval_weekend$day_type <- 'weekend'
# Average steps per interval on weekdays
new_per_interval_weekday <- aggregate(
  steps ~ interval, subset(new_data, day_type == 'weekday'), mean)
new_per_interval_weekday$day_type <- 'weekday'
# Stack both summaries and draw one panel per day type
day_type_data <- rbind(new_per_interval_weekend, new_per_interval_weekday)
xyplot(steps ~ interval | day_type, data=day_type_data, layout = c(1, 2), type="l")
```
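
The same per-interval averages could likely be obtained in a single step by adding the day type as a second grouping variable in the `aggregate` formula. A minimal sketch on made-up toy data (the `toy_*` names are illustrative only):

```{r}
# Toy data (made up for illustration)
toy_week <- data.frame(
  interval = c(0, 0, 5, 5, 0, 5),
  day_type = c("weekday", "weekday", "weekday", "weekday", "weekend", "weekend"),
  steps    = c(2, 4, 10, 20, 6, 8)
)

# One row per (interval, day_type) combination
toy_by_type <- aggregate(steps ~ interval + day_type, toy_week, mean)
toy_by_type
```

A data frame of this shape can be passed to `xyplot` with the same `steps ~ interval | day_type` formula used above.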