forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPA1_template.Rmd
119 lines (94 loc) · 3.63 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
# Reproducible Research: Peer Assessment 1
## Loading and preprocessing the data
We can start to open the csv file, convert it to a data.frame and preprocess the data in order to have a tidy and useful dataset.
```{r}
#open the dataset
table <- read.csv("./activity.csv")
#convertions
table$date <- as.Date(table$date)
table$steps <- as.numeric(as.character(table$steps))
table$interval <- as.numeric(as.character(table$interval))
head(table)
#preprocessing
steps_date <- aggregate(steps ~ date, data=table, FUN=sum)
```
We now have a tidy and simple dataset to analize:
```{r}
str(steps_date)
head(steps_date)
```
## What is mean total number of steps taken per day?
We can now create a histogram that plot the steps per day.
```{r}
barplot(steps_date$steps, names.arg = steps_date$date, xlab = "Date", ylab = "Steps")
```
We can proceed calculating the mean of the steps:
```{r}
steps_mean <- mean(steps_date$steps)
steps_mean
```
And the median of the steps:
```{r}
steps_median <- median(steps_date$steps)
steps_median
```
## What is the average daily table pattern?
Now let's make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
```{r}
steps_data <- aggregate(steps ~ interval, data = table, FUN = mean)
plot(steps_data, type = "l", xlab = "Interval", ylab = "Steps")
```
Which 5-minute interval, on average across all the days in the dataset,
contains the maximum number of steps?
```{r}
steps_data$interval[which.max(steps_data$steps)]
```
## Imputing missing values
Note that there are a number of days/intervals where there are missing values (coded as NA). The presence of missing days may introduce bias into some calculations or summaries of the data.
Calculate and report the total number of missing values in the dataset
```{r}
sum(is.na(table))
```
Creation of a dataset with the *mean* filled instead of the missing values
```{r}
table <- merge(table, steps_data, by = "interval", suffixes = c("",".y"))
na <- is.na(table$steps)
table$steps[na] <- table$steps.y[na]
table <- table[,c(1:3)]
```
Let's make a histogram of the total number of steps taken each day and Calculate
and report the **mean** and **median** total number of steps taken per day.
```{r}
steps_date <- aggregate(steps ~ date, data=table, FUN=sum)
barplot(steps_date$steps, names.arg=steps_date$date, xlab="Date", ylab="Steps")
mean(steps_date$steps)
median(steps_date$steps)
```
We can see that median and mean are now corresponding instead the analysis with *Na values*
## Are there differences in table patterns between weekdays and weekends?
Create a new factor variable in the dataset with two levels -- "weekday" and
"weekend" indicating whether a given date is a weekday or weekend day.
```{r, cache=TRUE}
Sys.setlocale("LC_TIME", "en_US")
dayOfWeek <- function(date) {
if (weekdays(as.Date(date)) %in% c("Saturday", "Sunday")) {
"weekend"
} else {
"weekday"
}
}
table$dayOfWeek <- as.factor(sapply(table$date, dayOfWeek))
head(table)
```
Make a panel plot containing a time series plot of the 5-minute interval (x-axis)
and the average number of steps taken, averaged across all weekday days or
weekend days (y-axis).
```{r}
par(mfrow=c(2,1))
steps_type <- aggregate(steps ~ interval, data = table,
subset = table$dayOfWeek == "weekend", FUN = mean)
plot(steps_type, main = "weekend", type = "l", xlab = "Interval", ylab = "Steps")
steps_type <- aggregate(steps ~ interval, data = table,
subset = table$dayOfWeek == "weekday", FUN = mean)
plot(steps_type, main = "weekday", type = "l", xlab = "Interval", ylab = "Steps")
```