This repository was archived by the owner on Sep 12, 2024. It is now read-only.
forked from rdpeng/RepData_PeerAssessment1
PA1_template.Rmd
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
```{r echo=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE, results = "asis")
```
## Loading libraries
```{r load library}
library(dplyr)
library(timeDate)
library(ggplot2)
```
## Loading and preprocessing the data
```{r read file}
zipfile <- "activity.zip"
filename <- "activity.csv"
if (!file.exists(filename)) {
  unzip(zipfile)
}
activity <- read.csv(filename, header = TRUE)
```
## What is mean total number of steps taken per day?
1. Histogram of the total number of steps taken per day
```{r number_of_step}
stepsPerDay <- tapply(activity$steps, activity$date, sum, na.rm=TRUE)
qplot(stepsPerDay, main = "Total no. of steps taken per day", xlab = "No. of steps", ylab= "Frequency", binwidth = 500)
```
2. Mean and median total number of steps per day
```{r mean and median step}
meanStep <- mean(stepsPerDay)
medianStep <- median(stepsPerDay)
```
* Median total number of steps taken per day = __`r medianStep`__
* Mean total number of steps taken per day = __`r meanStep`__
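Note that `sum(..., na.rm = TRUE)` returns 0 for a day whose values are all `NA`, so such days enter the histogram, mean, and median above as zero-step days. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from `activity`) shows the effect, along with one possible alternative that leaves all-NA days out instead:

```r
# Hypothetical toy data: day "d2" has only missing values
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(10, 20, NA, NA, 5, 15)
)

# Same approach as above: the all-NA day gets a total of 0
withZeros <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

# Alternative: drop missing rows first, so the all-NA day is left out
observed <- toy[!is.na(toy$steps), ]
withoutZeros <- tapply(observed$steps, observed$date, sum)

mean(withZeros)     # averages over d1, d2 (counted as 0), and d3
mean(withoutZeros)  # averages over d1 and d3 only
```

Which treatment is appropriate depends on the question being asked; the sketch only illustrates why the two can give different summaries.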
## What is the average daily activity pattern?
1. Computing the average number of steps per interval, averaged across all days
```{r average daily activity time series plot}
by_interval <- activity %>% filter(!is.na(steps)) %>% group_by(interval)
avgStepsPerDay <- summarise(by_interval, meanSteps = mean(steps))
```
2. Plotting average steps per interval
```{r plot_time_series}
g <- ggplot(avgStepsPerDay, aes(y = meanSteps, x = interval))
g + geom_line() + labs(title = "Average daily activity pattern", x = "Interval", y = "Average no. of steps taken")
```
3. Finding out which interval has maximum number of steps
```{r max steps}
maxStep <- avgStepsPerDay %>% filter(meanSteps == max(meanSteps))
maxStepInterval <- maxStep$interval  # plain vector, so the inline `r` expression prints a value
```
* The interval with the maximum average number of steps is __`r maxStepInterval`__
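The same lookup can also be written with base R's `which.max()`, which returns the position of the first maximum (so, unlike the `filter()` call above, it keeps only one row if several intervals tie). A small sketch on a hypothetical `toyAvg` data frame with invented values:

```r
# Hypothetical per-interval averages, for illustration only
toyAvg <- data.frame(
  interval  = c(0, 5, 10, 15),
  meanSteps = c(1.5, 7.25, 3.0, 7.0)
)

# Position of the largest mean, then the interval at that position
peakInterval <- toyAvg$interval[which.max(toyAvg$meanSteps)]
```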
## Imputing missing values
1. Number of rows with NA values
```{r missing data}
missingData <- sum(!complete.cases(activity))
```
* Number of rows with NA is __`r missingData`__
2. Filling in NA values with the mean for the same interval and storing the result in a new dataset
```{r fill NA}
activityNew <- activity %>% group_by(interval) %>%
  mutate(steps = replace(steps, is.na(steps),
                         mean(steps, na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps start from an ungrouped data frame
```
3. Histogram of the new dataset
```{r no_of_steps_new}
stepsPerDay <- tapply(activityNew$steps, activityNew$date, sum)
qplot(stepsPerDay, main = "Total no. of steps taken per day", xlab = "No. of steps", ylab= "Frequency", binwidth = 500)
```
4. Mean and median total number of steps per day after imputation
```{r mean and median step new}
meanStep <- mean(stepsPerDay)
medianStep <- median(stepsPerDay)
```
* Median total number of steps taken per day = __`r medianStep`__
* Mean total number of steps taken per day = __`r meanStep`__
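The imputation in step 2 replaces each missing value with the mean of the observed values for the same interval. A minimal base R sketch of the same idea, using `ave()` on made-up data (the `toy` data frame and its values are hypothetical), may make the mechanics easier to follow:

```r
# Hypothetical toy data: one missing value in interval 5 on day "d1"
toy <- data.frame(
  date     = c("d1", "d1", "d2", "d2"),
  interval = c(0, 5, 0, 5),
  steps    = c(4, NA, 8, 6)
)

# Mean of the observed steps within each interval, repeated for every row
intervalMean <- ave(toy$steps, toy$interval,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values; use the interval mean only where steps is missing
toyFilled <- toy
toyFilled$steps <- ifelse(is.na(toy$steps), intervalMean, toy$steps)
```

Here the missing value in interval 5 is filled with 6, the only observed value for that interval, while observed values are left unchanged.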
## Are there differences in activity patterns between weekdays and weekends?
1. Creating a new factor variable
```{r weekend_and_weekdays}
activityNew$dateType <- factor(ifelse(as.POSIXlt(activityNew$date)$wday %in% c(0, 6),
                                      "weekend", "weekday"))
```
2. Making a panel plot of the time series
```{r time_series_dateType}
by_interval <- activityNew %>% group_by(interval, dateType)
avgStepsPerDay <- summarise(by_interval, meanSteps = mean(steps))
ggplot(avgStepsPerDay, aes(interval, meanSteps)) + geom_line() + facet_grid(dateType ~ .) +
  labs(title = "Average daily activity pattern by dateType", x = "Interval", y = "Average no. of steps taken")
```
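The weekday/weekend split in step 1 relies on the `wday` field of `POSIXlt`, which counts days since Sunday (0 is Sunday, 6 is Saturday) and does not depend on the locale, unlike the day names returned by `weekdays()`. A short sketch on a few hypothetical sample dates (`toyDates` is invented for illustration):

```r
# Hypothetical sample dates: a Friday, a Saturday, a Sunday, and a Monday
toyDates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# wday counts days since Sunday: 0 is Sunday, 6 is Saturday
wday <- as.POSIXlt(toyDates)$wday

# Label each date and store the result as a two-level factor
toyType <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                  levels = c("weekday", "weekend"))
```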