Human Activity Recognition project for the Getting and Cleaning Data coursera course

awcc/GCD-HAR-Project

GCD-HAR-Project

Human Activity Recognition project for the Getting and Cleaning Data coursera course

This project consists of an R script, run_analysis.R; this readme file, README.md; a code book, CodeBook.md, describing the variables, the data, and the processing performed by the R script; and an output file, har_averages.txt.

See the file CodeBook.md for an extensive description of the output format.

Summary

This project takes data from the Human Activity Recognition Using Smartphones dataset at http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones and produces an output table containing, for each subject-activity pair (30 subjects, 6 activities), the average of every mean and standard deviation measurement.

The R script run_analysis.R (described below) produces the output file har_averages.txt (described in CodeBook.md).

R script run_analysis.R

The file run_analysis.R reads in data from the UCI Human Activity Recognition (HAR) Dataset.

Training and test data are read in, selecting only variables corresponding to the mean or standard deviation of a measurement. Proper variable names are read in from the features.txt file and cleaned up, and activity names are read in and used to convert the activity numbers to a well-named factor. The data are then merged into one data frame using rbind and cbind.
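The steps above can be sketched on toy in-memory data. This is a minimal illustration, not the code in run_analysis.R: the feature names, the regular expression used to pick mean and standard deviation columns, the name clean-up rule, and the toy values are all assumptions made for the example.

```r
# Toy stand-ins for features.txt and activity_labels.txt (illustrative values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))

# Select only the features whose names contain mean() or std().
# (Assumed selection rule; the real script may use a different pattern.)
keep <- grep("-(mean|std)\\(\\)", features)

# Clean the names: drop "()" and replace "-" with "." (assumed clean-up rule).
clean_names <- gsub("-", ".", gsub("\\(\\)", "", features[keep]))

# Toy measurement matrices standing in for the training and test sets.
x_train <- matrix(c(0.1, 0.2, 0.9,
                    0.3, 0.4, 0.8), nrow = 2, byrow = TRUE)
x_test  <- matrix(c(0.5, 0.6, 0.7), nrow = 1)

# Stack training and test rows with rbind, then keep the selected columns.
measurements <- rbind(x_train, x_test)[, keep, drop = FALSE]
colnames(measurements) <- clean_names

# Subject ids and activity codes for the same three rows (toy values).
subject  <- c(1, 1, 2)
activity <- factor(c(1, 2, 1),
                   levels = activity_labels$id,
                   labels = activity_labels$name)

# Attach subject and activity as the first two columns with cbind.
har_data <- cbind(data.frame(subject = subject, activity = activity),
                  as.data.frame(measurements))
```

In this sketch the max() column is dropped and the activity codes 1 and 2 become the factor levels WALKING and SITTING.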

The data frame har_data is the large, unaveraged data frame requested in item 4 of the specification. It contains one observation per row, with the subject and activity in the first two columns and each measurement's mean and standard deviation as a variable.

I then melt and cast this data frame (using melt and dcast from Hadley Wickham's reshape2 package) to produce a data frame har_ave and an output file har_averages.txt containing averages for each feature, as requested in item 5 of the specification. These meet the tidy data specification.

I have chosen the "wide" style of tidy data because the many variables all belong to the same observation. See Hadley Wickham's paper on tidy data (http://vita.had.co.nz/papers/tidy-data.pdf), especially Table 12, as well as the course discussion thread on wide and narrow tidy data: https://class.coursera.org/getdata-008/forum/thread?thread_id=94
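The melt-and-cast step described above might look roughly like the following. This is a sketch on a toy har_data, assuming the reshape2 package is installed; the column names, the values, the intermediate name har_long, and the write.table arguments are illustrative and are not taken from run_analysis.R.

```r
library(reshape2)

# Toy version of har_data: subject and activity first, then measurements.
har_data <- data.frame(
  subject = c(1, 1, 1, 2),
  activity = factor(c("WALKING", "WALKING", "SITTING", "WALKING")),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.1, 0.5),
  tBodyAcc.std.X = c(1.0, 3.0, 2.0, 4.0)
)

# Melt to long form: one row per (subject, activity, variable, value).
har_long <- melt(har_data, id.vars = c("subject", "activity"))

# Cast back to wide form, averaging each variable within each
# subject-activity pair.
har_ave <- dcast(har_long, subject + activity ~ variable,
                 fun.aggregate = mean)

# Write the averaged table; a temporary file stands in for har_averages.txt.
out_file <- tempfile(fileext = ".txt")
write.table(har_ave, out_file, row.names = FALSE)
```

The result stays in the "wide" form: one row per subject-activity pair and one column per averaged measurement.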
