Introduction

This repository contains the implementation of the course project for the MOOC Getting and Cleaning Data on Coursera. The purpose is to create a tidy dataset from another (less tidy) dataset.

The dataset is used for Human Activity Recognition Using Smartphones. It contains data from experiments in which people's movements were recorded with the sensor signals (accelerometer, gyroscope) of a Samsung Galaxy S II smartphone.

The dataset is split across several files:

File	Content
activity_labels.txt	The activity labels (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING)
features_info.txt	Description of the features and how they are calculated
features.txt	List of all features, used to derive the column names

There are also two datasets (train and test), each consisting of:

File	Content
subject_{train,test}.txt	The subject identifier for each observation
X_{train,test}.txt	The observations, with one column per feature listed in features.txt
y_{train,test}.txt	The observed activity for each observation

Within each dataset, subject_{train,test}.txt, X_{train,test}.txt and y_{train,test}.txt have the same number of rows, so row i of each file describes the same observation.
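Because the three files line up row by row, one split can be assembled by binding them column-wise. The following is a minimal sketch in base R, not necessarily what run_analysis.R does; the helper name `read_split`, the column names `subject` and `activity`, and the directory layout in the usage comment are assumptions made for illustration.

```r
# Sketch: assemble one split ("train" or "test") into a single data frame.
# read_split and the column names are illustrative assumptions.
read_split <- function(dir, split) {
  path <- function(prefix) file.path(dir, paste0(prefix, "_", split, ".txt"))
  subject  <- read.table(path("subject"), col.names = "subject")
  activity <- read.table(path("y"), col.names = "activity")
  features <- read.table(path("X"))
  # The three files must describe the same observations, row by row.
  stopifnot(nrow(subject) == nrow(features), nrow(activity) == nrow(features))
  cbind(subject, activity, features)
}

# Usage, assuming the files sit in data/train and data/test:
# all_data <- rbind(read_split("data/train", "train"),
#                   read_split("data/test", "test"))
```

The row-count check fails early if the files do not match, which is easier to diagnose than a silently misaligned table.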

Extracted values

Variables are described in the CodeBook. The main objective was to extract only the variables that are a mean or a standard deviation (std) of a measurement. These variables were then slightly renamed to remove the parentheses and replace '-' with '.'.
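The selection and renaming can be illustrated on a handful of feature names written in the style of features.txt. This is a sketch under assumptions: the exact pattern used by run_analysis.R is not shown in this README, and excluding meanFreq() is one possible reading of "mean or std".

```r
# A few feature names in the style of features.txt.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-Y",
              "tBodyAcc-mad()-Z", "fBodyGyro-meanFreq()-X")

# Keep only names containing "-mean()" or "-std()".
# Matching the literal "()" excludes meanFreq(); that choice is an assumption.
keep <- grep("-(mean|std)\\(\\)", features)

# Drop the parentheses, then replace '-' with '.'.
clean_names <- gsub("-", ".", gsub("\\(\\)", "", features[keep]))
print(clean_names)
```

With these four names, only the first two are kept, and they become tBodyAcc.mean.X and tBodyAcc.std.Y.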

Finally, the tidy dataset contains the mean of each extracted variable for each activity and each subject.
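The averaging step can be sketched with base R's `aggregate`. The script itself may rely on data.table or plyr instead; base R is used here only to keep the example dependency-free, and the toy values below are made up.

```r
# Toy data standing in for the merged, filtered dataset (values are made up).
obs <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.1, 0.3, 0.5),
  tBodyAcc.std.X  = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# One row per (subject, activity) pair; every other column is averaged.
tidy <- aggregate(. ~ subject + activity, data = obs, FUN = mean)
tidy <- tidy[order(tidy$subject, tidy$activity), ]
print(tidy)
```

The result has one row per subject and activity combination that actually occurs in the data, three rows in this toy example.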

Generate the tidy dataset

Clone this repository, then run the script from an R session started in the repository directory:

    source('run_analysis.R')

You might need to install the following packages first: data.table, plyr.

The data is downloaded from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and unzipped into the data/ directory.
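For reference, a download-and-unzip step of this kind can be written with base R's `download.file` and `unzip`. This is a sketch, not the code from run_analysis.R: the helper name `fetch_dataset` and the archive file name `dataset.zip` are hypothetical.

```r
# Sketch of the download step; fetch_dataset and "dataset.zip" are
# hypothetical names chosen for this example.
dataset_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

fetch_dataset <- function(url = dataset_url, dest_dir = "data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, "dataset.zip")
  # Skip the download when the archive is already present.
  if (!file.exists(zip_path)) {
    download.file(url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = dest_dir)
  invisible(dest_dir)
}

# fetch_dataset()  # uncomment to download (requires network access)
```

Using `mode = "wb"` keeps the zip archive intact on Windows, where the default text mode can corrupt binary files.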

The run_analysis.R script writes the result to a file named Meandata.txt in the current working directory. The format of this file is described in the CodeBook file.
