GitHub - pasichnyi/CleaningWearableDataProject: This repo is for the Course Project in the "Getting and Cleaning Data" course

Summary

This repo is for the Course Project in the "Getting and Cleaning Data" course. It includes 3 files with tidy data (the full dataset, a subset of means and stds, and a summary with averages by activity and subject), the R script that produces them from the initial dataset, a CodeBook and this README.

Files

File name	Description
README.md	This file
CodeBook.md	Description of the datasets and their variables
run_analysis.R	Script to produce tidy data sets
fulldata.txt	Full data set obtained by merging train and test data and coupling them with activities and subjects
means_and_stds.txt	Subset of the full data - only means and standard deviations are included
averages.txt	Summary of the means and stds - the average is computed for each Activity and Subject

Instructions

1. Download the files from https://github.com/pasichnyi/CleaningWearableDataProject
2. Download the data set from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
3. Unzip the data set to the same folder
4. Open RStudio and set your working directory to the same folder as the downloaded files
5. Run the R script:

source('run_analysis.R')
run_analysis()

6. The tidy datasets (fulldata.txt, means_and_stds.txt and averages.txt) will be saved to the working folder.
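Steps 2 and 3 can also be done from within R. The following is only a sketch, not part of run_analysis.R: the helper name fetch_raw_data is hypothetical, and it assumes the archive unpacks into a top-level folder called "UCI HAR Dataset" in the working directory.

```r
# Hypothetical helper (not part of run_analysis.R): download and unpack the
# raw data only when the target directory is not already present.
fetch_raw_data <- function(
    directory = "UCI HAR Dataset",
    url = "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip") {
  if (dir.exists(directory)) {
    # Raw data already unpacked, nothing to download.
    return(invisible(FALSE))
  }
  zip_path <- tempfile(fileext = ".zip")
  # mode = "wb" keeps the zip archive intact on Windows.
  download.file(url, destfile = zip_path, mode = "wb")
  # Assumption: the archive contains the "UCI HAR Dataset" folder itself.
  unzip(zip_path, exdir = ".")
  invisible(TRUE)
}
```

Calling fetch_raw_data() before source('run_analysis.R') would then leave the raw data where the script expects to find it.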

Notes on the script

run_analysis.R performs the 5 steps assigned in the task:

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

The script is implemented in two functions. The main one, run_analysis(url), has a default parameter, directory, holding the name of the directory with the raw dataset, and it calls create_merged_dataset(url).

Steps 1, 3 and 4 are completed in the create_merged_dataset function, while steps 2 and 5 are performed directly in the main function, run_analysis.

All output files are written to the current working directory.
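To make the five steps concrete, here is a minimal sketch on invented toy data. It is not the code from run_analysis.R: the data frames, their values and the column names are assumptions made for illustration, and base R's rbind, factor and aggregate are just one possible way to carry out each step.

```r
# Toy stand-ins for the raw train and test sets (invented values).
train <- data.frame(subject = c(1, 1, 2),
                    activity = c(1, 2, 1),
                    tBodyAcc.mean.X = c(0.2, 0.4, 0.6),
                    tBodyAcc.std.X = c(0.1, 0.1, 0.3),
                    tBodyAcc.max.X = c(0.9, 0.8, 0.7))
test <- data.frame(subject = c(2, 1),
                   activity = c(1, 1),
                   tBodyAcc.mean.X = c(0.8, 0.4),
                   tBodyAcc.std.X = c(0.5, 0.3),
                   tBodyAcc.max.X = c(1.0, 0.9))
activity_labels <- c("WALKING", "WALKING_UPSTAIRS")

# Step 1: merge the two sets by stacking their rows.
merged <- rbind(train, test)

# Steps 3 and 4: descriptive activity names (lower case) and a factor
# subject column; the toy columns already carry readable names.
merged$activity <- factor(merged$activity, levels = 1:2,
                          labels = tolower(activity_labels))
merged$subject <- factor(merged$subject)

# Step 2: keep the id columns plus the mean and std measurements.
is_mean_or_std <- grepl("mean|std", names(merged))
means_and_stds <- merged[, c("subject", "activity",
                             names(merged)[is_mean_or_std])]

# Step 5: average every remaining variable per activity and subject.
averages <- aggregate(. ~ activity + subject, data = means_and_stds,
                      FUN = mean)
print(averages)
```

The formula interface of aggregate only returns activity and subject combinations that actually occur in the data, so the toy summary has one row per observed pair.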

Cleaning workflow description

1. The main function, run_analysis, calls the create_merged_dataset function. If no argument is passed to the main function, the raw data is looked for in the default directory="UCI HAR Dataset".
2. The function then reads the activity and feature labels from the top level of the extracted directory.
3. The function reads the subjects, activities and signals from the train and test directories.
4. The resulting dataframes are merged into a single merged_df dataframe, on which steps 5-9 operate.
5. The Activity column is set up: values are assigned from the labels read earlier and converted to lower case.
6. The Subject column is converted to a factor so that it has an appropriate data type.
7. The feature columns are set up:

initial labels are created;
extra dots are removed;
the extra "Body" is removed from variable names that contain "BodyBody".

8. The dataset is ordered by the Type, Activity and Subject columns.
9. The resulting dataset is returned to the run_analysis function as the dataset dataframe.
10. Means and stds are subsetted with a regular expression that matches "mean" and "std" but excludes the frequency variants of those. The result is saved to the means_and_stds dataframe.
11. A summary with averages by Activity and Subject is stored in the averages dataframe.
12. The resulting datasets are saved to the output txt files.
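The label clean-up and the mean/std selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: it assumes the initial labels come from make.names, the regular expressions are one possible choice, the toy data frame is invented, and row.names = FALSE in write.table is assumed as well.

```r
# A few raw feature names in the style used by the raw dataset.
raw_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
               "fBodyAcc-meanFreq()-X", "fBodyBodyGyroMag-mean()",
               "tBodyAcc-max()-X")

# Initial labels: make.names turns characters that are invalid in R
# names into dots (an assumption about how the script starts).
labels <- make.names(raw_names)

# Remove extra dots: collapse runs of dots, then drop a trailing dot.
labels <- gsub("\\.+", ".", labels)
labels <- sub("\\.$", "", labels)

# Remove the duplicated "Body" in names containing "BodyBody".
labels <- gsub("BodyBody", "Body", labels)

# Match "mean" or "std" only when followed by a dot or the end of the
# name, which leaves out the "meanFreq" variables.
is_mean_or_std <- grepl("(mean|std)(\\.|$)", labels)

# Toy data frame with one column per feature (invented values).
toy <- as.data.frame(matrix(1:10, nrow = 2))
names(toy) <- labels
subset_df <- toy[, is_mean_or_std]

# Save to a txt file and read it back to confirm the round trip.
out_file <- tempfile(fileext = ".txt")
write.table(subset_df, out_file, row.names = FALSE)
round_trip <- read.table(out_file, header = TRUE)
print(names(round_trip))
```

Anchoring the pattern on a following dot or the end of the name is what separates "mean" from "meanFreq"; a plain "mean|std" pattern would keep both.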
