Course project deliverables for the Coursera course Getting and Cleaning Data
- Source the script `run_analysis.R`. When sourced, the script checks whether the required R packages are available and installs any that are missing.
- Calling `download.data()` downloads the zipped dataset and unarchives it.
- Calling `run.analysis()` starts the actual data processing, which proceeds as follows:
  - Feature vector label data is loaded from `features.txt`.
  - Using a regex with `grepl()`, a subset of the label data is created to select the desired data columns.
  - Activity labels are loaded from `activity_labels.txt`.
  - The activity labels (id -> label) and the selected features (id -> label) are passed to a function that loads either the training or the test dataset, depending on a type parameter also passed in:
    - Paths to the data files are built from the type parameter.
    - The data files are loaded, and the feature vector data is filtered using the ids of the selected features.
    - Activity and subject id data are loaded.
    - The feature vector columns are renamed using the names of the selected features.
    - Activities and subjects are given labels using factor levels of the activity and subject id data.
    - Finally, the processed dataset is returned.
  - The processing above is applied to both the training and test datasets.
  - The training and test datasets are merged using `rbind()` and converted to a `data.table` to make the group-wise operations in the following step easier.
  - A new, independent tidy dataset is created by calculating the mean of every variable for each activity and subject.
  - Variable names are loaded into a separate vector and modified to follow the CamelCase convention.
  - The new names are applied to the tidy dataset.
  - Both the raw and tidy datasets are written to disk.
  - The tidy dataset is returned as the output of the function.
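The `grepl()` selection step above can be sketched as follows. The sample feature names and the regex (keeping only `mean()` and `std()` measurements) are assumptions for illustration; the README does not state the exact pattern used:

```r
# Hypothetical sample of the id -> name mapping found in features.txt
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-mad()-X", "tBodyAcc-max()-X"),
  stringsAsFactors = FALSE
)

# grepl() returns a logical vector; keep only mean() and std() columns.
# The pattern below is an assumption, not the script's actual regex.
selected <- features[grepl("-(mean|std)\\(\\)", features$name), ]
selected$name
# "tBodyAcc-mean()-X" "tBodyAcc-std()-X"
```

The retained `id` column can then be used to filter the measurement columns when the raw data files are read.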
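The merge and group-wise averaging steps might look like this minimal sketch; the column names and values are hypothetical, and the `data.table` package is assumed to be installed:

```r
library(data.table)

# Tiny stand-ins for the processed training and test datasets
train <- data.frame(subject = c(1, 1), activity = c("WALKING", "WALKING"),
                    TimeBodyAccMeanX = c(0.2, 0.4))
test  <- data.frame(subject = 2, activity = "SITTING",
                    TimeBodyAccMeanX = 0.1)

# Merge with rbind() and convert to data.table for group-wise operations
merged <- data.table(rbind(train, test))

# Mean of every measurement column per (activity, subject) pair
tidy <- merged[, lapply(.SD, mean), by = .(activity, subject)]
```

Here `.SD` stands for the remaining (measurement) columns, so each one is averaged within every activity/subject group in a single expression.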
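The CamelCase renaming could be done with a chain of `gsub()` substitutions, as sketched below; the exact rules applied in `run_analysis.R` may differ:

```r
# Hypothetical raw names as they appear in features.txt
raw <- c("tBodyAcc-mean()-X", "fBodyGyro-std()-Z")

# Expand prefixes, spell out the statistic, and drop separators
nice <- gsub("^t", "Time", raw)
nice <- gsub("^f", "Frequency", nice)
nice <- gsub("-mean\\(\\)", "Mean", nice)
nice <- gsub("-std\\(\\)", "Std", nice)
nice <- gsub("-", "", nice)
nice
# "TimeBodyAccMeanX" "FrequencyBodyGyroStdZ"
```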
If the Samsung data is already unzipped and the dataset directory is available as the `UCI HAR Dataset` subdirectory of the current directory, the processing function `run.analysis()` can be called straight away; there is no need to call `download.data()` first.
At the end of processing, both the raw and tidy datasets are written to disk as `raw-dataset.txt` and `tidy-dataset.txt`, respectively, under the current working directory.
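The final write step can be reproduced with `write.table()`, as in the sketch below; the one-row tidy frame is a placeholder for the real result:

```r
# Placeholder tidy dataset standing in for the real output
tidy <- data.frame(activity = "WALKING", subject = 1,
                   TimeBodyAccMeanX = 0.3)

# Write without row names, matching the file name used by the script
write.table(tidy, "tidy-dataset.txt", row.names = FALSE)

# The file can be read back for inspection
check <- read.table("tidy-dataset.txt", header = TRUE)
```

Omitting row names keeps the file a plain space-separated table that `read.table(..., header = TRUE)` can load directly.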