GitHub - SteveWang1992/Getting_Cleaning_Data_Course_Project: 4 required files included in this repo

Introduction

This project repository includes the following four files:

README.md
tidy_dataset.txt
run_analysis.R
Codebook.md

The purpose of this project is to get and clean the data acquired from the UCI HAR Dataset. run_analysis.R is the script that creates a tidy summary dataset from the separate data files in the original UCI HAR Dataset folder. tidy_dataset.txt is the tidy dataset created by run_analysis.R. Codebook.md is the companion codebook that explains each variable in tidy_dataset.txt.

The R version used in this project is as follows:

platform       x86_64-apple-darwin13.4.0   
arch           x86_64                      
os             darwin13.4.0                
system         x86_64, darwin13.4.0        
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes

Because run_analysis.R relies on four packages from the tidyverse, the versions of the packages loaded in run_analysis.R are listed below:

readr: 1.3.1
tidyr: 0.8.3
dplyr: 0.8.0.1
stringr: 1.4.0
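
To compare a local setup against the versions listed above, something like the following sketch can be used. It only reports versions and does not install anything; the names `expected` and `installed` are illustrative and are not taken from run_analysis.R.

```r
# Versions listed in this README.
expected <- c(readr = "1.3.1", tidyr = "0.8.3", dplyr = "0.8.0.1", stringr = "1.4.0")

# Look up the installed version of each package (NA if it is not installed).
installed <- vapply(names(expected), function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

# Side-by-side view of listed and installed versions.
print(data.frame(
  package   = names(expected),
  listed    = unname(expected),
  installed = unname(installed)
))
```

Newer package versions may still work, but the script was only described as being run with the versions listed here.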

README.md

The README.md file explains every file in the project repository, including README.md itself. It introduces the project and presents the purpose of each file.

tidy_dataset.txt

This is the dataset created by the run_analysis.R script. It contains the mean value of each variable extracted from the original dataset, grouped by activity and subject.

run_analysis.R

The script defines the run_analysis function, which takes the nine parameters listed below:

feature_path: the file path of features.txt
train_x_path: the file path of X_train.txt
test_x_path: the file path of X_test.txt
train_y_path: the file path of y_train.txt
test_y_path: the file path of y_test.txt
activity_labels_path: the file path of activity_labels.txt
subject_train_path: the file path of subject_train.txt
subject_test_path: the file path of subject_test.txt
output_path: the file path where the tidy dataset created by run_analysis should be stored
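
A call might look like the sketch below. It assumes the archive was unzipped into a folder named `UCI HAR Dataset` in the working directory, with the file names and `train/` and `test/` subfolders of the original UCI HAR archive, and that run_analysis.R sits in the working directory. `data_dir` and `args` are illustrative names; the argument names are the nine parameters listed above.

```r
# Assumed location of the unzipped dataset; adjust to your setup.
data_dir <- "UCI HAR Dataset"

# One entry per run_analysis() parameter, in the order listed above.
args <- list(
  feature_path         = file.path(data_dir, "features.txt"),
  train_x_path         = file.path(data_dir, "train", "X_train.txt"),
  test_x_path          = file.path(data_dir, "test", "X_test.txt"),
  train_y_path         = file.path(data_dir, "train", "y_train.txt"),
  test_y_path          = file.path(data_dir, "test", "y_test.txt"),
  activity_labels_path = file.path(data_dir, "activity_labels.txt"),
  subject_train_path   = file.path(data_dir, "train", "subject_train.txt"),
  subject_test_path    = file.path(data_dir, "test", "subject_test.txt"),
  output_path          = "tidy_dataset.txt"
)

# Only run when both the script and the data are actually present.
if (file.exists("run_analysis.R") && dir.exists(data_dir)) {
  source("run_analysis.R")     # defines run_analysis()
  do.call(run_analysis, args)  # passes the nine paths by name
}
```

Passing the paths as a named list keeps the call readable and makes it easy to point the function at a dataset stored elsewhere.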

The run_analysis function is structured in the following parts:

Part 1: Load the features file with read_delim() from the readr package, clean and tidy the resulting features dataframe, and extract the specific features needed from it.
Part 2: Load X_train.txt and X_test.txt with read_delim() from the readr package, merge the two imported files into one dataframe, and extract the required columns. The feature names extracted in Part 1 are used to name the columns of the merged dataframe.
Part 3: Load y_train.txt and y_test.txt with read_table() from the readr package and merge them into one dataframe. Then load activity_labels.txt and use the activity labels to convert the activity data from double to factor.
Part 4: Load subject_train.txt and subject_test.txt with read_table() from the readr package and merge them into one dataframe, which is used to calculate the mean value for each activity and subject.
Part 5: Use summarise_all() from dplyr to calculate the mean value of each extracted variable for each activity and subject, then use gather() from tidyr to make the summarized dataframe tidy.
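
The five parts can be illustrated with a small in-memory sketch. This is not the code in run_analysis.R: the file reading with readr is replaced by hand-made data frames, the rule for picking features (names containing `mean()` or `std()`) is an assumption, and all object and column names (`features`, `keep`, `x`, `variable`, `mean_value`, and so on) are made up for the example.

```r
library(dplyr)
library(tidyr)

# Part 1 stand-in: a tiny feature table; keep mean/std features (assumed rule).
features <- data.frame(
  index = 1:3,
  name  = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X"),
  stringsAsFactors = FALSE
)
keep <- features[grepl("mean\\(\\)|std\\(\\)", features$name), ]

# Part 2 stand-in: stack train and test rows, keep the selected columns,
# and name them after the selected features.
train_x <- data.frame(V1 = c(0.1, 0.3), V2 = c(1.0, 1.2), V3 = c(9, 9))
test_x  <- data.frame(V1 = c(0.5, 0.7), V2 = c(1.4, 1.6), V3 = c(9, 9))
x <- bind_rows(train_x, test_x)[, keep$index]
names(x) <- keep$name

# Part 3 stand-in: numeric activity codes become a labelled factor.
activity_labels <- data.frame(code = 1:2, label = c("WALKING", "SITTING"))
y <- c(1, 2, 1, 2)
activity <- factor(y, levels = activity_labels$code,
                   labels = activity_labels$label)

# Part 4 stand-in: one subject id per measurement row.
subject <- c(1, 1, 1, 1)

# Part 5: mean of every variable per activity and subject, then reshape
# from one column per variable to one row per variable.
tidy <- bind_cols(tibble(activity = activity, subject = subject), x) %>%
  group_by(activity, subject) %>%
  summarise_all(mean) %>%
  ungroup() %>%
  gather(key = "variable", value = "mean_value", -activity, -subject)

print(tidy)
```

With two activities, one subject, and two kept features, the result has four rows: one mean per activity, subject, and variable combination.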
