NOTE: The UCI HAR dataset files must be in the working directory.
The script is based on the data.table package, available from CRAN. data.table is fast and memory-efficient when working with large tables.
-
Read the UCI data and convert it into data.tables:
- "train.x" and "test.x" are the data.tables with the measurement variables (561 features);
- "train.act.labels" and "test.act.labels" are the activity labels;
- "train.sub" and "test.sub" are the data.tables of subjects;
- "features" is the data.table with the feature names;
- "activity" is the data.table of activity names.
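
A minimal sketch of this reading step with fread() follows. The helper read_split() and the folder layout it expects (<dir>/<split>/X_<split>.txt, y_<split>.txt and subject_<split>.txt, as in the public UCI HAR Dataset archive) are assumptions for illustration, not the script's actual code.

```r
library(data.table)

# Hypothetical helper, not from the original script. Assumes the UCI HAR
# archive layout: <dir>/<split>/X_<split>.txt, y_<split>.txt, subject_<split>.txt
read_split <- function(dir, split) {
  path <- function(prefix) file.path(dir, split, paste0(prefix, "_", split, ".txt"))
  list(
    x      = fread(path("X"), header = FALSE),        # measurement variables
    labels = fread(path("y"), header = FALSE),        # activity codes
    sub    = fread(path("subject"), header = FALSE)   # subject IDs
  )
}

# Usage, assuming the dataset folders sit in the working directory:
# train <- read_split(".", "train")
# test  <- read_split(".", "test")
# features <- fread("features.txt", header = FALSE)        # assumed file name
# activity <- fread("activity_labels.txt", header = FALSE) # assumed file name
```

The helper returns a list so that the three tables of one split travel together; the script itself keeps them as separate data.tables.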
-
Combine the test and train data sets row-wise and assign the results to new data.tables:
- "new.dt" - the combined data.table with the test and train measurements;
- "act.all" - a data.table with the combined activity labels, keyed by column "Code". A key is the column (or columns) by which a data.table is sorted and which X[Y] joins use by default;
- "subj.all" - the combined subjects from the test and train data sets.
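
A toy-sized sketch of the stacking and keying, using rbindlist() and setkey(); the tables and values are made up. One detail worth checking against the real script: setkey() physically reorders the rows, so a keyed table no longer lines up row by row with "new.dt". The sketch keys a copy to show this.

```r
library(data.table)

# Toy stand-ins for the real tables; the values are made up
test.x  <- data.table(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
train.x <- data.table(V1 = c(0.3, 0.4, 0.5), V2 = c(1.3, 1.4, 1.5))
test.act.labels  <- data.table(Code = c(1L, 2L))
train.act.labels <- data.table(Code = c(2L, 3L, 1L))

# Stack rows in the same order (test, then train) for every table,
# so that row i refers to the same observation everywhere
new.dt  <- rbindlist(list(test.x, train.x))
act.all <- rbindlist(list(test.act.labels, train.act.labels))

# setkey() sorts the table by Code, by reference, and records the key
keyed <- copy(act.all)
setkey(keyed, Code)
key(keyed)
```

After keying, keyed$Code is sorted (1, 1, 2, 2, 3) while act.all keeps the original order (1, 2, 2, 3, 1).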
-
setnames() renames the columns of a data.table by reference; "new.dt" gets its column names from "features".
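
A small sketch of setnames() with a toy "features" table. The shape (an index column V1 and a name column V2, as fread() would produce from features.txt with header = FALSE) is an assumption. When only one character vector is given, setnames() treats it as the full set of new names.

```r
library(data.table)

# Toy feature table: index in V1, feature name in V2 (assumed shape)
features <- data.table(V1 = 1:3,
                       V2 = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"))
new.dt <- data.table(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4), V3 = c(0.5, 0.6))

# Replace all column names of new.dt, in place, with the feature names
setnames(new.dt, features$V2)
names(new.dt)
```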
-
Select the columns whose names contain "mean()" or "std()":
- "means.id" - the columns with "mean()" in their names;
- "std.id" - the columns with "std()" in their names;
- "sel.df" - a new data.table containing only the "means.id" and "std.id" columns.
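
One way to do this selection is sketched below with toy column names. Using grep() with fixed = TRUE (an assumption; the script may use a regular expression instead) matches the literal text "mean()", so names such as "meanFreq()" are not picked up, and the parentheses need no escaping.

```r
library(data.table)

new.dt <- data.table(`tBodyAcc-mean()-X` = 1:2, `tBodyAcc-std()-X` = 3:4,
                     `fBodyAcc-meanFreq()-X` = 5:6, `tBodyAcc-max()-X` = 7:8)

# fixed = TRUE matches the literal text, so "meanFreq()" is excluded
means.id <- grep("mean()", names(new.dt), fixed = TRUE)
std.id   <- grep("std()",  names(new.dt), fixed = TRUE)

# Keep the selected columns in their original order; with = FALSE makes
# data.table treat the numeric vector as column positions
sel.df <- new.dt[, sort(c(means.id, std.id)), with = FALSE]
names(sel.df)
```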
-
Match the activity codes to the reference activity data.table:
- "act.labeled" - the labeled activities for all subjects.
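
A sketch of the lookup as a data.table join with on =, using a toy subset of the activity names (the column names Code and Activity are assumptions). Joining with on = does not re-key either table, and the result follows the row order of the table in the i position, here "act.all", so it stays aligned with the measurements.

```r
library(data.table)

# Toy subset of the activity table; column names are assumed
activity <- data.table(Code = 1:3,
                       Activity = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"))
act.all <- data.table(Code = c(2L, 1L, 3L, 1L))

# Look up each code in act.all; rows come back in act.all's order
act.labeled <- activity[act.all, on = "Code"]
act.labeled$Activity
```

A base R alternative with the same effect is activity$Activity[match(act.all$Code, activity$Code)].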
-
Combine the variables, subjects, and activities column-wise into a single data.table.
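
A minimal sketch of the column-wise combination with cbind(), on toy tables. The name "full.dt" is hypothetical. cbind() does not check that rows belong together, so the sketch first asserts that all parts have the same length; matching row order still has to be guaranteed by the earlier steps.

```r
library(data.table)

subj.all    <- data.table(Subject = c(1L, 1L, 2L))
act.labeled <- data.table(Activity = c("WALKING", "WALKING", "LAYING"))
sel.df      <- data.table(`tBodyAcc-mean()-X` = c(0.2, 0.4, 0.9))

# All parts must have the same number of rows, in the same observation order
stopifnot(nrow(subj.all) == nrow(act.labeled), nrow(act.labeled) == nrow(sel.df))

# full.dt is a hypothetical name for the combined table
full.dt <- cbind(subj.all, act.labeled, sel.df)
```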
-
Calculate the mean of every variable for each subject and each activity.
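
The grouped mean is a one-liner in data.table, sketched here on made-up data with hypothetical names ("full.dt", "tidy.dt", columns a and b). Inside j, .SD holds every column not listed in by, so lapply(.SD, mean) averages each measurement within each Subject/Activity group.

```r
library(data.table)

full.dt <- data.table(Subject  = c(1L, 1L, 2L, 2L),
                      Activity = c("WALKING", "WALKING", "WALKING", "LAYING"),
                      a = c(1, 3, 5, 7),
                      b = c(2, 4, 6, 8))

# One row per Subject/Activity pair, groups in order of first appearance
tidy.dt <- full.dt[, lapply(.SD, mean), by = .(Subject, Activity)]
tidy.dt
```

Using keyby instead of by gives the same means but sorts the result by Subject and Activity.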
-
Convert the resulting data.table to a data.frame ("tidy.df").
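
Both conversion routes are sketched below on a toy table. It is not known which one the script uses. as.data.frame() returns a copy, while setDF() converts the existing object by reference.

```r
library(data.table)

tidy.dt <- data.table(Subject = 1:2, Activity = c("WALKING", "LAYING"), a = c(2, 7))

# as.data.frame() returns a plain data.frame copy
tidy.df <- as.data.frame(tidy.dt)

# setDF() converts in place, without copying
setDF(tidy.dt)
class(tidy.dt)
```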
-
Write the result to a .csv file: write.csv(tidy.df, file = 'tidy.df.csv')
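
A sketch of writing the file and reading it back. It writes to a temporary path instead of the working directory. row.names = FALSE is an optional tweak that drops the unnamed index column write.csv() adds by default. check.names = FALSE keeps feature names such as "tBodyAcc-mean()-X" intact when reading; read.csv() would otherwise rewrite them into syntactic names.

```r
tidy.df <- data.frame(Subject = 1:2,
                      Activity = c("WALKING", "LAYING"),
                      `tBodyAcc-mean()-X` = c(0.2, 0.9),
                      check.names = FALSE)

# Temporary path for illustration; the script writes 'tidy.df.csv' to the working directory
out <- file.path(tempdir(), "tidy.df.csv")
write.csv(tidy.df, file = out, row.names = FALSE)

# Read back without mangling the original column names
back <- read.csv(out, check.names = FALSE)
names(back)
```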