Skip to content

Latest commit

 

History

History
102 lines (91 loc) · 9.63 KB

CodeBook.md

File metadata and controls

102 lines (91 loc) · 9.63 KB

CodeBook - Getting and Cleaning Data Assignment

Author: Brian von Konsky
Created: April 2014
Course: Getting and Cleaning Data
Repository: https://github.com/bvonkonsky/GettingAndCleaningData

The original data for this data cleaning assignment were measured in a study conducted by Jorge L. Reyes-Ortiz, Alessandro Ghio, Luca Oneto, and Davide Anguita. More information about the data and the study that produced it is available on the UCI Machine Learning Repository.

Papers arising from the original or cleaned data should reference:

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012.

The original data were downloaded and subsequently cleaned using an R script called run_analysis.R.

The cleaned version merges the training and test data from the original study, and uses arguably more meaningful variable names. The mapping from original to modified variable names is shown in the table below. Occurrences of Mag were changed to Magnitude, Acc to Accelerate, std to StadardDeviation and mean to Mean. All parentheses, dots and hyphens were removed. The t prefixes for time and f prefixes for frequency domain at the beginning of variable names were retained to avoid making names more unwieldy. A small modification to the script could easily expand this if desired.

The cleaned version of the data only retains original variables that include mean or std. Variables that did not contain mean or std were intentionally dropped from the cleaned data set.

The cleaned version of the data follows the four principles of a Tidy Dataset as described by Jeff Leek and the Leek Group in their document on datasharing. Specifically:

  • each variable is in its own column;
  • each observation in the case of tidyMerged.csv or mean of aggregated observations in the case of tidyAveraged.csv is contained in a single row, labeled by subject ID and the activity;
  • one table for each kind of measurement, in this case measurements taken using Samsung mobile devices recorded while subjects were engaged in various activities; and
  • multiple tables/csv files are readily linked, in this case, by subject ID and activity.

The run_analysis.R script produces cleaned and tidy data in two Comma Separated Value (CSV) files. The first CSV file contains merged training and test data. The second CSV file contains the average of all observations for a given subject and activity.

All units in the cleaned data are the same as those used in the original data.

See Also: README.md

Column Variable Definition
1 SubjectID Unique integer referring to individual study participants. Ranges between 1 and 30.
2 Activity Factor
3 tBodyAccelerationMeanX Pre-cleaning variable name: tBodyAcc-mean()-X
4 tBodyAccelerationMeanY Pre-cleaning variable name: tBodyAcc-mean()-Y
5 tBodyAccelerationMeanZ Pre-cleaning variable name: tBodyAcc-mean()-Z
6 tBodyAccelerationStandardDeviationX Pre-cleaning variable name: tBodyAcc-std()-X
7 tBodyAccelerationStandardDeviationY Pre-cleaning variable name: tBodyAcc-std()-Y
8 tBodyAccelerationStandardDeviationZ Pre-cleaning variable name: tBodyAcc-std()-Z
9 tGravityAccelerationMeanX Pre-cleaning variable name: tGravityAcc-mean()-X
10 tGravityAccelerationMeanY Pre-cleaning variable name: tGravityAcc-mean()-Y
11 tGravityAccelerationMeanZ Pre-cleaning variable name: tGravityAcc-mean()-Z
12 tGravityAccelerationStandardDeviationX Pre-cleaning variable name: tGravityAcc-std()-X
13 tGravityAccelerationStandardDeviationY Pre-cleaning variable name: tGravityAcc-std()-Y
14 tGravityAccelerationStandardDeviationZ Pre-cleaning variable name: tGravityAcc-std()-Z
15 tBodyAccelerationJerkMeanX Pre-cleaning variable name: tBodyAccJerk-mean()-X
16 tBodyAccelerationJerkMeanY Pre-cleaning variable name: tBodyAccJerk-mean()-Y
17 tBodyAccelerationJerkMeanZ Pre-cleaning variable name: tBodyAccJerk-mean()-Z
18 tBodyAccelerationJerkStandardDeviationX Pre-cleaning variable name: tBodyAccJerk-std()-X
19 tBodyAccelerationJerkStandardDeviationY Pre-cleaning variable name: tBodyAccJerk-std()-Y
20 tBodyAccelerationJerkStandardDeviationZ Pre-cleaning variable name: tBodyAccJerk-std()-Z
21 tBodyGyroMeanX Pre-cleaning variable name: tBodyGyro-mean()-X
22 tBodyGyroMeanY Pre-cleaning variable name: tBodyGyro-mean()-Y
23 tBodyGyroMeanZ Pre-cleaning variable name: tBodyGyro-mean()-Z
24 tBodyGyroStandardDeviationX Pre-cleaning variable name: tBodyGyro-std()-X
25 tBodyGyroStandardDeviationY Pre-cleaning variable name: tBodyGyro-std()-Y
26 tBodyGyroStandardDeviationZ Pre-cleaning variable name: tBodyGyro-std()-Z
27 tBodyGyroJerkMeanX Pre-cleaning variable name: tBodyGyroJerk-mean()-X
28 tBodyGyroJerkMeanY Pre-cleaning variable name: tBodyGyroJerk-mean()-Y
29 tBodyGyroJerkMeanZ Pre-cleaning variable name: tBodyGyroJerk-mean()-Z
30 tBodyGyroJerkStandardDeviationX Pre-cleaning variable name: tBodyGyroJerk-std()-X
31 tBodyGyroJerkStandardDeviationY Pre-cleaning variable name: tBodyGyroJerk-std()-Y
32 tBodyGyroJerkStandardDeviationZ Pre-cleaning variable name: tBodyGyroJerk-std()-Z
33 tBodyAccelerationMagnitudeMean Pre-cleaning variable name: tBodyAccMag-mean()
34 tBodyAccelerationMagnitudeStandardDeviation Pre-cleaning variable name: tBodyAccMag-std()
35 tGravityAccelerationMagnitudeMean Pre-cleaning variable name: tGravityAccMag-mean()
36 tGravityAccelerationMagnitudeStandardDeviation Pre-cleaning variable name: tGravityAccMag-std()
37 tBodyAccelerationJerkMagnitudeMean Pre-cleaning variable name: tBodyAccJerkMag-mean()
38 tBodyAccelerationJerkMagnitudeStandardDeviation Pre-cleaning variable name: tBodyAccJerkMag-std()
39 tBodyGyroMagnitudeMean Pre-cleaning variable name: tBodyGyroMag-mean()
40 tBodyGyroMagnitudeStandardDeviation Pre-cleaning variable name: tBodyGyroMag-std()
41 tBodyGyroJerkMagnitudeMean Pre-cleaning variable name: tBodyGyroJerkMag-mean()
42 tBodyGyroJerkMagnitudeStandardDeviation Pre-cleaning variable name: tBodyGyroJerkMag-std()
43 fBodyAccelerationMeanX Pre-cleaning variable name: fBodyAcc-mean()-X
44 fBodyAccelerationMeanY Pre-cleaning variable name: fBodyAcc-mean()-Y
45 fBodyAccelerationMeanZ Pre-cleaning variable name: fBodyAcc-mean()-Z
46 fBodyAccelerationStandardDeviationX Pre-cleaning variable name: fBodyAcc-std()-X
47 fBodyAccelerationStandardDeviationY Pre-cleaning variable name: fBodyAcc-std()-Y
48 fBodyAccelerationStandardDeviationZ Pre-cleaning variable name: fBodyAcc-std()-Z
49 fBodyAccelerationJerkMeanX Pre-cleaning variable name: fBodyAccJerk-mean()-X
50 fBodyAccelerationJerkMeanY Pre-cleaning variable name: fBodyAccJerk-mean()-Y
51 fBodyAccelerationJerkMeanZ Pre-cleaning variable name: fBodyAccJerk-mean()-Z
52 fBodyAccelerationJerkStandardDeviationX Pre-cleaning variable name: fBodyAccJerk-std()-X
53 fBodyAccelerationJerkStandardDeviationY Pre-cleaning variable name: fBodyAccJerk-std()-Y
54 fBodyAccelerationJerkStandardDeviationZ Pre-cleaning variable name: fBodyAccJerk-std()-Z
55 fBodyGyroMeanX Pre-cleaning variable name: fBodyGyro-mean()-X
56 fBodyGyroMeanY Pre-cleaning variable name: fBodyGyro-mean()-Y
57 fBodyGyroMeanZ Pre-cleaning variable name: fBodyGyro-mean()-Z
58 fBodyGyroStandardDeviationX Pre-cleaning variable name: fBodyGyro-std()-X
59 fBodyGyroStandardDeviationY Pre-cleaning variable name: fBodyGyro-std()-Y
60 fBodyGyroStandardDeviationZ Pre-cleaning variable name: fBodyGyro-std()-Z
61 fBodyAccelerationMagnitudeMean Pre-cleaning variable name: fBodyAccMag-mean()
62 fBodyAccelerationMagnitudeStandardDeviation Pre-cleaning variable name: fBodyAccMag-std()
63 fBodyBodyAccelerationJerkMagnitudeMean Pre-cleaning variable name: fBodyBodyAccJerkMag-mean()
64 fBodyBodyAccelerationJerkMagnitudeStandardDeviation Pre-cleaning variable name: fBodyBodyAccJerkMag-std()
65 fBodyBodyGyroMagnitudeMean Pre-cleaning variable name: fBodyBodyGyroMag-mean()
66 fBodyBodyGyroMagnitudeStandardDeviation Pre-cleaning variable name: fBodyBodyGyroMag-std()
67 fBodyBodyGyroJerkMagnitudeMean Pre-cleaning variable name: fBodyBodyGyroJerkMag-mean()
68 fBodyBodyGyroJerkMagnitudeStandardDeviation Pre-cleaning variable name: fBodyBodyGyroJerkMag-std()