forked from johnmyleswhite/ML_for_Hackers
Commit
Added and edited all script headings. Chapters 6, 7, 8, 10, and 12 still require descriptions.
1 parent d70ae86, commit 6e4cba4
Showing 14 changed files with 135 additions and 20 deletions.
@@ -1,11 +1,11 @@
 # File-Name: package_installer.R
-# Date: 2011-11-01
+# Date: 2012-02-10
 # Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: Install all of the packages needed for the Machine Learning for Hackers case studies
 # Data Used: n/a
 # Packages Used: n/a
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,7 +1,6 @@
-# File-Name: ml_basics.R
-# Date: 2011-11-01
-# Author: Drew Conway
-# Email: [email protected]
+# File-Name: ufo_sightings.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: Code for Chapter 1. In this case we will review some of the basic
 # R functions and coding paradigms we will use throughout this book.
 # This includes loading, viewing, and cleaning raw data; as well as
@@ -11,7 +10,7 @@
 # Data Used: http://www.infochimps.com/datasets/60000-documented-ufo-sightings-with-text-descriptions-and-metada
 # Packages Used: ggplot2
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,14 +1,14 @@
 # File-Name: email_classify.R
-# Date: 2011-11-01
-# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: Code for Chapter 3. In this case we introduce the notion of binary classification.
 # In machine learning this is a method for determining what of two categories a
 # given observation belongs to. To show this, we will create a simple naive Bayes
 # classifier for SPAM email detection, and visualize the results.
 # Data Used: Email messages contained in data/ directory, source: http://spamassassin.apache.org/publiccorpus/
 # Packages Used: tm, ggplot2
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,5 +1,5 @@
 # File-Name: priority_inbox.R
-# Date: 2011-11-01
+# Date: 2012-02-10
 # Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: Code for Chapter 4. In this case study we will attempt to write a "priority
 # inbox" algorithm for ranking email by some measures of importance. We will
@@ -9,7 +9,7 @@
 # source: http://spamassassin.apache.org/publiccorpus/
 # Packages Used: tm, ggplot2
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,3 +1,22 @@
+# File-Name: chapter05.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Purpose:
+# Data Used: data/longevity.csv
+# Packages Used: ggplot2
+
+# All source code is copyright (c) 2012, under the Simplified BSD License.
+# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
+
+# All images and materials produced by this code are licensed under the Creative Commons
+# Attribution-Share Alike 3.0 United States License: http://creativecommons.org/licenses/by-sa/3.0/us/
+
+# All rights reserved.
+
+# NOTE: If you are running this in the R console you must use the 'setwd' command to set the
+# working directory for the console to whereever you have saved this file prior to running.
+# Otherwise you will see errors when loading data or saving figures!
+
 library('ggplot2')
 
 # First snippet
@@ -1,3 +1,22 @@
+# File-Name: chapter06.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Purpose:
+# Data Used: data/oreilly.csv
+# Packages Used: ggplot2, glmnet, tm, boot
+
+# All source code is copyright (c) 2012, under the Simplified BSD License.
+# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
+
+# All images and materials produced by this code are licensed under the Creative Commons
+# Attribution-Share Alike 3.0 United States License: http://creativecommons.org/licenses/by-sa/3.0/us/
+
+# All rights reserved.
+
+# NOTE: If you are running this in the R console you must use the 'setwd' command to set the
+# working directory for the console to whereever you have saved this file prior to running.
+# Otherwise you will see errors when loading data or saving figures!
+
 library('ggplot2')
 
 # First snippet
@@ -1,3 +1,22 @@
+# File-Name: chapter07.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Purpose:
+# Data Used: data/01_heights_weights_genders.csv, data/lexical_database.Rdata
+# Packages Used: n/a
+
+# All source code is copyright (c) 2012, under the Simplified BSD License.
+# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
+
+# All images and materials produced by this code are licensed under the Creative Commons
+# Attribution-Share Alike 3.0 United States License: http://creativecommons.org/licenses/by-sa/3.0/us/
+
+# All rights reserved.
+
+# NOTE: If you are running this in the R console you must use the 'setwd' command to set the
+# working directory for the console to whereever you have saved this file prior to running.
+# Otherwise you will see errors when loading data or saving figures!
+
 # First code snippet
 height.to.weight <- function(height, a, b)
 {
@@ -1,3 +1,22 @@
+# File-Name: chapter08.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Purpose:
+# Data Used: data/DJI.csv, data/stock_prices.csv
+# Packages Used: ggplot2, lubridate, reshape
+
+# All source code is copyright (c) 2012, under the Simplified BSD License.
+# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
+
+# All images and materials produced by this code are licensed under the Creative Commons
+# Attribution-Share Alike 3.0 United States License: http://creativecommons.org/licenses/by-sa/3.0/us/
+
+# All rights reserved.
+
+# NOTE: If you are running this in the R console you must use the 'setwd' command to set the
+# working directory for the console to whereever you have saved this file prior to running.
+# Otherwise you will see errors when loading data or saving figures!
+
 library('ggplot2')
 
 # First code snippet
@@ -1,5 +1,5 @@
 # File-Name: senate_mds.R
-# Date: 2011-11-01
+# Date: 2012-02-10
 # Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: Code for Chapter 4. In this case study we introduce multidimensional scaling (MDS),
 # a technique for visually displaying the simialrity of observations in
@@ -9,7 +9,7 @@
 # Data Used: *.dta files in code/data/, source: http://www.voteview.com/dwnl.htm
 # Packages Used: foreign, ggplot2
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,3 +1,23 @@
+# File-Name: chapter10.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Purpose:
+# Data Used: data/example.csv, data/installations.csv
+# Packages Used: class, reshape
+
+# All source code is copyright (c) 2012, under the Simplified BSD License.
+# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
+
+# All images and materials produced by this code are licensed under the Creative Commons
+# Attribution-Share Alike 3.0 United States License: http://creativecommons.org/licenses/by-sa/3.0/us/
+
+# All rights reserved.
+
+# NOTE: If you are running this in the R console you must use the 'setwd' command to set the
+# working directory for the console to whereever you have saved this file prior to running.
+# Otherwise you will see errors when loading data or saving figures!
+
+
 # First code snippet
 df <- read.csv('data/example_data.csv')
 
@@ -1,5 +1,5 @@
 # File-Name: google_sg.R
-# Date: 2012-01-19
+# Date: 2012-02-10
 # Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: File 1 for code from Chapter 11. This file contains a set of functions for building
 # igraph network object from the Twitter social graphs. As the initial set of code
@@ -9,7 +9,7 @@
 # Data Used: Accessed via the Google SocialGraph API, source: http://code.google.com/apis/socialgraph/
 # Packages Used: igraph, RCurl, RJSONIO
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,5 +1,5 @@
 # File-Name: twitter_net.R
-# Date: 2011-11-01
+# Date: 2012-02-10
 # Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: File 2 for code in Chapter 11. In this short file we write code for generating the
 # the ego-network for a given Twitter user. Once the network object has been built we
@@ -9,7 +9,7 @@
 # Data Used: n/a
 # Packages Used: igraph, see 01_google_sg.R
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,5 +1,5 @@
 # File-Name: twitter_rec.R
-# Date: 2011-11-01
+# Date: 2012-02-10
 # Author: Drew Conway ([email protected]) and John Myles White ([email protected])
 # Purpose: File 3 for code in Chapter 9. In the final piece of this case study we design a
 # simple social graph reccommendation system based on Twitter data. Using the
@@ -10,7 +10,7 @@
 # Data Used: data/*.graphml
 # Packages Used: igraph
 
-# All source code is copyright (c) 2011, under the Simplified BSD License.
+# All source code is copyright (c) 2012, under the Simplified BSD License.
 # For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
 
 # All images and materials produced by this code are licensed under the Creative Commons
@@ -1,3 +1,23 @@
+# File-Name: chapter12.R
+# Date: 2012-02-10
+# Author: Drew Conway ([email protected]) and John Myles White ([email protected])
+# Purpose:
+# Data Used: data/df.csv, dtm.RData
+# Packages Used: ggplot2, glmnet, tm, boot
+
+# All source code is copyright (c) 2012, under the Simplified BSD License.
+# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
+
+# All images and materials produced by this code are licensed under the Creative Commons
+# Attribution-Share Alike 3.0 United States License: http://creativecommons.org/licenses/by-sa/3.0/us/
+
+# All rights reserved.
+
+# NOTE: If you are running this in the R console you must use the 'setwd' command to set the
+# working directory for the console to whereever you have saved this file prior to running.
+# Otherwise you will see errors when loading data or saving figures!
+
+
 library('ggplot2')
 
 # First code snippet