v2.0.0
jhdavids8 committed Apr 17, 2014
1 parent d21ea6c commit 5ff4754
Showing 15 changed files with 757 additions and 506 deletions.
13 changes: 13 additions & 0 deletions Changelog.md
@@ -0,0 +1,13 @@
=======
Predictor Changelog
=========
2.0.0 (2014-04-17)
---------------------
**A rewrite of 1.0.0 that contains several breaking changes!**

Version 1.0.0 (which really should have been 0.0.1) contained several issues that made compatibility with v2 not worth the trouble. These include:
* In v1, similarities were cached per input_matrix, and Predictor::Base utilized those caches when determining similarities and predictions. This quickly ate up Redis memory with even a semi-large dataset, as each input_matrix had a significant memory requirement. v2 caches similarities at the root (Recommender::Base), which means you can add any number of input matrices with little impact on memory usage.
* Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
* Removed bang methods from input_matrix (add_set!, add_single!, etc). These called process! for you previously, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base).
* Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
* Other minor fixes.
11 changes: 2 additions & 9 deletions Gemfile
@@ -1,10 +1,3 @@
source :rubygems
source 'https://rubygems.org'

gem "redis"

group :development do
gem "rake"
gem "rspec"
gem "yard"
gem "pry"
end
gemspec
115 changes: 72 additions & 43 deletions README.md
@@ -2,7 +2,7 @@
Predictor
=========

Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users.
Fast and efficient recommendations and predictions using Ruby & Redis. Developed by and used at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users.

![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status)

@@ -11,21 +11,17 @@ Originally forked and based on [Recommendify](https://github.com/paulasmuth/reco
* Provide item similarities such as "Users that read this book also read ..."
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."

At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) or the [Sørensen-Dice coefficient](http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) (default is Jaccard) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
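
To make the two measures concrete, here is a minimal sketch in plain Ruby. It is not Predictor's implementation (which works against Redis); the method names `jaccard_index` and `sorensen_dice` and the sample data are illustrative assumptions.

```ruby
require "set"

# Jaccard index: |A ∩ B| / |A ∪ B|
# (hypothetical helper, not part of Predictor's API)
def jaccard_index(a, b)
  a = a.to_set
  b = b.to_set
  union_size = (a | b).size
  return 0.0 if union_size.zero?

  (a & b).size.to_f / union_size
end

# Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|)
# (hypothetical helper, not part of Predictor's API)
def sorensen_dice(a, b)
  a = a.to_set
  b = b.to_set
  total = a.size + b.size
  return 0.0 if total.zero?

  2.0 * (a & b).size / total
end

# Users who took each course (made-up sample data)
course_1_users = %w[user-1 user-2 user-3]
course_2_users = %w[user-2 user-3 user-4]

jaccard_index(course_1_users, course_2_users) # 2 shared / 4 distinct = 0.5
sorensen_dice(course_1_users, course_2_users) # (2 * 2) / (3 + 3), roughly 0.667
```

For the same pair of sets, Sørensen-Dice is never lower than Jaccard, so scores from the two measures are not directly comparable.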

Version Note
Notice
---------------------
These are the docs for Predictor 1. Predictor 2 is currently in development and can be found [on the 2.0.0.rc1 branch](https://github.com/Pathgather/predictor/tree/2.0.0.rc1). If you are new to Predictor, I encourage you to use that branch. If you are on Predictor 1, I also encourage you to upgrade, but please note that there are a few breaking changes, which are noted in the Readme for Predictor 2.
This is the readme for Predictor 2.0, which contains a few breaking changes from 1.0. The 1.0 readme can be found [here](https://github.com/Pathgather/predictor/blob/master/docs/READMEv1.md). See below for how to upgrade to 2.0.

Installation
---------------------
In your Gemfile:
```
gem 'predictor', '~> 1.0'
```
To use [Predictor 2](https://github.com/Pathgather/predictor/tree/2.0.0.rc1) (make sure you follow [this Readme](https://github.com/Pathgather/predictor/tree/2.0.0.rc1) if you are)
```
gem 'predictor', '2.0.0.rc1'
```ruby
gem 'predictor'
```
Getting Started
---------------------
@@ -37,6 +33,7 @@ Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```

Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
@@ -56,9 +53,10 @@ Below, we're building a recommender to recommend courses based off of:
class CourseRecommender
include Predictor::Base

limit_similarities_to 500 # Optional, but if specified, Predictor only caches the top x similarities for an item at any given time. Can greatly help with efficient use of Redis memory
input_matrix :users, weight: 3.0
input_matrix :tags, weight: 2.0
input_matrix :topics, weight: 1.0
input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use Sørensen-Dice instead of Jaccard
end
```
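
The README does not spell out how the weights enter the final score. One plausible reading, shown here purely as an assumption, is a weighted sum of the per-matrix similarity scores. The `WEIGHTS` constant, the `weighted_score` helper, and the sample scores are all hypothetical and not part of Predictor's API.

```ruby
# Weights mirroring the CourseRecommender definition above
WEIGHTS = { users: 3.0, tags: 2.0, topics: 1.0 }.freeze

# Assumed combination rule: sum of (matrix weight * matrix score).
# Predictor's actual formula may differ (e.g. it may normalize).
def weighted_score(scores, weights = WEIGHTS)
  scores.sum { |matrix, score| weights.fetch(matrix) * score }
end

# Made-up per-matrix similarity scores between two courses
scores = { users: 0.5, tags: 0.25, topics: 1.0 }

weighted_score(scores) # 3.0 * 0.5 + 2.0 * 0.25 + 1.0 * 1.0 = 3.0
```

Under this reading, a shared user counts three times as much as a shared topic, which is the intent the weights in the class definition suggest.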

@@ -67,37 +65,21 @@ Now, we just need to update our matrices when courses are created, users take a
recommender = CourseRecommender.new

# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
recommender.topics.add_single!("topic-1", "course-1")

# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
# convenient time, perhaps in a background job
recommender.topics.add_single("topic-1", "course-1")
recommender.topics.process_item!("course-1")
recommender.add_to_matrix!(:topics, "topic-1", "course-1")

# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
# If not, the tag-1 set will be initialized with course-1 and course-2
recommender.tags.add_set!("tag-1", ["course-1", "course-2"])

# Or, just add the set and process whenever you like
recommender.tags.add_set("tag-1", ["course-1", "course-2"])
["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
# If your dataset is even remotely large, add_to_matrix! could take some time, as it must calculate the similarity scores
# for course-1 and other courses that share a set with course-1. If this is the case, use add_to_matrix and
# process the items at a more convenient time, perhaps in a background job
recommender.topics.add_to_set("topic-1", "course-1", "course-2") # Same as recommender.add_to_matrix(:topics, "topic-1", "course-1", "course-2")
recommender.process_items!("course-1", "course-2")
```

As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so a variety of ways.
* If you want to simply update the similarities for a single item in a specific matrix:
````
recommender.matrix.process_item!(item)
As noted above, it's important to remember that if you don't use the bang method 'add_to_matrix!', you'll need to manually update your similarities. If your dataset is even remotely large, you'll probably want to do this:
* If you want to update the similarities for certain item(s):
````
* If you want to update the similarities for all items in a specific matrix:
recommender.process_items!(item1, item2, etc)
````
recommender.matrix.process!
````
* If you want to update the similarities for a single item in all matrices:
````
recommender.process_item!(item)
````
* If you want to update all similarities in all matrices:
* If you want to update all similarities for all items:
````
recommender.process!
````
@@ -140,18 +122,18 @@ recommender.predictions_for("user-1", matrix_label: :users)
# Paginate too!
recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)

# Gimme some scores and ignore user-2....that user-2 is one sketchy fella
recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"])
# Gimme some scores and ignore course-2....that course-2 is one sketchy fella
recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"])
```

Deleting Items
---------------------
If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or recommender as a whole:
If your data is deleted from your persistent storage, you certainly don't want to recommend it to a user. To ensure that doesn't happen, simply call delete_from_matrix! with the individual matrix or delete_item! if the item is completely gone:
```ruby
recommender = CourseRecommender.new

# User removed course-1 from topic-1, but course-1 still exists
recommender.topics.delete_item!("course-1")
recommender.delete_from_matrix!(:topics, "course-1")

# course-1 was permanently deleted
recommender.delete_item!("course-1")
@@ -160,6 +142,54 @@ recommender.delete_item!("course-1")
recommender.clean!
```

Limiting Similarities
---------------------
By default, Predictor caches all similarities for all items, with no limit. That means if you have 10,000 items, and every item is related in some way to every other item, we'll have 10,000 sets, each with 9,999 items. That will eat up Redis memory quickly. To limit this, specify the limit_similarities_to option.
```ruby
class CourseRecommender
include Predictor::Base

limit_similarities_to 500
input_matrix :users, weight: 3.0
input_matrix :tags, weight: 2.0
input_matrix :topics, weight: 1.0
end
```

This can save a great deal of memory. Just remember that predictions fetched with predictions_for rely on the similarity caches, so if you're using predictions_for, set the limit high enough that intelligent predictions can still be generated. If you aren't using predictions and are just using similarities, feel free to set this to the maximum number of similarities you'd ever want to show!
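
As a rough illustration of what the limit buys you, the sketch below trims one item's similarities to the top N and counts worst-case cache entries. It uses a plain Hash in place of Redis, and the helper names `top_similarities` and `worst_case_entries` are assumptions made for this example, not Predictor methods.

```ruby
# Keep only the `limit` highest-scoring similarities for one item.
# (hypothetical helper; a plain Hash stands in for the Redis cache)
def top_similarities(scores, limit)
  scores.max_by(limit) { |_item, score| score }.to_h
end

# Worst-case number of cached entries when every item is similar
# to every other item, with an optional per-item limit.
def worst_case_entries(item_count, limit = nil)
  per_item = item_count - 1
  per_item = [per_item, limit].min if limit
  item_count * per_item
end

# Made-up similarity scores for a single course
scores = { "course-2" => 0.9, "course-3" => 0.1, "course-4" => 0.5 }
top_similarities(scores, 2) # keeps course-2 and course-4, drops course-3

worst_case_entries(10_000)      # 99_990_000 entries with no limit
worst_case_entries(10_000, 500) # 5_000_000 entries with limit_similarities_to 500
```

In the 10,000-item worst case, a limit of 500 cuts the number of cached entries by roughly a factor of 20.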

Upgrading from 1.0 to 2.0
---------------------
As mentioned, 2.0.0 is quite a bit different than 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:

* Change predictor.matrix.add_set! and predictor.matrix.add_single! calls to predictor.add_to_matrix!. For example:
```ruby
# Change
predictor.topics.add_single!("topic-1", "course-1")
# to
predictor.add_to_matrix!(:topics, "topic-1", "course-1")

# Change
predictor.tags.add_set!("tag-1", ["course-1", "course-2"])
# to
predictor.add_to_matrix!(:tags, "tag-1", "course-1", "course-2")
```
* Change predictor.matrix.process! or predictor.matrix.process_item! calls to just predictor.process! or predictor.process_items!
```ruby
# Change
predictor.topics.process_item!("course-1")
# to
predictor.process_items!("course-1")
```
* Change predictor.matrix.delete_item! calls to predictor.delete_from_matrix!. This will update similarities too, so you may want to queue this to run in a background job.
```ruby
# Change
predictor.topics.delete_item!("course-1")
# to delete_from_matrix! if you want to update similarities to account for the deleted item (in v1, this was a bug and didn't occur)
predictor.delete_from_matrix!(:topics, "course-1")
```
* Regenerate your recommendations, as Redis keys have changed for Predictor 2. You can use recommender.clean! to clear out old similarities, then run your rake task (or whatever you've set up) to create new similarities.

Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
@@ -183,5 +213,4 @@ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
13 changes: 1 addition & 12 deletions Rakefile
@@ -1,12 +1 @@
require "rubygems"
require "rspec"
require 'rspec/core/rake_task'
require "yard"

desc "Run all examples"
task RSpec::Core::RakeTask.new('spec')

task :default => "spec"

desc "Generate documentation"
task YARD::Rake::YardocTask.new
require "bundler/gem_tasks"