Skip to content

Commit

Permalink
definitive api
Browse files Browse the repository at this point in the history
  • Loading branch information
Paul Asmuth committed Jan 29, 2012
1 parent 116ea99 commit 2f68d08
Showing 1 changed file with 39 additions and 34 deletions.
73 changes: 39 additions & 34 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,21 +3,24 @@ recommendify

Incremental and distributed item-based "Collaborative Filtering" for binary ratings with ruby and redis. In a nutshell: You feed in `user -> item` interactions and it spits out similarity vectors between items ("related items").


### use cases

+ "Users that bought this product also bought...".
+ "Users that viewed this video also viewed...".
+ "Users that follow this person also follow...".

etc.



### how it works

Recommendify keeps an incrementally updated `item x item` matrix, the "co-concurrency matrix". This matrix stores the number of times that a combination of two items has appeared in an interaction/preferrence set. The co-concurrence counts are processed with a consine distance/similarity measure to retrieve another `item x item` similarity matrix, which is used to find the N most similar items for each item. This approach was described by Miranda, Alipio et al. [6]
Recommendify keeps an incrementally updated `item x item` matrix, the "co-concurrency matrix". This matrix stores the number of times that a combination of two items has appeared in an interaction/preferrence set. The co-concurrence counts are processed with a similarity measure to retrieve another `item x item` similarity matrix, which is used to find the N most similar items for each item. This approach was described by Miranda, Alipio et al. [1]

1. Group the input user->item pairs by user-id and store them into interaction sets
2. For each item<->item combination in the interaction increment the respective element in the co-concurrence matrix
3. For each item<->item combination in the co-concurrence matrix calculate (i,j): sim(i, j) = #(I∩J) / √#I×√#J
2. For each item<->item combination in the interaction set increment the respective element in the co-concurrence matrix
3. For each item<->item combination in the co-concurrence matrix calculate the jaccard similarity
3. For each item store the N most similar items in the respective output set.


Expand All @@ -28,42 +31,48 @@ Fnord is not a draft!
usage
-----

You can add new interaction-sets to the processor incrementally, but the similarity matrix has to be manually re-processed after new interactions were added to any of the applied processors. However, the processing happens on-line and you can keep track of the changed items so you only have to re-calculate the changed rows of the matrix.

```ruby

class RecommendedItem < Recommendify::Base
# Our similarity matrix, we calculate the similarity via co-concurrence
# of items in "orders" and the co-concurrence of items in user-likes using
# two `item x item` matrices and the jaccard similarity measure.
class RecommendedItem < Recommendify::SimilarityMatrix

# calculate a maximum of fifty neighbors per item
# store a maximum of fifty neighbors per item
max_neighbors 50

# truncate user/actor rows to 200 items
max_coconcurrent 200
processor :order_item_sim,
:similarity_func => :jaccard
:weight => 5.0

processor :like_item_sim,
:similarity_func => :jaccard
:weight => 1.0

end

# clear the similarity matrix
RecommendedItem.delete!
# add `order_id->item_id` interactions to the processor (incremental)
order_item_processor = RecommendedItem.order_item_sim
order_item_processor.add_set("order1", ["item23", "item65", "item23"])
order_item_processor.add_set("order2", ["item14", "item23"])

# ...or degrade the similarity matrix by 0.5
RecommendedItem.degrade!(0.5)
# add `user_id->item_id` interactions to the processor (incremental)
like_item_processor = RecommendedItem.like_item_sim
like_item_processor.add_set("user1", ["item23", "item65", "item23"])
like_item_processor.add_set("user2", ["item14", "item23"])

# process the matrix (distributed)
m.process!
# Calculate all elements of the similarity matrix
RecommendedItem.process!

# delete the co-concurrence matrix
m.delete!
# ...or calculate a specific row of the similarity matrix (a specific item)
RecommendedItem.process_item!("item65")

# retrieve similar items to "item54"
RecommendedItem.for("item54")
# retrieve similar items to "item23"
RecommendedItem.for("item23")
=> [ <RecommendedItem item_id:"item65" similarity:0.23>, (...) ]

# get the similarity of a result
RecommendedItem.for("item54").similarity
=> 0.23

# get the item_id of a result
RecommendedItem.for("item54").item_id
=> "item65"

```


Expand All @@ -81,23 +90,19 @@ ideas
-----

+ rake benchmark CLASS=MySimilarityProcessor

+ NativeJaccardProcessor
+ CosineProcessor


Sources / References
--------------------

[1] Sawar B., G. Karypis, Konstan J. and Riedel J. (2001). Item-Based Collaborative Filtering Recommendation Algorithms (GroupLens Research Group/Army HPC Research Center, University of Minnesota)

[2] Herlocker J., Konstan J. Terveen L. and Ried J. (2004) Evaluating Collaborative Filtering Recommender Systems (GroupLens Research Group/Army HPC Research Center, University of Minnesota)

[3] Schafer J.B, Konstan J. and Riedl J. (1999). Recommender Systems in E-Commerce (Proceedings of ACM E-Commerce 1999 conference)
[1] Miranda C. and Alipio J. (2008). Incremental collaborative filtering for binary ratings (LIAAD - INESC Porto, University of Porto)

[4] Adomavicius G. and Tuzhilin A. (2005). Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions (IEEE Transactions on knowledge and data engineering)
[2] George Karypis (2000) Evaluation of Item-Based Top-N Recommendation Algorithms (University of Minnesota, Department of Computer Science / Army HPC Research Center)

[5] Grcar M, Fortuna B. and Mladenic D. (2005). kNN Versus SVM in the Collaborative Filtering Framework (Jozef Stefan Institut, IST Programme of the European Community)
[3] Shiwei Z., Junjie W. Hui X. and Guoping X. (2011) Scaling up top-K cosine similarity search (Data & Knowledge Engineering 70)

[6] Miranda C. and Alipio J. (2008). Incremental collaborative filtering for binary ratings (LIAAD - INESC Porto, University of Porto)


License
Expand Down

0 comments on commit 2f68d08

Please sign in to comment.