Benchmarks

This folder contains benchmarks of several recommendation algorithms. To simplify running the benchmarks, we provide a set of wrapper functions in the file benchmark_utils.py.

The machine we used to perform the benchmarks is a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode.

MovieLens

MovieLens is one of the most commonly used datasets in the recommendation systems literature. It consists of a collection of users, movies and movie ratings, and is available in several sizes:

  • MovieLens 100k: 100,000 ratings from 1,000 users on 1,700 movies.
  • MovieLens 1M: 1 million ratings from 6,000 users on 4,000 movies.
  • MovieLens 10M: 10 million ratings from 72,000 users on 10,000 movies.
  • MovieLens 20M: 20 million ratings from 138,000 users on 27,000 movies.
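
To get a feel for how sparse these datasets are, the rounded counts listed above can be turned into a rating density (observed ratings divided by users times movies). This is only a back-of-the-envelope sketch: the `MOVIELENS_SIZES` table and the `density` helper are illustrative names, not part of benchmark_utils.py, and the counts are the approximate figures from the list rather than exact dataset statistics.

```python
# Approximate sizes quoted in the list above: (ratings, users, movies).
# These are rounded figures, so the densities are rough estimates.
MOVIELENS_SIZES = {
    "100k": (100_000, 1_000, 1_700),
    "1M": (1_000_000, 6_000, 4_000),
    "10M": (10_000_000, 72_000, 10_000),
    "20M": (20_000_000, 138_000, 27_000),
}


def density(n_ratings, n_users, n_movies):
    """Fraction of the user-movie matrix that holds an observed rating."""
    return n_ratings / (n_users * n_movies)


for name, (n_ratings, n_users, n_movies) in MOVIELENS_SIZES.items():
    print(f"MovieLens {name}: density = {density(n_ratings, n_users, n_movies):.2%}")
```

With these rounded counts, the density falls as the dataset grows, so the larger variants are both bigger and sparser.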

The MovieLens benchmark is in movielens.ipynb. In this notebook, the MovieLens dataset is split into training and test sets using a stratified splitting method that takes 75% of each user's ratings as training data and the remaining 25% as test data. For ranking metrics we use k=10 (top 10 recommended items). The algorithms used in this benchmark are ALS, SVD, SAR, NCF, BPR and FastAI.
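
The per-user 75/25 split and the top-k cutoff described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in movielens.ipynb: `stratified_split` and `precision_at_k` are hypothetical helpers written for this example, ratings are assumed to be `(user, item, rating)` tuples, and precision@k is assumed to be the number of relevant items in the top k divided by k.

```python
import random
from collections import defaultdict


def stratified_split(ratings, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so that each user keeps roughly
    `ratio` of their ratings in the training set (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the ratings by user so the split is applied per user.
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        rows = list(rows)  # copy so the caller's data is not reordered
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


def precision_at_k(recommended, relevant, k=10):
    """Share of the top-k recommended items that appear in `relevant`
    (assumed definition: hits / k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Toy data: 3 users with 8 ratings each.
ratings = [(user, item, 1.0) for user in range(3) for item in range(8)]
train, test = stratified_split(ratings)
print(len(train), len(test))
```

Splitting per user, rather than over the whole rating table at once, means every user with enough ratings appears in both sets, which ranking metrics at k=10 need in order to score each user's recommendations against held-out items.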