Steam Game Recommendation recommends playable games based on your information.
For movies and music, there are well-known recommendation systems:
- Netflix for movies
- Spotify for music
However, there is no representative recommendation system for games!
Therefore, we are going to make a system to recommend games.
The dataset is a combination of 'Steam Video Games' and 'Steam Store Games (Clean dataset)'.
Dataset 'Steam Store Games (Clean dataset)'
- dataset Link
- This dataset combines data on 27,000 games scraped from the Steam and SteamSpy APIs.
- License: CC BY 4.0
- It has 18 attributes.
Attribute | Type | Explanation | Example |
---|---|---|---|
appid | Nominal | Unique identifier for each title | 10, 20, 30, ... |
name | Nominal | Title of app (game) | Left 4 Dead, Dota 2, ... |
release_date | Nominal | Release date in format YYYY-MM-DD | 2008-11-17, 2009-11-19, ... |
english | Categorical | Language support: 1 if the game is available in English | 0, 1 |
developer | Categorical | Name (or names) of developer(s). Semicolon delimited if multiple | Valve, Mark Healey, ... |
publisher | Categorical | Name (or names) of publisher(s). Semicolon delimited if multiple | Valve, Mark Healey, ... |
platforms | Categorical | Semicolon delimited list of supported platforms. At most includes: windows;mac;linux | windows, windows;mac;linux, ... |
required_age | Categorical | Minimum required age according to PEGI UK standards. Many with 0 are unrated or unsupplied. | 0, 16, 18, ... |
categories | Nominal | Semicolon delimited list of game categories, e.g. single-player;multi-player | Single-player;Multi-player, ... |
genres | Nominal | Semicolon delimited list of game genres, e.g. action;adventure | RPG, Strategy, Action;RPG, ... |
steamspy_tags | Categorical | Semicolon delimited list of top steamspy game tags, similar to genres but community voted, e.g. action;adventure | Action;FPS;Multiplayer, ... |
achievements | Discrete | Number of in-game achievements, if any | 0, 147, 54, ... |
positive_ratings | Discrete | Number of positive ratings, from SteamSpy | 124534, 3318, ... |
negative_ratings | Discrete | Number of negative ratings, from SteamSpy | 3339, 633, ... |
average_playtime | Discrete | Average user playtime, from SteamSpy | 17612, 277, 187, ... |
median_playtime | Discrete | Median user playtime, from SteamSpy | 317, 62, 34, ... |
owners | Categorical | Estimated number of owners. Contains lower and upper bound (like 20000-50000). May wish to take mid-point or lower bound. Included both to give options. | 5000000-10000000, ... |
price | Continuous | Current full price of title in GBP, (pounds sterling) | 7.19, 3.99, 5.79, ... |
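Several columns above need light parsing before use: `owners` is a range, and fields such as `platforms` and `genres` are semicolon delimited. The following is a minimal sketch in plain Python, not the project's own code; the helper names `parse_owners` and `split_multi` are hypothetical.

```python
def parse_owners(owners: str) -> tuple[int, int, float]:
    """Split an owners range such as '20000-50000' into (lower, upper, midpoint)."""
    lower_text, upper_text = owners.split("-")
    lower, upper = int(lower_text), int(upper_text)
    return lower, upper, (lower + upper) / 2


def split_multi(value: str) -> list[str]:
    """Split a semicolon-delimited field such as 'windows;mac;linux'."""
    return [part.strip() for part in value.split(";") if part.strip()]


print(parse_owners("5000000-10000000"))
print(split_multi("windows;mac;linux"))
```

Whether to keep the lower bound or the midpoint is a modelling choice; the helper returns both so either can be used.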
Dataset 'Steam Video Games'
- dataset Link
- This dataset is for recommending video games, built from 200k Steam user interactions.
- License: DbCL v1.0
- It has 4 attributes.
Attribute | Type | Explanation | Example |
---|---|---|---|
user-id | Nominal | User ID | 151603712, 187131847, ... |
game-title | Nominal | Name of the Steam game | Dota 2, FINAL FANTASY XIII, ... |
behavior-name | Categorical | Type of interaction: purchase or play | purchase, play |
value | Continuous | Hours if behavior is play, 1.0 if behavior is purchase | 1.0, 9.8, 9.7, ... |
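Because `value` means hours only on `play` rows, purchase rows usually have to be filtered out before playtime is analysed. A small sketch of that step, using made-up sample rows in the column order of the table above (the function name `play_hours` is hypothetical):

```python
from collections import defaultdict

# Made-up sample rows: (user-id, game-title, behavior-name, value)
rows = [
    (151603712, "Dota 2", "purchase", 1.0),
    (151603712, "Dota 2", "play", 9.8),
    (187131847, "FINAL FANTASY XIII", "purchase", 1.0),
    (187131847, "FINAL FANTASY XIII", "play", 9.7),
    (187131847, "Dota 2", "purchase", 1.0),  # bought but never played
]


def play_hours(rows):
    """Return {(user_id, game_title): hours}, ignoring purchase rows."""
    hours = defaultdict(float)
    for user_id, title, behavior, value in rows:
        if behavior == "play":
            hours[(user_id, title)] += value
    return dict(hours)


hours = play_hours(rows)
```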
Dataset 'Steam Store Games (Clean dataset)'
- Information on positive_ratings and negative_ratings
- Positive Rating Table
- Positive Rating Ratio
- Distributions of columns
Dataset 'Steam Video Games'
- Information on value
- Top 10 Users of value (Play-time) + What They Played
Depending on the size of the group:
- Large: use Collaborative Filtering
- Small: use Content-Based Filtering
- This split is meant to avoid the long-tail problem.
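The rule above can be sketched as a simple size check. The function name `choose_method` and the `threshold` value are assumptions for illustration only; the text does not state where the boundary between a large and a small group lies.

```python
def choose_method(group_size: int, threshold: int = 100) -> str:
    """Pick a filtering method from the size of a group.

    `threshold` is a hypothetical cut-off, not a value taken from the project.
    """
    if group_size >= threshold:
        return "collaborative"
    return "content-based"


print(choose_method(500))
print(choose_method(5))
```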
Clustering
To address the long-tail problem, we divided and filtered the columns.
This is the result of the clustering.
Collaborative Filtering
Our data has no rating column.
So we derived each user's rating for a game by comparing the user's play hours with the average play hours for that game.
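One way to turn play hours into a rating is sketched below. The exact rule is an assumption, since the text only says that user hours are compared with the average: here the rating is `max_rating * hours / (2 * game average)`, capped at `max_rating`, so a user who plays exactly the average gets half the maximum. The name `implicit_ratings` is hypothetical.

```python
from collections import defaultdict


def implicit_ratings(play_hours, max_rating=5.0):
    """Turn {(user, game): hours} into {(user, game): rating}.

    Assumed rule (not from the source): scale hours by the game's
    average play hours and cap the result at max_rating.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (_, game), hours in play_hours.items():
        totals[game] += hours
        counts[game] += 1

    ratings = {}
    for (user, game), hours in play_hours.items():
        average = totals[game] / counts[game]
        ratio = hours / average if average > 0 else 0.0
        ratings[(user, game)] = min(max_rating, max_rating * ratio / 2)
    return ratings


# Made-up sample: three users, one game, average of 20 hours
sample = {("u1", "Dota 2"): 30.0, ("u2", "Dota 2"): 10.0, ("u3", "Dota 2"): 20.0}
ratings = implicit_ratings(sample)
```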
We wrote a CF_recommend_Game function.
It takes a user ID as input and estimates a score for each game title using SVD.
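To illustrate the idea, the sketch below trains a simplified matrix factorization by stochastic gradient descent, in the style of SVD-based recommenders, and ranks unseen games for a user. It is not the project's CF_recommend_Game: the SVD implementation used by the project is not stated here, bias terms are omitted, and the names `train_mf` and `recommend_for_user`, the hyperparameters, and the sample ratings are all assumptions.

```python
import random


def train_mf(ratings, n_factors=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit user and game factor vectors to {(user, game): rating} by SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    games = sorted({g for _, g in ratings})
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {g: [rng.gauss(0, 0.1) for _ in range(n_factors)] for g in games}
    mean = sum(ratings.values()) / len(ratings)
    for _ in range(epochs):
        for (u, g), r in ratings.items():
            pred = mean + sum(a * b for a, b in zip(p[u], q[g]))
            err = r - pred
            for k in range(n_factors):
                pu, qg = p[u][k], q[g][k]
                p[u][k] += lr * (err * qg - reg * pu)
                q[g][k] += lr * (err * pu - reg * qg)
    return mean, p, q


def recommend_for_user(user, ratings, model, top_n=3):
    """Rank games the user has not rated by estimated score, best first."""
    mean, p, q = model
    seen = {g for (u, g) in ratings if u == user}
    scores = {
        g: mean + sum(a * b for a, b in zip(p[user], q[g]))
        for g in q
        if g not in seen
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Made-up ratings for illustration
ratings = {
    ("u1", "Dota 2"): 5.0, ("u1", "Left 4 Dead"): 4.0,
    ("u2", "Dota 2"): 4.5, ("u2", "Left 4 Dead"): 4.0,
    ("u2", "FINAL FANTASY XIII"): 1.0,
    ("u3", "FINAL FANTASY XIII"): 5.0, ("u3", "Dota 2"): 1.0,
}
model = train_mf(ratings)
recs = recommend_for_user("u1", ratings, model)
```

Games the user has already rated are excluded, so the result only contains candidates that are new to that user.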
This is the result of the CF function.
Content-Based Filtering

    import pandas as pd

    df = pd.read_csv('steam.csv')
    # Map the balance of positive and negative ratings onto a 0-10 scale
    df['rating'] = ((df['positive_ratings'] - df['negative_ratings'])
                    / (2 * (df['positive_ratings'] + df['negative_ratings']))
                    + 0.5) * 10.0

We compute rating so that the recommendation results can be sorted.
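The rating formula simplifies algebraically to `10 * positive / (positive + negative)`, that is, the share of positive ratings on a 0-10 scale. The plain-Python sketch below checks this on single values; the function names are hypothetical, and returning NaN for a game with no ratings at all is an assumption about how that case should be handled.

```python
import math


def rating(positive: int, negative: int) -> float:
    """Score on a 0-10 scale, using the same formula as the snippet above."""
    total = positive + negative
    if total == 0:
        return math.nan  # no ratings: the score is undefined
    return ((positive - negative) / (2 * total) + 0.5) * 10.0


def positive_share_rating(positive: int, negative: int) -> float:
    """Equivalent form: ten times the share of positive ratings."""
    total = positive + negative
    return math.nan if total == 0 else 10.0 * positive / total


# Both forms agree on the example values from the attribute table
assert math.isclose(rating(124534, 3339), positive_share_rating(124534, 3339))
```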
The categories column holds a lot of data.
Because handling all of it takes a long time, we use only the first 5 entries.
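A sketch of that truncation, followed by one possible way to compare two games on the result. Reading "first 5" as the first five semicolon-delimited categories of each game is an assumption, and so is the use of Jaccard similarity; the text does not say which similarity measure the project uses. The names `first_categories` and `jaccard` are hypothetical.

```python
def first_categories(categories: str, limit: int = 5) -> set[str]:
    """Keep only the first `limit` semicolon-delimited categories."""
    parts = [part.strip().lower() for part in categories.split(";") if part.strip()]
    return set(parts[:limit])


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of categories two games have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


left = first_categories(
    "Single-player;Multi-player;Co-op;Steam Achievements;Steam Cloud;Stats"
)
right = first_categories("Single-player;Co-op")
similarity = jaccard(left, right)  # 2 shared out of 5 distinct categories
```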
This is the result of the Content-Based filtering.