Recommender systems for arXivDigest.
The author details and paper metadata used by the recommender systems are retrieved from the Semantic Scholar API.
Recommender system | Module | Class |
---|---|---|
Frequent Venues | frequent_venues.py | FrequentVenuesRecommender |
Venue Co-Publishing | venue_copub.py | VenueCoPubRecommender |
Weighted Influence | weighted_inf.py | WeightedInfRecommender |
Previously Cited | prev_cited.py | PrevCitedRecommender |
Previously Cited by Collaborators | prev_cited_collab.py | PrevCitedCollabRecommender |
Previously Cited and Topic Search | prev_cited_topic.py | PrevCitedTopicSearchRecommender |
Researchers often publish many papers at the same venue over time. The Frequent Venues recommender is based on the assumption that a paper published at a venue where the user frequently publishes is more relevant to the user than other papers.
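As a rough illustration of this assumption, the sketch below scores a candidate paper by the share of the user's papers that appeared at the candidate's venue. The function name and the scoring formula are hypothetical and are not taken from frequent_venues.py.

```python
from collections import Counter


def venue_frequency_score(user_venues, candidate_venue):
    """Fraction of the user's papers published at the candidate's venue.

    Hypothetical scoring, for illustration only.
    """
    if not user_venues:
        return 0.0
    # Compare venues case-insensitively.
    counts = Counter(venue.lower() for venue in user_venues)
    return counts[candidate_venue.lower()] / len(user_venues)


# Venues of the user's own papers (made-up example data).
user_venues = ["SIGIR", "SIGIR", "ECIR", "CIKM"]
score = venue_frequency_score(user_venues, "sigir")
```

A candidate from a venue the user has never published at gets a score of zero under this sketch.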
The Venue Co-Publishing recommender is based on the assumption that a paper's relevance to a user is tied to the degree of venue co-publishing between the paper's authors and the user: a paper is relevant to a user if its authors publish at the same venues as the user.
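One possible way to quantify "degree of venue co-publishing" is to count the overlap between two venue multisets. The sketch below does that; the function names and the measure itself are assumptions made for illustration, and the scoring in venue_copub.py may differ.

```python
from collections import Counter


def copub_score(user_venues, author_venues):
    """Size of the multiset intersection of two venue lists (hypothetical measure)."""
    # `&` on Counters keeps the minimum count of every shared venue.
    shared = Counter(user_venues) & Counter(author_venues)
    return sum(shared.values())


def paper_score(user_venues, venues_per_author):
    """Sum the co-publishing score over all authors of a candidate paper."""
    return sum(copub_score(user_venues, venues) for venues in venues_per_author)


# Made-up example data: one user and a paper with two authors.
user = ["SIGIR", "SIGIR", "ECIR"]
authors = [["SIGIR", "CIKM", "ECIR", "ECIR"], ["WWW"]]
total = paper_score(user, authors)
```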
The Weighted Influence recommender is similar to the Venue Co-Publishing recommender. In addition to the co-publishing patterns of the user and a paper's authors, it takes the authors' influential citation counts into account.
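The sketch below shows one way the two signals could be combined: each author's venue overlap with the user is weighted by that author's influential citation count, and authors below a threshold are skipped. The dictionary layout, the weighting, and the use of the threshold as a filter are all assumptions; only the `min_influence` name and its default of 20 come from the configuration documented in this README.

```python
def weighted_inf_score(user_venues, authors, min_influence=20):
    """Weight each author's shared venues by their influential citation count.

    Hypothetical scoring, for illustration only.
    """
    user_set = set(user_venues)
    score = 0
    for author in authors:
        # Assumption: authors below the threshold do not contribute.
        if author["influential_citations"] < min_influence:
            continue
        shared = user_set & set(author["venues"])
        score += len(shared) * author["influential_citations"]
    return score


# Made-up example data.
authors = [
    {"venues": ["SIGIR", "ECIR"], "influential_citations": 50},
    {"venues": ["SIGIR"], "influential_citations": 5},
]
score = weighted_inf_score(["SIGIR", "CIKM"], authors)
```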
The Previously Cited recommender recommends papers written by authors that the user has previously cited.
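A minimal sketch of this idea, assuming each of the user's papers carries a list of references with author IDs (the field names here are hypothetical and do not mirror the Semantic Scholar response format):

```python
from collections import Counter


def cited_author_counts(user_papers):
    """Count how often the user has cited each author."""
    counts = Counter()
    for paper in user_papers:
        for reference in paper["references"]:
            counts.update(reference["author_ids"])
    return counts


def prev_cited_score(counts, paper_author_ids):
    """Sum the user's citation counts over a candidate paper's authors."""
    return sum(counts.get(author_id, 0) for author_id in paper_author_ids)


# Made-up example data: one paper by the user with two references.
user_papers = [
    {"references": [{"author_ids": ["a1", "a2"]}, {"author_ids": ["a1"]}]},
]
counts = cited_author_counts(user_papers)
score = prev_cited_score(counts, ["a1", "a3"])
```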
The Previously Cited by Collaborators recommender is similar to the Previously Cited recommender, but instead of checking whether the user has cited a paper's authors, it checks whether the user's previous collaborators have done so.
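The sketch below illustrates the collaborator variant under two assumptions: collaborators are the co-authors of the user's papers, and a candidate paper is scored by the number of collaborators who have cited at least one of its authors. Both function names and the `cited_by` mapping are hypothetical.

```python
def collaborators(user_id, user_papers):
    """Collect the co-authors of the user's papers, excluding the user."""
    return {
        author_id
        for paper in user_papers
        for author_id in paper["author_ids"]
        if author_id != user_id
    }


def collab_cited_score(collab_ids, cited_by, paper_author_ids):
    """Count collaborators who have cited at least one author of the paper.

    `cited_by` maps an author ID to the set of author IDs they have cited.
    """
    paper_authors = set(paper_author_ids)
    return sum(1 for c in collab_ids if cited_by.get(c, set()) & paper_authors)


# Made-up example data.
papers = [{"author_ids": ["u", "c1"]}, {"author_ids": ["u", "c2", "c1"]}]
collabs = collaborators("u", papers)
cited_by = {"c1": {"a1"}, "c2": {"a9"}}
score = collab_cited_score(collabs, cited_by, ["a1", "a2"])
```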
The Previously Cited and Topic Search recommender combines Previously Cited with the approach of the base arXivDigest recommender system, which queries an Elasticsearch index containing the candidate papers for the user's topics of interest.
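To give an idea of what a topic search can look like, the sketch below builds an Elasticsearch query body that matches any of the user's topics as a phrase. It only constructs the request dictionary and does not contact a cluster. The field names `title` and `abstract`, the `size` value, and the function name are assumptions, not the query used by prev_cited_topic.py.

```python
def topic_query(topics, fields=("title", "abstract"), size=10):
    """Build a query body that matches papers mentioning any of the topics."""
    return {
        "size": size,
        "query": {
            "bool": {
                # One phrase query per topic; a hit on any topic is enough.
                "should": [
                    {
                        "multi_match": {
                            "query": topic,
                            "type": "phrase",
                            "fields": list(fields),
                        }
                    }
                    for topic in topics
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = topic_query(["information retrieval", "recommender systems"])
```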
- Python 3.6+
- MongoDB or Redis — Used to cache responses from the Semantic Scholar API (can be disabled)
- Elasticsearch — Used by the Previously Cited and Topic Search recommender for topic search
Install the `arxivdigest_recommenders` package and its dependencies with `pip install -e .`. The `-e` flag makes the installation editable.
Install the `arxivdigest_recommenders` package directly from the master branch of this repository:
pip install git+https://github.com/olafapl/arxivdigest_recommenders.git
Updates can be installed with:
pip install --upgrade git+https://github.com/olafapl/arxivdigest_recommenders.git
The recommenders can be run directly by running the modules that implement them. For example, the Frequent Venues recommender can be run by executing `python -m arxivdigest_recommenders.frequent_venues`.
The Semantic Scholar API rate limit defined in the config file (or the default of 100 requests per five-minute window) is enforced only on a per-process basis. If two recommenders are run at the same time using the method above, the effective rate limit is therefore double the configured one. To avoid this problem, run the recommenders in the same process:
import asyncio

from arxivdigest_recommenders.frequent_venues import FrequentVenuesRecommender
from arxivdigest_recommenders.venue_copub import VenueCoPubRecommender


async def main():
    # Both recommenders run in one process and therefore share one rate limit.
    fv = FrequentVenuesRecommender()
    vc = VenueCoPubRecommender()
    await asyncio.gather(fv.recommend(), vc.recommend())


asyncio.run(main())
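The per-process behavior is easier to see with a concrete limiter. The following is a minimal sliding-window sketch, not the limiter used by this package; the class name `WindowRateLimiter` is hypothetical. Its state lives in process memory, so two separate processes would each hold their own instance and together send twice the allowed number of requests.

```python
import asyncio
import time
from collections import deque


class WindowRateLimiter:
    """Allow at most max_requests per window_size seconds within one process."""

    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size
        self._timestamps = deque()  # in-memory state, not shared across processes
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop requests that have left the window.
                while self._timestamps and now - self._timestamps[0] >= self.window_size:
                    self._timestamps.popleft()
                if len(self._timestamps) < self.max_requests:
                    self._timestamps.append(now)
                    return
                # Wait until the oldest request leaves the window.
                await asyncio.sleep(self.window_size - (now - self._timestamps[0]))


async def demo():
    # Small numbers so the example finishes quickly.
    limiter = WindowRateLimiter(max_requests=2, window_size=0.2)
    start = time.monotonic()
    for _ in range(3):
        await limiter.acquire()  # the third call has to wait for the window
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

With the defaults from the config file, the equivalent instance would be `WindowRateLimiter(max_requests=100, window_size=300)`.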
It is possible to override the default settings of the recommender systems by creating a config file in one of the following locations:
~/arxivdigest-recommenders/config.json
/etc/arxivdigest-recommenders/config.json
%cwd%/config.json
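The lookup could work roughly as follows. This sketch returns the first existing file in the order listed above; that precedence and the function names are assumptions, since the order in which the package checks the locations is not stated here.

```python
import json
import os
from pathlib import Path

# The three locations listed above; %cwd% is the current working directory.
CONFIG_LOCATIONS = [
    Path.home() / "arxivdigest-recommenders" / "config.json",
    Path("/etc/arxivdigest-recommenders/config.json"),
    Path(os.getcwd()) / "config.json",
]


def find_config(locations=CONFIG_LOCATIONS):
    """Return the first existing config file, or None if there is none."""
    for path in locations:
        if path.is_file():
            return path
    return None


def load_config(locations=CONFIG_LOCATIONS):
    """Load the user's overrides, or an empty dict when no file exists."""
    path = find_config(locations)
    if path is None:
        return {}
    with open(path) as config_file:
        return json.load(config_file)
```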
- `arxivdigest_base_url`
- `mongodb`
  - `host`
  - `port`
- `redis`
  - `host`
  - `port`
- `elasticsearch`
  - `host`
  - `port`
- `semantic_scholar`: Semantic Scholar API config
  - `api_key`
  - `max_concurrent_requests`: max number of concurrent requests
  - `max_requests`: max number of requests per window
  - `window_size`: window size in seconds
  - `cache_responses`: enable/disable caching completely
  - `cache_backend`: either "mongodb" or "redis"
  - `mongodb_db`: MongoDB database used for caching
  - `mongodb_collection`: MongoDB collection used for caching
  - `paper_cache_expiration`: expiration time (in days) for paper data
  - `author_cache_expiration`: expiration time (in days) for author data
- `max_paper_age`: papers older than this (in years) are filtered out when looking at an author's published papers
- `max_explanation_venues`: max number of venues to include in explanations (used by the Venue Co-Publishing and Weighted Influence recommenders)
- `venue_blacklist`: (case-insensitive) list of venues to ignore
- `frequent_venues_recommender`: Frequent Venues recommender config
  - `arxivdigest_api_key`
- `venue_copub_recommender`: Venue Co-Publishing recommender config
  - `arxivdigest_api_key`
- `weighted_inf_recommender`: Weighted Influence recommender config
  - `arxivdigest_api_key`
  - `min_influence`: minimum influential citation count for authors
- `prev_cited_recommender`: Previously Cited recommender config
  - `arxivdigest_api_key`
- `prev_cited_collab_recommender`: Previously Cited by Collaborators recommender config
  - `arxivdigest_api_key`
- `prev_cited_topic_recommender`: Previously Cited and Topic Search recommender config
  - `arxivdigest_api_key`
  - `index`: Elasticsearch index for candidate paper indexing and topic search
  - `max_explanation_topics`: max number of topics to include in explanations
- `log_level`: either "FATAL", "ERROR", "WARNING", "INFO", or "DEBUG"
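As an example of how two of these settings might be applied, the sketch below filters an author's papers using `max_paper_age` and `venue_blacklist`. The function name, the paper dictionary layout, and the decision to compare calendar years are assumptions made for illustration.

```python
from datetime import date


def keep_paper(paper, max_paper_age=5, venue_blacklist=("arxiv",), today=None):
    """Return True if a paper passes the venue and age filters (hypothetical helper)."""
    today = today or date.today()
    # The blacklist is case-insensitive, so compare lowercased names.
    blacklist = {venue.lower() for venue in venue_blacklist}
    if (paper.get("venue") or "").lower() in blacklist:
        return False
    # Assumption: age is the difference between calendar years.
    return today.year - paper["year"] <= max_paper_age


# Made-up example data, evaluated against a fixed date.
today = date(2021, 6, 1)
papers = [
    {"venue": "SIGIR", "year": 2018},
    {"venue": "SIGIR", "year": 2010},
    {"venue": "ArXiv", "year": 2020},
]
kept = [p for p in papers if keep_paper(p, today=today)]
```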
{
    "arxivdigest_base_url": "https://api.arxivdigest.org/",
    "mongodb": {
        "host": "127.0.0.1",
        "port": 27017
    },
    "redis": {
        "host": "127.0.0.1",
        "port": 6379
    },
    "elasticsearch": {
        "host": "127.0.0.1",
        "port": 9200
    },
    "semantic_scholar": {
        "api_key": null,
        "max_concurrent_requests": 100,
        "max_requests": 100,
        "window_size": 300,
        "cache_responses": true,
        "cache_backend": "redis",
        "mongodb_db": "s2cache",
        "mongodb_collection": "s2cache",
        "paper_cache_expiration": 30,
        "author_cache_expiration": 7
    },
    "max_paper_age": 5,
    "max_explanation_venues": 3,
    "venue_blacklist": ["arxiv"],
    "frequent_venues_recommender": {
        "arxivdigest_api_key": null
    },
    "venue_copub_recommender": {
        "arxivdigest_api_key": null
    },
    "weighted_inf_recommender": {
        "arxivdigest_api_key": null,
        "min_influence": 20
    },
    "prev_cited_recommender": {
        "arxivdigest_api_key": null
    },
    "prev_cited_collab_recommender": {
        "arxivdigest_api_key": null
    },
    "prev_cited_topic_recommender": {
        "arxivdigest_api_key": null,
        "index": "arxivdigest_papers",
        "max_explanation_topics": 3
    },
    "log_level": "INFO"
}
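A user config usually only needs the keys that differ from these values. The sketch below shows one way a partial config could be layered over the defaults with a recursive merge; whether the package merges nested sections this way is an assumption, and `merge_config` is a hypothetical helper.

```python
def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied, merging nested dicts."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# A small excerpt of the default values and a partial user config.
defaults = {
    "semantic_scholar": {"max_requests": 100, "window_size": 300},
    "log_level": "INFO",
}
overrides = {"semantic_scholar": {"max_requests": 50}}
config = merge_config(defaults, overrides)
```

With a recursive merge, overriding `max_requests` leaves `window_size` at its default value.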