arXivDigest Recommenders

Recommender systems for arXivDigest.

The author details and paper metadata used by the recommender systems are retrieved from the Semantic Scholar API.

Available Recommenders

| Recommender system | Module | Class |
| --- | --- | --- |
| Frequent Venues | frequent_venues.py | FrequentVenuesRecommender |
| Venue Co-Publishing | venue_copub.py | VenueCoPubRecommender |
| Weighted Influence | weighted_inf.py | WeightedInfRecommender |
| Previously Cited | prev_cited.py | PrevCitedRecommender |
| Previously Cited by Collaborators | prev_cited_collab.py | PrevCitedCollabRecommender |
| Previously Cited and Topic Search | prev_cited_topic.py | PrevCitedTopicSearchRecommender |

Frequent Venues

It is not uncommon for researchers to publish numerous papers at the same venue over time. This recommender is based on the assumption that a paper published at a venue that a user frequently publishes at is more relevant to the user than other papers.
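The assumption above can be illustrated with a minimal sketch. This is not the package's implementation: the function names and the plain publication count used as a score are assumptions made for illustration.

```python
from collections import Counter


def venue_counts(user_venues):
    """Count how many of the user's papers appeared at each venue (case-insensitive)."""
    return Counter(v.strip().lower() for v in user_venues if v and v.strip())


def frequent_venue_score(paper_venue, counts):
    """Hypothetical score: number of the user's papers published at the candidate's venue."""
    if not paper_venue:
        return 0
    return counts.get(paper_venue.strip().lower(), 0)


# Venues of the user's own papers (made-up example data).
user_venues = ["SIGIR", "sigir", "ECIR", "SIGIR", ""]
counts = venue_counts(user_venues)

print(frequent_venue_score("SIGIR", counts))
print(frequent_venue_score("NeurIPS", counts))
```

A candidate paper from a venue the user has never published at gets a score of zero and would not be recommended under this sketch.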

Venue Co-Publishing

This recommender is based on the assumption that a paper's relevance to a user is tied to the degree of venue co-publishing between the paper's authors and the user: a paper is relevant to a user if the authors of the paper publish at the same venues as the user.
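One way to quantify the "degree of venue co-publishing" is a multiset overlap of venue counts. The sketch below is an assumption for illustration only; the overlap measure, the function names, and the choice to score a paper by its best-matching author are not taken from the package.

```python
from collections import Counter


def copub_score(user_venues, author_venues):
    """Hypothetical overlap: for each shared venue, take the smaller publication count."""
    shared = Counter(user_venues) & Counter(author_venues)  # multiset intersection
    return sum(shared.values())


def paper_score(user_venues, authors_venues):
    """Score a paper by its best-matching author (an assumption, not the package's rule)."""
    return max((copub_score(user_venues, av) for av in authors_venues), default=0)


# Made-up example data: venues of the user's papers and of each author's papers.
user = ["SIGIR", "SIGIR", "ECIR"]
authors = [["SIGIR", "CIKM"], ["SIGIR", "SIGIR", "ECIR", "WWW"]]

print(paper_score(user, authors))
```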

Weighted Influence

This recommender is similar to the Venue Co-Publishing recommender. It does not only look at the co-publishing patterns of the user and the authors of a paper, but also takes into consideration the influential citation counts of the authors.
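As a rough sketch of how influence could enter the score, the example below scales a venue overlap by the author's influential citation count and ignores authors below a threshold (compare the min_influence setting). The multiplication and the function name are assumptions for illustration, not the package's formula.

```python
from collections import Counter


def weighted_influence_score(user_venues, author_venues, influential_citations, min_influence=20):
    """Hypothetical score: venue overlap scaled by the author's influential citation count."""
    if influential_citations < min_influence:
        return 0  # authors below the threshold are ignored
    overlap = sum((Counter(user_venues) & Counter(author_venues)).values())
    return overlap * influential_citations


# Made-up example data.
user = ["SIGIR", "ECIR"]
author = ["SIGIR", "ECIR", "WWW"]

print(weighted_influence_score(user, author, influential_citations=50))
print(weighted_influence_score(user, author, influential_citations=10))
```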

Previously Cited

This recommender recommends papers that are published by authors that the user has previously cited.
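A minimal sketch of the idea, assuming the user's cited papers are available as lists of author IDs. The data layout, the function names, and the use of the highest per-author count as the score are illustrative assumptions.

```python
from collections import Counter


def cited_author_counts(cited_papers):
    """Count how often each author appears among the papers the user has cited.

    `cited_papers` is a list of author-ID lists, one list per cited paper.
    """
    counts = Counter()
    for author_ids in cited_papers:
        counts.update(set(author_ids))  # count each author once per cited paper
    return counts


def prev_cited_score(paper_author_ids, counts):
    """Hypothetical score: the highest citation count among the candidate's authors."""
    return max((counts[a] for a in paper_author_ids), default=0)


# Made-up example data: authors of three papers the user has cited.
cited = [["a1", "a2"], ["a1"], ["a3", "a3"]]
counts = cited_author_counts(cited)

print(prev_cited_score(["a2", "a1"], counts))
```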

Previously Cited by Collaborators

This recommender is similar to the Previously Cited recommender, but instead of looking at whether the user has cited the authors of a paper, it looks at whether the user's previous collaborators have done so.
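The collaborator variant can be sketched in the same spirit. Here the score is the number of collaborators who have cited at least one of the candidate paper's authors; this scoring rule and the input mapping are assumptions for illustration.

```python
def collab_cited_score(paper_author_ids, cited_by_collaborator):
    """Hypothetical score: how many collaborators have cited at least one of the paper's authors.

    `cited_by_collaborator` maps a collaborator ID to the set of author IDs they have cited.
    """
    authors = set(paper_author_ids)
    return sum(1 for cited in cited_by_collaborator.values() if authors & cited)


# Made-up example data.
cited_by_collaborator = {
    "c1": {"a1", "a2"},
    "c2": {"a3"},
    "c3": set(),
}

print(collab_cited_score(["a1", "a3"], cited_by_collaborator))
```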

Previously Cited and Topic Search

This recommender combines Previously Cited with the approach of the base arXivDigest recommender system, which queries an Elasticsearch index containing the candidate papers for the user's topics of interest.
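To give a feel for the topic search part, the sketch below builds an Elasticsearch request body that matches any of a user's topics as a phrase. The field names ("title", "abstract"), the phrase matching, and the function name are assumptions; the actual query used by the recommender may differ. The function only builds a dictionary and does not contact a cluster.

```python
def build_topic_query(topics, fields=("title", "abstract"), size=10):
    """Build a request body that matches any of the given topics as a phrase."""
    return {
        "size": size,
        "query": {
            "bool": {
                # One phrase clause per topic; a paper needs to match at least one.
                "should": [
                    {"multi_match": {"query": t, "type": "phrase", "fields": list(fields)}}
                    for t in topics
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = build_topic_query(["information retrieval", "recommender systems"])
print(len(body["query"]["bool"]["should"]))
```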

Requirements

  • Python 3.6+
  • MongoDB or Redis — Used to cache responses from the Semantic Scholar API (can be disabled)
  • Elasticsearch — Used by the Previously Cited and Topic Search recommender for topic search

Setup

Development

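Install the arxivdigest_recommenders package and its dependencies by running pip install -e . (including the trailing dot) in the repository root. The -e flag makes the installation editable, so changes to the source take effect without reinstalling.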

Production

Install the arxivdigest_recommenders package directly from the master branch of this repository:

pip install git+https://github.com/olafapl/arxivdigest_recommenders.git

Updates can be installed with:

pip install --upgrade git+https://github.com/olafapl/arxivdigest_recommenders.git

Usage

Running a Single Recommender

The different recommenders can be run directly by running the modules containing their implementation. As an example, the Frequent Venues recommender can be run by executing python -m arxivdigest_recommenders.frequent_venues.

Running Multiple Recommenders

The Semantic Scholar API rate limit defined in the config file (or the default of 100 requests per five-minute window) is enforced only per process. If two recommenders are run at the same time as separate processes using the method above, their combined request rate can reach double the configured limit. To avoid this problem, run the recommenders in the same process:

import asyncio
from arxivdigest_recommenders.frequent_venues import FrequentVenuesRecommender
from arxivdigest_recommenders.venue_copub import VenueCoPubRecommender


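async def main():
    # Both recommenders live in one process, so the per-process rate limit covers both.
    fv = FrequentVenuesRecommender()
    vc = VenueCoPubRecommender()
    # Run the two recommenders concurrently on the same event loop.
    await asyncio.gather(fv.recommend(), vc.recommend())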


asyncio.run(main())
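To see why the limit is per process, it helps to picture the limiter as in-memory state. The sketch below is a generic sliding-window limiter, not the package's implementation; the class name and its parameters are assumptions that mirror the max_requests and window_size settings. Because the timestamps live in one Python object, only coroutines in the same process can share them.

```python
import asyncio
import time
from collections import deque


class WindowRateLimiter:
    """Allow at most `max_requests` acquisitions per `window_size` seconds (in-process only)."""

    def __init__(self, max_requests=100, window_size=300.0):
        self.max_requests = max_requests
        self.window_size = window_size
        self._timestamps = deque()  # times of the most recent acquisitions

    async def acquire(self):
        while True:
            now = time.monotonic()
            # Drop timestamps that have left the window.
            while self._timestamps and now - self._timestamps[0] >= self.window_size:
                self._timestamps.popleft()
            if len(self._timestamps) < self.max_requests:
                self._timestamps.append(now)
                return
            # Sleep until the oldest request leaves the window, then check again.
            await asyncio.sleep(self.window_size - (now - self._timestamps[0]))


async def demo():
    # Tiny window so the demo finishes quickly: 2 requests per 0.2 seconds.
    limiter = WindowRateLimiter(max_requests=2, window_size=0.2)
    start = time.monotonic()
    await asyncio.gather(*(limiter.acquire() for _ in range(3)))
    return time.monotonic() - start


elapsed = asyncio.run(demo())
print(f"three acquisitions took {elapsed:.2f}s")
```

In the demo, the third acquisition has to wait for the window to advance. Two separate processes would each hold their own limiter object and could therefore send twice as many requests in total.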

Configuration

It is possible to override the default settings of the recommender systems by creating a config file in one of the following locations:

  • ~/arxivdigest-recommenders/config.json
  • /etc/arxivdigest-recommenders/config.json
  • %cwd%/config.json
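The lookup can be pictured as checking each location in turn. The sketch below assumes that the first existing file wins and that the locations are checked in the order listed; both are assumptions, as is the function name. The demo uses a temporary directory instead of the real locations.

```python
import json
import os
import tempfile

# Candidate locations from the list above (%cwd% is the current working directory).
DEFAULT_PATHS = [
    os.path.expanduser("~/arxivdigest-recommenders/config.json"),
    "/etc/arxivdigest-recommenders/config.json",
    os.path.join(os.getcwd(), "config.json"),
]


def load_first_config(paths):
    """Return the parsed JSON of the first existing file in `paths`, or {} if none exists."""
    for path in paths:
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                return json.load(f)
    return {}


# Demo with temporary files rather than the real config locations.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.json")
    present = os.path.join(tmp, "config.json")
    with open(present, "w", encoding="utf-8") as f:
        json.dump({"log_level": "DEBUG"}, f)
    found = load_first_config([missing, present])
    not_found = load_first_config([missing])

print(found)
```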

Structure

  • arxivdigest_base_url
  • mongodb
    • host
    • port
  • redis
    • host
    • port
  • elasticsearch
    • host
    • port
  • semantic_scholar: Semantic Scholar API config
    • api_key
    • max_concurrent_requests: max number of concurrent requests
    • max_requests: max number of requests per window
    • window_size: window size in seconds
    • cache_responses: enable/disable caching completely
    • cache_backend: either "mongodb" or "redis"
    • mongodb_db: MongoDB database used for caching
    • mongodb_collection: MongoDB collection used for caching
    • paper_cache_expiration: expiration time (in days) for paper data
    • author_cache_expiration: expiration time (in days) for author data
  • max_paper_age: papers older than this (in years) are filtered out when looking at an author's published papers
  • max_explanation_venues: max number of venues to include in explanations (used by the Venue Co-Publishing and Weighted Influence recommenders)
  • venue_blacklist: (case-insensitive) list of venues to ignore
  • frequent_venues_recommender: Frequent Venues recommender config
    • arxivdigest_api_key
  • venue_copub_recommender: Venue Co-Publishing recommender config
    • arxivdigest_api_key
  • weighted_inf_recommender: Weighted Influence recommender config
    • arxivdigest_api_key
    • min_influence: minimum influential citation count for authors
  • prev_cited_recommender: Previously Cited recommender config
    • arxivdigest_api_key
  • prev_cited_collab_recommender: Previously Cited by Collaborators recommender config
    • arxivdigest_api_key
  • prev_cited_topic_recommender: Previously Cited and Topic Search recommender config
    • arxivdigest_api_key
    • index: Elasticsearch index for candidate paper indexing and topic search
    • max_explanation_topics: max number of topics to include in explanations
  • log_level: either "FATAL", "ERROR", "WARNING", "INFO", or "DEBUG"
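As an illustration of what max_paper_age and venue_blacklist might do to an author's papers, here is a small filter. It assumes that a paper's age is the current year minus its publication year, that papers without a year are dropped, and that blacklisted venues are matched by exact case-insensitive name; the paper dictionaries and the function name are hypothetical.

```python
def filter_papers(papers, current_year, max_paper_age=5, venue_blacklist=("arxiv",)):
    """Keep papers that are recent enough and not published at a blacklisted venue."""
    blocked = {v.lower() for v in venue_blacklist}
    kept = []
    for paper in papers:
        year = paper.get("year")
        venue = (paper.get("venue") or "").lower()
        if year is None or current_year - year > max_paper_age:
            continue  # too old, or age unknown (dropping unknowns is an assumption)
        if venue in blocked:
            continue  # venue is blacklisted
        kept.append(paper)
    return kept


# Made-up example data.
papers = [
    {"title": "A", "year": 2023, "venue": "SIGIR"},
    {"title": "B", "year": 2015, "venue": "SIGIR"},
    {"title": "C", "year": 2023, "venue": "ArXiv"},
    {"title": "D", "year": None, "venue": "ECIR"},
]

print([p["title"] for p in filter_papers(papers, current_year=2024)])
```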

Defaults

{
  "arxivdigest_base_url": "https://api.arxivdigest.org/",
  "mongodb": {
    "host": "127.0.0.1",
    "port": 27017
  },
  "redis": {
    "host": "127.0.0.1",
    "port": 6379
  },
  "elasticsearch": {
    "host": "127.0.0.1",
    "port": 9200
  },
  "semantic_scholar": {
    "api_key": null,
    "max_concurrent_requests": 100,
    "max_requests": 100,
    "window_size": 300,
    "cache_responses": true,
    "cache_backend": "redis",
    "mongodb_db": "s2cache",
    "mongodb_collection": "s2cache",
    "paper_cache_expiration": 30,
    "author_cache_expiration": 7
  },
  "max_paper_age": 5,
  "max_explanation_venues": 3,
  "venue_blacklist": ["arxiv"],
  "frequent_venues_recommender": {
    "arxivdigest_api_key": null
  },
  "venue_copub_recommender":  {
    "arxivdigest_api_key": null
  },
  "weighted_inf_recommender": {
    "arxivdigest_api_key": null,
    "min_influence": 20
  },
  "prev_cited_recommender": {
    "arxivdigest_api_key": null
  },
  "prev_cited_collab_recommender": {
    "arxivdigest_api_key": null
  },
  "prev_cited_topic_recommender": {
    "arxivdigest_api_key": null,
    "index": "arxivdigest_papers",
    "max_explanation_topics": 3
  },
  "log_level": "INFO"
}
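A config file presumably only needs to contain the settings that differ from these defaults. The sketch below shows one way a partial config could be layered over the defaults with a recursive merge; whether the package merges nested sections this way is an assumption, and merge_config is a hypothetical helper.

```python
import copy


def merge_config(defaults, overrides):
    """Return a new dict where `overrides` replace matching keys in `defaults`, recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # replace scalars and lists outright
    return merged


# A subset of the defaults above, and a partial user config.
defaults = {
    "semantic_scholar": {"api_key": None, "max_requests": 100, "window_size": 300},
    "venue_blacklist": ["arxiv"],
    "log_level": "INFO",
}
overrides = {
    "semantic_scholar": {"api_key": "my-key"},
    "log_level": "DEBUG",
}

config = merge_config(defaults, overrides)
print(config["semantic_scholar"])
```

With this approach, overriding api_key leaves max_requests and window_size at their default values.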