Scrape the preprints archive and preprocess articles

JamesMTucker/Preprints

Preprints archive scraper

TODO

  • Implement a site scraper for the osf.io preprints archive
  • Implement a dataset class for optional PDF storage
  • Implement a PDF text extraction class
  • Implement a text preprocessor
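
As a rough illustration of the last TODO item, here is a minimal sketch of what a text preprocessor for PDF-extracted article text might look like. Nothing here comes from the repository: the function name `preprocess` and the chosen steps (Unicode normalization, rejoining hyphenated line breaks, whitespace collapsing, lowercasing, word tokenization) are assumptions made for the example, using only the Python standard library.

```python
import re
import unicodedata


def preprocess(text: str) -> list[str]:
    """Hypothetical preprocessing step: clean extracted text and tokenize it.

    The name and the exact steps are illustrative assumptions, not the
    repository's actual design.
    """
    # Normalize Unicode compatibility forms (e.g. the ligature "ﬁ" becomes "fi").
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across line breaks, common in PDF-extracted text.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse every run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Lowercase and split into word tokens (letters, digits, underscore).
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    sample = "Pre-\nprints  are\tshared before peer review."
    print(preprocess(sample))
```

Note that the de-hyphenation rule also merges genuinely hyphenated compounds that happen to break across lines, so a real implementation may want a dictionary check or a more conservative rule.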
