Add archive.is as third archiving option #35

adam3smith · 2021-09-30T17:46:46Z

No description provided.

mccallc · 2022-02-07T16:54:27Z

OK, I've come to the conclusion that implementing this source is not feasible. Is there something obvious I'm missing? Please let me know if there is.

There is a now-abandoned Python implementation for submitting to archive.is (last updated 2020), but trying to use it now always returns an HTTP 429 (Too Many Requests) error. I ran into the same problem trying to emulate the main form submission with rvest. If you try to browse to the site manually after that, you get hit with a CAPTCHA. I think they've walled the service off pretty well from basic scrapers.

The Memento robust links API discourages use for explicit archiving, and the tool they recommend for this purpose, archivenow's archive.is handler, submits collections of URLs to archive.is by manually commandeering a running instance of Firefox (?!) through a library called Selenium. The program itself isn't that complex, but such unusual dependencies make it pretty hostile to implementation in R.
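
For anyone who wants to reproduce the 429 behaviour described above, here is a minimal sketch of how a client could recognise the rate-limit response and decide how long to back off. It is only an illustration: the function name `retry_delay_seconds` and the 60-second default are hypothetical, and I have not confirmed that archive.is sends a `Retry-After` header at all. It also does nothing about the CAPTCHA.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    default: float = 60.0,  # hypothetical fallback, not a documented archive.is value
) -> Optional[float]:
    """Return how long to wait after a rate-limited response, or None if not rate-limited."""
    if status != 429:
        return None

    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default

    try:
        # Only the delta-seconds form is handled here; the HTTP-date form
        # falls through to the default.
        return float(max(0, int(raw.strip())))
    except ValueError:
        return default
```

A caller would pass the status code and headers from whatever HTTP client it uses, sleep for the returned number of seconds, and give up after a few attempts rather than retrying indefinitely.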
