JayneJacobs/webscraping

Webscraping

Always check a site's robots.txt first to find out what you are allowed to scrape, for example:

https://www.udemy.com/robots.txt
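
A site's robots.txt lives at the root of its host, so its location can be derived from any page URL on that site. The following is a minimal sketch using only the Go standard library; the helper name `robotsURL` and the course path in `main` are made up for illustration and do not come from this repository.

```go
package main

import (
	"fmt"
	"net/url"
)

// robotsURL returns the robots.txt location for the site that serves pageURL.
// Only the scheme and host are used; the page path and query are ignored.
// (Hypothetical helper, not part of colly or this repository.)
func robotsURL(pageURL string) (string, error) {
	u, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", pageURL)
	}
	return u.Scheme + "://" + u.Host + "/robots.txt", nil
}

func main() {
	// The course path below is a made-up example; only the host matters.
	loc, err := robotsURL("https://www.udemy.com/course/example/?q=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(loc) // https://www.udemy.com/robots.txt
}
```

Fetching the resulting URL (for example with `net/http`) gives the rules text to read before scraping.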

http://go-colly.org/

https://docs.google.com/document/d/12a1zW0gCev2EERQKM-VuQbSgfbm9P5aRi6p8paPr-s8/edit#

Notes and open questions: Which CSS class names mark the questions and answers? Colly is free (open source). Did this get all of the tweets? Anything else?

Procedure

go get -u github.com/gocolly/colly/...

PuerkitoBio/goquery: a jQuery-like query library built on top of the golang.org/x/net/html package

Check what scraping is allowed with a robots.txt tester:

https://technicalseo.com/tools/robots-txt/

  1. Insert the desired URL in the URL field
  2. Paste the site's robots.txt file
  3. Set the pulldown to all robots
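
The same check can be approximated in code. The sketch below is a simplified robots.txt matcher written with the Go standard library only; it is not what the tool above or colly runs. It assumes plain path prefixes (no `*` or `$` wildcards), lets the longest matching rule win, and prefers `Allow` on ties. The function name `allowed`, the `sample` rules, and the bot names are all invented for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample is a made-up robots.txt used only for this example.
const sample = `
User-agent: *
Disallow: /private/
Allow: /private/public-page

User-agent: badbot
Disallow: /
`

type rule struct {
	allow  bool
	prefix string
}

// allowed reports whether agent may fetch path under the given robots.txt text.
// Simplification: prefix matching only, wildcards in paths are not supported.
func allowed(robots, agent, path string) bool {
	agent = strings.ToLower(agent)
	var specific, wildcard []rule
	var group []string // user agents named by the current group
	hasSpecific, inRules := false, false

	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if inRules { // a User-agent line after rules starts a new group
				group, inRules = nil, false
			}
			v := strings.ToLower(val)
			group = append(group, v)
			if v != "" && v != "*" && strings.Contains(agent, v) {
				hasSpecific = true
			}
		case "allow", "disallow":
			inRules = true
			if val == "" {
				continue // an empty rule restricts nothing
			}
			r := rule{allow: key == "allow", prefix: val}
			for _, a := range group {
				if a == "*" {
					wildcard = append(wildcard, r)
				} else if a != "" && strings.Contains(agent, a) {
					specific = append(specific, r)
				}
			}
		}
	}

	// A group that names the agent takes priority over the "*" group.
	rules := wildcard
	if hasSpecific {
		rules = specific
	}
	best := rule{allow: true} // no matching rule means the path is allowed
	for _, r := range rules {
		if !strings.HasPrefix(path, r.prefix) {
			continue
		}
		if len(r.prefix) > len(best.prefix) || (len(r.prefix) == len(best.prefix) && r.allow) {
			best = r
		}
	}
	return best.allow
}

func main() {
	fmt.Println(allowed(sample, "mybot", "/courses/"))            // true
	fmt.Println(allowed(sample, "mybot", "/private/data"))        // false
	fmt.Println(allowed(sample, "mybot", "/private/public-page")) // true
	fmt.Println(allowed(sample, "BadBot/1.0", "/courses/"))       // false
}
```

Real robots.txt files often use wildcards and other directives, so treat this as a rough pre-check and confirm the result with the tester above.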

git add .; git commit -m "Adding Reddit Scraper and counter"; git push; git status
