joaopauloribeiro/simple_crawler

simple_crawler

A very simple web crawler written in Python for learning purposes.

It fetches a set of seed pages, scans them for URLs, and stores those URLs in a SQLite3 database, counting how many times each URL is referenced.
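The scan-and-count step described above could look roughly like the sketch below. It is not the code from webcrawler.py: the names `LinkParser`, `extract_links`, `store_links`, and the `urls` table with its `url` and `refs` columns are assumptions made for illustration, and only links in `<a href="...">` tags are considered.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, duplicates included."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def store_links(conn, links):
    """Add one reference to the counter of each URL in links."""
    # Hypothetical schema: one row per URL plus its reference count.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls "
        "(url TEXT PRIMARY KEY, refs INTEGER NOT NULL)"
    )
    for url in links:
        # Try to increment first; insert the row if the URL is new.
        cur = conn.execute(
            "UPDATE urls SET refs = refs + 1 WHERE url = ?", (url,)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO urls (url, refs) VALUES (?, 1)", (url,)
            )
    conn.commit()
```

Duplicates are kept on purpose in `extract_links`, so that a URL linked twice from the same page counts as two references.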

Running

Just do: $ python webcrawler.py

Changing the URL seeds

Edit webcrawler.py and change the page URL and the HTML storage file of each seed:

seeds = [{'page': 'http://www.someurlsite.com', 'file': 'html/someurlsite.html'},...]
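As a rough illustration of how a list in this shape might be consumed, the sketch below downloads each seed's `page` and saves it to the matching `file`. The functions `fetch_seed` and `crawl_seeds` and the `opener` parameter are hypothetical and may not match what webcrawler.py actually does; Python 3 is assumed.

```python
import os
from urllib.request import urlopen

# Same shape as the seeds list in webcrawler.py.
seeds = [
    {'page': 'http://www.someurlsite.com', 'file': 'html/someurlsite.html'},
]


def fetch_seed(seed, opener=urlopen):
    """Download seed['page'] and save the raw bytes to seed['file']."""
    directory = os.path.dirname(seed['file'])
    if directory:
        # Create the storage directory (e.g. html/) if it is missing.
        os.makedirs(directory, exist_ok=True)
    with opener(seed['page']) as response:
        body = response.read()
    with open(seed['file'], 'wb') as fh:
        fh.write(body)
    return body


def crawl_seeds(seed_list, opener=urlopen):
    """Fetch every seed in order and return the downloaded bodies."""
    return [fetch_seed(seed, opener) for seed in seed_list]
```

Calling `crawl_seeds(seeds)` would fetch every configured page. The `opener` argument exists only so that the download can be swapped out, for example in a test that should not touch the network.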
