simple_crawler

A very simple web crawler written in Python for learning purposes.

It fetches a set of seed pages, scans them for URLs, and stores each URL in a SQLite3 database together with a count of how many times it is referenced.
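The scan-and-count step can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: it assumes Python 3, and the names `LinkParser`, `find_links`, `open_store`, `store_links`, and the `urls` table layout are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def find_links(html, base_url):
    """Return every link found in html, duplicates included."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def open_store(path=":memory:"):
    """Open (or create) the database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls "
        "(url TEXT PRIMARY KEY, refs INTEGER NOT NULL)"
    )
    return conn


def store_links(conn, links):
    """Increment the reference count of each URL, inserting new ones."""
    for url in links:
        cur = conn.execute(
            "UPDATE urls SET refs = refs + 1 WHERE url = ?", (url,)
        )
        if cur.rowcount == 0:
            conn.execute("INSERT INTO urls (url, refs) VALUES (?, 1)", (url,))
    conn.commit()
```

Keeping duplicates in the list returned by `find_links` is deliberate here: each occurrence counts as one reference when `store_links` runs.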

Running

Just run: $ python webcrawler.py

Before running, edit the seeds list in webcrawler.py to set each seed URL and the file where its HTML is stored:

seeds = [{'page': 'http://www.someurlsite.com', 'file': 'html/someurlsite.html'},...]
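As a rough idea of how a list shaped like this might be consumed, the sketch below downloads each seed page and writes it to the configured file. It assumes Python 3; `fetch` and `save_seed_pages` are hypothetical names and are not taken from webcrawler.py.

```python
import os
from urllib.request import urlopen

# Same shape as the seeds list above; the single entry is the README's example.
seeds = [
    {'page': 'http://www.someurlsite.com', 'file': 'html/someurlsite.html'},
]


def fetch(url):
    """Download a page and return its raw bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def save_seed_pages(seeds, fetch=fetch):
    """Fetch each seed page and store it at the path given in 'file'."""
    saved = []
    for seed in seeds:
        body = fetch(seed['page'])
        directory = os.path.dirname(seed['file'])
        if directory:
            # Create the storage directory (e.g. html/) if it is missing.
            os.makedirs(directory, exist_ok=True)
        with open(seed['file'], 'wb') as handle:
            handle.write(body)
        saved.append(seed['file'])
    return saved
```

Passing `fetch` as a parameter makes it possible to try the function without network access, for example with `fetch=lambda url: b"<html></html>"`.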
