Crawler and scraper for the news archive on reuters.com.
It walks through the archive pages that link to articles, either for the whole news archive or for a given section, extracts each article's headline, text, and release timestamp, and stores them in a MongoDB collection.
The spider was built with the Scrapy framework.
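
As a rough illustration of the storage step, the sketch below shows how a Scrapy item pipeline could write each scraped article to MongoDB. It is not taken from this project's code: the class name, the field names (headline, text, timestamp), the connection URI, and the database and collection names are all assumptions chosen for the example.

```python
class MongoArticlePipeline:
    """Scrapy-style item pipeline that stores articles in a MongoDB collection.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 db_name="reuters", collection_name="articles"):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Keep only the three fields described above (assumed field names).
        document = {
            "headline": item["headline"].strip(),
            "text": item["text"].strip(),
            "timestamp": item["timestamp"],
        }
        self.collection.insert_one(document)
        # Return the item so any later pipeline stages still receive it.
        return item
```

A pipeline like this would be enabled through Scrapy's ITEM_PIPELINES setting, for example `ITEM_PIPELINES = {"myproject.pipelines.MongoArticlePipeline": 300}`, where the module path is a placeholder.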
Both the code and the documentation are still under development.