Changelog
Felipe Lima edited this page Mar 5, 2015 · 23 revisions
- Updates gem dependencies
- Adds `user_agent` and `user_agent_alias` config options to `Wombat.configure`
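A minimal sketch of setting these options (the attribute-style setter names and the sample user-agent values are assumptions, not taken from the release notes; check the README for the exact accessors):

```ruby
require 'wombat'

Wombat.configure do |config|
  # Either set a full user-agent string directly... (value is illustrative)
  config.user_agent = 'MyCrawler/1.0'
  # ...or pick one of Mechanize's built-in user-agent aliases.
  config.user_agent_alias = 'Mac Safari'
end
```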
- Updates gem dependencies
- Adds a `content-type=text/html` header to Mechanize if missing
- Retries `page.click` on relative links
- Adds ability to crawl a prefetched Mechanize page (thanks to @dsjbirch)
- Added support for hash-based property selectors (e.g. `css: '.header'` instead of `'css=.header'`)
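For illustration, a property declared in the new hash format next to its older string equivalent (the crawler body, URL, and selectors are made up for this sketch):

```ruby
require 'wombat'

Wombat.crawl do
  base_url 'http://example.com'  # hypothetical site
  path '/'

  # New hash-based selector format...
  heading css: '.header'
  # ...equivalent to the older string format:
  # heading 'css=.header'

  # Works for XPath selectors too:
  page_title xpath: '//title'
end
```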
- Updated gem dependencies
- Added header properties (thanks to @kdridi)
- Fixed bug in selectors that used XPath functions like `concat` (thanks to @viniciusdaniel)
- Added proxy settings configuration (thanks to @phortx)
- Fixed minor bug in HTML property locator
This version contains breaking changes (not backwards compatible), most notably to `for_each`, which is now specified through the `:iterator` option, and to nested block parameters, which have been removed.
- Added syntactic sugar methods `Wombat.scrape` and `Crawler#scrape` that alias their respective `crawl` method implementations
- Major refactoring of gem internals; removed code duplication
- Simplified DSL syntax for nested properties: the nested block now takes no arguments
- Changed DSL syntax for iterated properties: iterators can now be named just like other properties and are no longer automatically named `iterator#{i}`; they are specified through the `:iterator` option
- `Crawler#list_page` is now called `Crawler#path`
- Added new `:follow` property type that crawls links in pages
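A sketch of the DSL after these changes, showing a named iterator, an argument-less nested block, `path`, and the new `:follow` type in one place (the URL and selectors are hypothetical; verify details against the README):

```ruby
require 'wombat'

Wombat.crawl do
  base_url 'http://example.com'   # hypothetical site
  path '/articles'                # previously declared with list_page

  # Iterated property: named like any other property, marked with :iterator
  articles 'css=ul.articles li', :iterator do
    title 'css=h2'
    # Nested property: the block no longer takes any arguments
    meta do
      author 'css=.author'
    end
  end

  # :follow crawls each matched link and scrapes the target pages
  related 'css=a.related', :follow do
    page_title 'css=h1'
  end
end
```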
- Breaking change: `Metadata#format` renamed to `Metadata#document_format` due to a method name clash with `Kernel#format`
- Fixed a bug on malformed selectors
- Fixed a bug where multiple calls to #crawl would not clean up previously iterated array results and yield repeated results
- Added utility method `Wombat.crawl`, which eliminates the need for a Ruby class instance to use Wombat. Now you can just call `Wombat.crawl` and start working. The class-based format still works as before.
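A one-off crawl in this style might look like the following (site and selector are made up):

```ruby
require 'wombat'

# No class or instance needed; the block declares everything
# and the call returns the scraped results as a hash.
results = Wombat.crawl do
  base_url 'http://example.com'   # hypothetical site
  path '/'
  heading css: 'h1'
end
```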
- Added the ability to provide a block to `Crawler#crawl` and override the default crawler properties for a one-off run (thanks to @danielnc)
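A hedged sketch of the per-run override, assuming the block simply re-declares properties for that call (the crawler class, URL, and selectors here are hypothetical):

```ruby
require 'wombat'

class ArticleScraper          # hypothetical crawler class
  include Wombat::Crawler

  base_url 'http://example.com'
  path '/'
  heading css: 'h1'
end

scraper = ArticleScraper.new
scraper.crawl                 # uses the class-level properties

# One-off run overriding the defaults just for this call:
scraper.crawl do
  path '/archive'             # overrides the class-level path
  extra_note css: '.note'     # additional one-off property
end
```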