Releases: brianmadden/krawler
Upgrade Kotlin and Coroutines Package
- Upgrade Kotlin to 1.3.61
- Upgrade `kotlinx.coroutines`. This required an update to some of the places where coroutine builders were called internally (see the sketch below).
- Upgrade Gradle wrapper
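For context, the move from the pre-1.0 `kotlinx.coroutines` API to 1.x turned builders such as `launch` and `async` from top-level functions into extensions on `CoroutineScope`, which is the kind of call-site change referred to above. A minimal standalone sketch, not taken from the Krawler source:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // Before kotlinx.coroutines 1.0 a builder could be called as a bare
    // top-level function, e.g. `launch(CommonPool) { ... }`.
    // From 1.x onward, builders are extensions on CoroutineScope, so each call
    // site needs a scope. GlobalScope is used here purely for illustration.
    val job = GlobalScope.launch(Dispatchers.Default) {
        println("crawling...")
    }
    job.join()
}
```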
Queue Priority; Clear Queues; Always On Support
- Added ability to clear crawl queues by RequestId and Age, see `Krawler#removeUrlsByRootPage` and `Krawler#removeUrlsByAge`
- Added config option to prevent crawler shutdown on empty queues
- Added new single byte priority field to `KrawlQueueEntry`. Queues will always attempt to pop the lowest priority entry available. Priority can be assigned by overriding the `Krawler#assignQueuePriorty` method (see the sketch below).
- Update dependencies
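These notes don't show the signatures involved, so the following is only a standalone sketch of the behaviour described: entries carry a single-byte priority, the queue always serves the lowest value first, and a hook (named here after `Krawler#assignQueuePriorty`) decides that priority. The entry class and hook signature below are hypothetical, not the library's API:

```kotlin
import java.util.PriorityQueue

// Hypothetical stand-in for KrawlQueueEntry: a URL plus a single-byte priority.
data class QueueEntry(val url: String, val priority: Byte)

// Hypothetical hook mirroring the idea behind Krawler#assignQueuePriorty:
// a smaller value means the entry is served sooner.
fun assignQueuePriority(url: String): Byte =
    if (url.endsWith(".html")) 0.toByte() else 10.toByte()

fun main() {
    // The queue always pops the lowest-priority entry available.
    val queue = PriorityQueue(compareBy<QueueEntry> { it.priority })
    listOf("https://example.org/assets/app.js", "https://example.org/index.html")
        .forEach { queue.add(QueueEntry(it, assignQueuePriority(it))) }

    while (queue.isNotEmpty()) {
        println(queue.poll())   // index.html (priority 0) comes out before app.js
    }
}
```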
Remove Logger Implementation
0.4.1 (2017-8-15)
- Removed logging implementation from dependencies to prevent logging conflicts when used as a library (see the note below).
- Updated Kotlin version to 1.1.4
- Updated `kotlinx.coroutines` to .17
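Assuming the library logs through the SLF4J facade, which is the usual reason for stripping a logging implementation out of a library's dependencies, an application consuming Krawler would now supply whatever binding it prefers. The coordinates and versions below are illustrative placeholders, not verified values:

```kotlin
// build.gradle.kts fragment of an application using Krawler as a library.
// Coordinates and versions are illustrative only.
dependencies {
    implementation("io.thelandscape:krawler:0.4.1")           // the crawler library
    implementation("ch.qos.logback:logback-classic:1.2.3")    // the app's own SLF4J binding
}
```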
Coroutines!
0.4.0 (2017-5-17)
- Rewrote core crawl loop to use Kotlin 1.1 coroutines. This has effectively turned the crawl process into a multi-stage pipeline (see the sketch below). This architecture change has removed the necessity for some locking by removing resource contention by multiple threads.
- Updated the build file to build the simple example as a runnable jar
- Minor bug fixes in the KrawlUrl class.
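The notes don't include the pipeline itself; below is a generic, standalone sketch of the pattern described: stages run as coroutines connected by channels, each owning its own state, so no shared-state locking is needed. It uses the current `kotlinx.coroutines` API rather than the experimental 0.x API this release shipped with:

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val urls = Channel<String>()    // stage 1 -> stage 2
    val pages = Channel<String>()   // stage 2 -> stage 3

    // Stage 1: feed URLs into the pipeline.
    launch {
        listOf("https://example.org/a", "https://example.org/b").forEach { urls.send(it) }
        urls.close()
    }

    // Stage 2: "fetch" each URL (simulated here) and pass the result along.
    launch {
        for (url in urls) pages.send("contents of $url")
        pages.close()
    }

    // Stage 3: consume fetched pages.
    for (page in pages) println(page)
}
```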
Kotlin 1.1, Logging, and Bug Fixes
- Fixed a number of bugs that would result in a crashed thread, and subsequently an incorrect number of crawled pages, as well as cause slowdowns due to a reduced number of worker threads.
- Added a new utility function to wrap `doCrawl` and log any uncaught exceptions during crawling (see the sketch below).
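The utility's name and signature aren't given in these notes; the following is only a sketch of the pattern, with hypothetical names (`safeCrawl`, and a stand-in `doCrawl`):

```kotlin
import java.util.logging.Level
import java.util.logging.Logger

private val logger = Logger.getLogger("crawler")

// Hypothetical stand-in for the real doCrawl: fetch and process one URL.
fun doCrawl(url: String) {
    if (url.isBlank()) error("empty URL")
    println("visited $url")
}

// Wrapper in the spirit of the release note: run the crawl step and log any
// uncaught exception instead of letting it kill the worker thread.
fun safeCrawl(url: String) =
    try {
        doCrawl(url)
    } catch (t: Throwable) {
        logger.log(Level.SEVERE, "Uncaught exception while crawling $url", t)
    }

fun main() {
    safeCrawl("https://example.org/")
    safeCrawl("")   // exception is logged, but the worker keeps going
}
```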
0.3.1 - Multi-queue and bug fixes
- Created a 1:1 mapping between threads and the number of queues used to serve URLs to visit. URLs have an affinity for a particular queue based on their domain, so all URLs from that domain will end up in the same queue (see the sketch below). This improves parallel crawl performance by reducing the frequency with which the politeness delay affects requests. For crawls bound to fewer domains than queues, the excess queues are not used.
- Many bug fixes, including a fix that eliminates accidental over-crawling.
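Krawler's own queue selection isn't shown in these notes; the sketch below illustrates the general idea with a hypothetical `queueIndexFor` helper that hashes a URL's host to one of N queues, so every URL from a given domain lands in the same queue:

```kotlin
import java.net.URI

// Domain -> queue affinity: with one queue per worker thread, every URL from
// the same host hashes to the same queue, so the politeness delay for one
// domain does not stall requests sitting in the other queues.
// queueIndexFor is a hypothetical helper, not Krawler's implementation.
fun queueIndexFor(url: String, queueCount: Int): Int {
    val host = URI(url).host ?: url
    return Math.floorMod(host.hashCode(), queueCount)
}

fun main() {
    val queueCount = 4
    listOf(
        "https://example.org/a",
        "https://example.org/b",    // same host, therefore same queue as /a
        "https://example.com/c"
    ).forEach { url ->
        println("$url -> queue ${queueIndexFor(url, queueCount)}")
    }
}
```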