
Simple crawler


Crawls a website starting from a provided base URL.

Features

  • Generic: works with any fetch-like function you pass in
  • Configurable crawl depth
  • Avoids hammering the target site (accidental DoS) by limiting concurrent calls (see the throttle wrapper below for how it's done)

Example:

import fetch from 'node-fetch';
import { ThrottledAsyncCalls } from './async_throttle';
import { Crawler } from './crawler';

const mediumHostName = "medium.com";

const mediumCrawler = new Crawler(
    // Wrap fetch so that at most 5 requests are in flight at any time
    ThrottledAsyncCalls.wrap({
        concurrency: 5,
        func: fetch
    }).func,
    {
        baseUrl: `https://${mediumHostName}`,
        hostName: mediumHostName,
        startUrl: `https://${mediumHostName}`,
        depth: 3,        // follow links up to 3 levels deep
        verbose: true
    }
);

mediumCrawler.start().then(async (result) => {
    // Now process the crawled data
});
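In this setup, fetch is wrapped with ThrottledAsyncCalls so the crawler never has more than 5 requests in flight at once, and depth: 3 limits how many levels of links are followed from the start URL.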

The repo also contains a simple wrapper that limits call concurrency to a specified number.
It works regardless of which function you wrap.

  • Limits concurrency
  • No ugly wrappers, neat code
  • Treat the wrapped function like your original async function: call it, chain .then, and the throttling is handled for you

Example

import { ThrottledAsyncCalls } from '../src/async_throttle';

async function test(x: number) {
  return x + 1;
}

// Simple, yet powerful
const { func, object: boundObject } = ThrottledAsyncCalls.wrap({
    concurrency: 4,  // max concurrent calls
    func: test       // function to wrap
});


// Call it like the original function
function start(index: number) {
    return Promise.all([
        func(0).then(() => console.log(index)),
        func(0),
        func(0),
        func(0),
        func(0),
        func(0),
        func(0),
        func(0),
        func(0).then(() => console.log(index))
    ]);
}

start(1).then(results => {
    console.log("After all tasks are done executing", results);
});
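
For the curious, the usual technique behind a wrapper like ThrottledAsyncCalls is a counter of in-flight calls plus a queue of waiting callers. The sketch below is illustrative only; limitConcurrency and its internals are made-up names for this README, not the repo's actual implementation or API.

type AsyncFn<A extends unknown[], R> = (...args: A) => Promise<R>;

// Hypothetical sketch of a concurrency limiter (not the repo's implementation)
function limitConcurrency<A extends unknown[], R>(
    concurrency: number,
    fn: AsyncFn<A, R>
): AsyncFn<A, R> {
    let active = 0;                      // calls currently in flight
    const queue: Array<() => void> = []; // callers waiting for a free slot

    const release = () => {
        active--;
        const next = queue.shift();
        if (next) next();                // start the next queued call, if any
    };

    return (...args: A) =>
        new Promise<R>((resolve, reject) => {
            const run = () => {
                active++;
                fn(...args).then(resolve, reject).finally(release);
            };
            if (active < concurrency) {
                run();                   // a slot is free: start immediately
            } else {
                queue.push(run);         // otherwise wait in line
            }
        });
}

// Usage: looks like the original async function, but at most 4 calls run at once
const limitedIncrement = limitConcurrency(4, async (x: number) => x + 1);
limitedIncrement(41).then(result => console.log(result)); // 42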

License

MIT
