`xcrawl3r` is a command-line interface (CLI) utility to recursively crawl webpages, i.e. systematically browse webpages and follow their links to discover linked webpages' URLs.
- Recursively crawls webpages for URLs.
- Parses files for URLs (`.js`, `.json`, `.xml`, `.csv`, `.txt` & `.map`).
- Parses `robots.txt` for URLs.
- Parses sitemaps for URLs.
- Customizable parallelism.
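For a quick feel for these features, a minimal invocation might look like the following (the flags are documented in the help output further below; example.com is just a placeholder target):

# crawl a single URL, match URLs under example.com and its subdomains,
# and follow links up to 2 levels deep
xcrawl3r -u https://example.com -d example.com --include-subdomains --depth 2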
Visit the releases page and find the appropriate archive for your operating system and architecture. Download the archive from your browser, or copy its URL and retrieve it with `wget` or `curl`:
- ...with `wget`:

  wget https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz

- ...or, with `curl`:

  curl -OL https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz
...then, extract the binary:
tar xf xcrawl3r-<version>-linux-amd64.tar.gz
TIP: The download and extract steps above can be combined into a single step with this one-liner:
curl -sL https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz | tar -xzv
NOTE: On Windows systems, you should be able to double-click the zip archive to extract the `xcrawl3r` executable.
...move the `xcrawl3r` binary to somewhere in your `PATH`. For example, on GNU/Linux and OS X systems:
sudo mv xcrawl3r /usr/local/bin/
NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add `xcrawl3r` to their `PATH`.
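A quick way to confirm the binary is now discoverable (assuming a POSIX shell):

command -v xcrawl3r    # should print /usr/local/bin/xcrawl3r
xcrawl3r -h            # should print the help message shown further below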
Before installing from source, make sure Go is installed on your system; you can install it by following the official instructions for your operating system. From this point on, we will assume Go is already installed.
go install -v github.com/hueristiq/xcrawl3r/cmd/xcrawl3r@latest
- Clone the repository:

  git clone https://github.com/hueristiq/xcrawl3r.git
- Build the utility:

  cd xcrawl3r/cmd/xcrawl3r && \
  go build .
- Move the `xcrawl3r` binary to somewhere in your `PATH`. For example, on GNU/Linux and OS X systems:

  sudo mv xcrawl3r /usr/local/bin/
  NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add `xcrawl3r` to their `PATH`.
NOTE: While the development version is a good way to take a peek at `xcrawl3r`'s latest features before they get released, be aware that it may have bugs. Officially released versions will generally be more stable.
To display the help message for `xcrawl3r`, use the `-h` flag:
xcrawl3r -h
help message:
_ _____
__ _____ _ __ __ ___ _| |___ / _ __
\ \/ / __| '__/ _` \ \ /\ / / | |_ \| '__|
> < (__| | | (_| |\ V V /| |___) | |
/_/\_\___|_| \__,_| \_/\_/ |_|____/|_| v0.0.0
A CLI utility to recursively crawl webpages.
USAGE:
xcrawl3r [OPTIONS]
INPUT:
-d, --domain string domain to match URLs
--include-subdomains bool match subdomains' URLs
-s, --seeds string seed URLs file (use `-` to get from stdin)
-u, --url string URL to crawl
CONFIGURATION:
--depth int maximum depth to crawl (default 3)
TIP: set it to `0` for infinite recursion
--timeout int time to wait for request in seconds (default: 10)
-H, --headers string[] custom header to include in requests
e.g. -H 'Referer: http://example.com/'
TIP: use multiple flag to set multiple headers
--user-agent string User Agent to use (default: web)
TIP: use `web` for a random web user-agent,
`mobile` for a random mobile user-agent,
or you can set your specific user-agent.
--proxy string[] Proxy URL (e.g: http://127.0.0.1:8080)
TIP: use multiple flag to set multiple proxies
RATE LIMIT:
-c, --concurrency int number of concurrent fetchers to use (default 10)
-p, --parallelism int number of concurrent URLs to process (default: 10)
--delay int delay between each request in seconds
     --max-random-delay int  maximum extra randomized delay added to `--delay` (default: 1s)
OUTPUT:
--debug bool enable debug mode (default: false)
-m, --monochrome bool coloring: no colored output mode
-o, --output string output file to write found URLs
-v, --verbosity string debug, info, warning, error, fatal or silent (default: debug)
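As an illustration of how these options combine in practice (all flags are taken from the help output above; hosts and file names are placeholders):

# crawl a single target and write discovered URLs to a file
xcrawl3r -u https://example.com -d example.com -o example.com.urls.txt

# read seed URLs from stdin and slow the crawl down with a 2 second delay between requests
cat seeds.txt | xcrawl3r -s - -d example.com --delay 2

# route requests through a local proxy and send a custom header
xcrawl3r -u https://example.com -H 'Referer: https://example.com/' --proxy http://127.0.0.1:8080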
Issues and Pull Requests are welcome! Check out the contribution guidelines.
This utility is distributed under the MIT license.