RobotScraper

RobotScraper is a simple Go application that fetches and processes the robots.txt file from specified domains. It extracts allowed and disallowed paths and saves them as full URLs to a specified text file.

Features

  • Fetches robots.txt files over HTTP and HTTPS.
  • Extracts allowed and disallowed paths.
  • Saves results as full URLs in a text file.
  • Supports multiple domains in a single run.
  • Provides an animated saving message for a better user experience.
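The core flow is straightforward: download robots.txt for each domain, pick out the Allow and Disallow directives, and prefix each path with the domain to form a full URL. Below is a minimal sketch of that idea in Go; it is illustrative only, and the helper name fetchRobots and the details are assumptions rather than the repository's actual code.

// Sketch of the core idea (not the repository's actual code):
// fetch robots.txt for a domain and collect Allow/Disallow paths as full URLs.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// fetchRobots is a hypothetical helper that downloads robots.txt over HTTPS
// and returns the allowed and disallowed paths as full URLs.
func fetchRobots(domain string) (allowed, disallowed []string, err error) {
	resp, err := http.Get("https://" + domain + "/robots.txt")
	if err != nil {
		return nil, nil, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "Allow:"):
			path := strings.TrimSpace(strings.TrimPrefix(line, "Allow:"))
			allowed = append(allowed, "https://"+domain+path)
		case strings.HasPrefix(line, "Disallow:"):
			path := strings.TrimSpace(strings.TrimPrefix(line, "Disallow:"))
			disallowed = append(disallowed, "https://"+domain+path)
		}
	}
	return allowed, disallowed, scanner.Err()
}

func main() {
	allowed, disallowed, err := fetchRobots("example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Allowed:", allowed)
	fmt.Println("Disallowed:", disallowed)
}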

Installation

  1. Clone the repository:

    git clone https://github.com/whoamikiddie/robot-scraper
    cd robot-scraper

  2. Build the application:

    go build robot.go

Usage

To run the program, use the following command:

go run robot.go -d <domain1,domain2,...> [-s <filename>]

Options

  • -d, --domain: Specify one or more domains to scrape, separated by commas.
  • -s, --save: Specify the filename to save the output to (default: output.txt).
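
The sketch below shows one way such flags could be wired up with Go's standard flag package; robot.go may parse them differently. Note that the standard flag package treats -d and --d as the same flag, so the long forms --domain and --save would need to be registered as additional flags or handled with a third-party library.

// Sketch of flag handling for -d and -s (assumed, not necessarily robot.go's approach).
package main

import (
	"flag"
	"fmt"
	"strings"
)

func main() {
	domainsFlag := flag.String("d", "", "comma-separated list of domains to scrape")
	saveFlag := flag.String("s", "output.txt", "file to save the results to")
	flag.Parse()

	if *domainsFlag == "" {
		fmt.Println("usage: go run robot.go -d <domain1,domain2,...> [-s <filename>]")
		return
	}

	// Split the comma-separated domain list into individual domains.
	domains := strings.Split(*domainsFlag, ",")
	for _, d := range domains {
		fmt.Println("would scrape:", strings.TrimSpace(d), "->", *saveFlag)
	}
}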

Example

To fetch robots.txt from example.com and example.org and save the output to output.txt, run:

go run robot.go -d example.com,example.org -s output.txt

Output

The output file will contain:

  • Allowed URLs: Paths that are allowed for web crawlers.
  • Disallowed URLs: Paths that are not allowed for web crawlers.

Each entry will be saved in the format:

Allowed URLs:
https://example.com/path1
https://example.com/path2

Disallowed URLs:
https://example.com/path3
https://example.com/path4
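
A sketch of how results in this format could be written to the output file is shown below; the saveResults helper is hypothetical and not necessarily how robot.go formats its output.

// Sketch of writing results under "Allowed URLs:" and "Disallowed URLs:" headings
// (assumed helper, not the repository's actual code).
package main

import (
	"fmt"
	"os"
	"strings"
)

// saveResults writes the allowed and disallowed URLs to the given file.
func saveResults(filename string, allowed, disallowed []string) error {
	var b strings.Builder
	b.WriteString("Allowed URLs:\n")
	b.WriteString(strings.Join(allowed, "\n"))
	b.WriteString("\n\nDisallowed URLs:\n")
	b.WriteString(strings.Join(disallowed, "\n"))
	b.WriteString("\n")
	return os.WriteFile(filename, []byte(b.String()), 0644)
}

func main() {
	err := saveResults("output.txt",
		[]string{"https://example.com/path1", "https://example.com/path2"},
		[]string{"https://example.com/path3", "https://example.com/path4"})
	if err != nil {
		fmt.Println("error:", err)
	}
}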

