RobotScraper is a simple Go application that fetches and processes the `robots.txt` files of specified domains. It extracts the allowed and disallowed paths and saves them as full URLs to a specified text file.
- Fetches `robots.txt` files over HTTP and HTTPS (see the sketch after this list).
- Extracts allowed and disallowed paths.
- Saves results as full URLs in a text file.
- Supports multiple domains in a single run.
- Provides an animated saving message for a better user experience.
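
The features above boil down to fetching each domain's `robots.txt` and collecting its `Allow`/`Disallow` rules as full URLs. The minimal Go sketch below illustrates that flow; the function name `scrapeRobots` and the HTTPS-only fetch are illustrative assumptions, not the project's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// scrapeRobots fetches https://<domain>/robots.txt and returns the
// Allow/Disallow paths expanded into full URLs. Illustrative sketch only.
func scrapeRobots(domain string) (allowed, disallowed []string, err error) {
	resp, err := http.Get("https://" + domain + "/robots.txt")
	if err != nil {
		return nil, nil, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "Allow:"):
			path := strings.TrimSpace(strings.TrimPrefix(line, "Allow:"))
			allowed = append(allowed, "https://"+domain+path)
		case strings.HasPrefix(line, "Disallow:"):
			path := strings.TrimSpace(strings.TrimPrefix(line, "Disallow:"))
			disallowed = append(disallowed, "https://"+domain+path)
		}
	}
	return allowed, disallowed, scanner.Err()
}

func main() {
	allowed, disallowed, err := scrapeRobots("example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Allowed:", allowed)
	fmt.Println("Disallowed:", disallowed)
}
```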
- Clone the repository:

  ```bash
  git clone https://github.com/whoamikiddie/robot-scraper
  cd RobotScraper
  ```

- Build the application:

  ```bash
  go build robot.go
  ```
To run the program, use the following command:

```bash
go run robot.go -d <domain1,domain2,...> [-s <filename>]
```
- `-d`, `--domain`: Specify one or more domains to scrape, separated by commas.
- `-s`, `--save`: Specify a filename to save the output in text format (default: `output.txt`).
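
Go's standard `flag` package accepts both single- and double-dash spellings, so options like these could be registered as in the hedged sketch below; the variable names, defaults, and help strings are assumptions, not taken from the project's source.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

func main() {
	var domains, save string

	// Register short and long spellings for each option (assumed layout).
	flag.StringVar(&domains, "d", "", "comma-separated list of domains to scrape")
	flag.StringVar(&domains, "domain", "", "comma-separated list of domains to scrape")
	flag.StringVar(&save, "s", "output.txt", "file to save the results to")
	flag.StringVar(&save, "save", "output.txt", "file to save the results to")
	flag.Parse()

	for _, d := range strings.Split(domains, ",") {
		fmt.Println("would scrape:", strings.TrimSpace(d))
	}
	fmt.Println("would save to:", save)
}
```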
To fetch `robots.txt` from `example.com` and `example.org` and save the output to `output.txt`, run:

```bash
go run robot.go -d example.com,example.org -s output.txt
```
The output file will contain:
- Allowed URLs: Paths that are allowed for web crawlers.
- Disallowed URLs: Paths that are not allowed for web crawlers.
Each entry will be saved in the format:

```
Allowed URLs:
https://example.com/path1
https://example.com/path2

Disallowed URLs:
https://example.com/path3
https://example.com/path4
```
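
As a rough illustration, results grouped in that layout could be written with a helper like the one below; `writeResults` and its file handling are assumptions made for the sketch, not the tool's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// writeResults writes the allowed and disallowed URLs to filename in the
// grouped format shown above. Illustrative sketch only.
func writeResults(filename string, allowed, disallowed []string) error {
	var b strings.Builder
	b.WriteString("Allowed URLs:\n")
	b.WriteString(strings.Join(allowed, "\n"))
	b.WriteString("\n\nDisallowed URLs:\n")
	b.WriteString(strings.Join(disallowed, "\n"))
	b.WriteString("\n")
	return os.WriteFile(filename, []byte(b.String()), 0644)
}

func main() {
	allowed := []string{"https://example.com/path1", "https://example.com/path2"}
	disallowed := []string{"https://example.com/path3", "https://example.com/path4"}
	if err := writeResults("output.txt", allowed, disallowed); err != nil {
		fmt.Println("error:", err)
	}
}
```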