App Store Scraper Bot

A Node.js scraping tool and API that analyzes company websites to detect their mobile applications and extract app metadata from the Google Play Store and the Apple App Store.

Features

Website analysis for app store presence
Google Play Store metadata extraction
Apple App Store metadata extraction
Fallback company information gathering
Rate limiting and proxy support
Concurrent URL processing
User agent rotation
RESTful API
CSV export

Prerequisites

Node.js (v14 or higher)
npm (Node Package Manager)

Installation

Clone the repository:

git clone <repository-url>
cd lm-scrapebot

Install dependencies:

npm install

Create a .env file based on .env.example.

Usage

API Server

Start the API server:

npm start

For development with auto-reload:

npm run dev

API Endpoints

Health Check

GET /health

Scrape URLs

POST /scrape
Content-Type: application/json

{
    "urls": [
        "https://example1.com",
        "https://example2.com"
    ]
}
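As a rough client-side sketch of the request above: the base URL and port below are assumptions (this README does not state where the server listens), and buildScrapeRequest and scrape are hypothetical helper names, not part of the project. The 10-URL cap mirrors the limit listed under Rate Limiting.

```javascript
// Hypothetical base URL: the README does not state the host or port.
const BASE_URL = 'http://localhost:3000';

// Maximum number of URLs per request, as listed under Rate Limiting.
const MAX_URLS_PER_REQUEST = 10;

// Build the fetch() options for POST /scrape (hypothetical helper).
function buildScrapeRequest(urls) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new TypeError('urls must be a non-empty array');
  }
  if (urls.length > MAX_URLS_PER_REQUEST) {
    throw new RangeError(`at most ${MAX_URLS_PER_REQUEST} URLs per request`);
  }
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ urls }),
  };
}

// Send the request and return the parsed JSON response.
// Relies on the global fetch(), which is only available in Node.js 18+.
async function scrape(urls) {
  const response = await fetch(`${BASE_URL}/scrape`, buildScrapeRequest(urls));
  if (!response.ok) {
    throw new Error(`scrape failed with status ${response.status}`);
  }
  return response.json();
}
```

Checking the URL count on the client avoids spending one of the rate-limited requests on a call the server would likely reject anyway.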

Response format:

{
    "success": true,
    "results": [
        {
            "company": "Example Inc.",
            "url": "https://example.com",
            "app_present": true,
            "google_play_data": {
                "link": "https://play.google.com/store/apps/details?id=com.example",
                "downloads": "1M+",
                "last_updated": "2024-01-01",
                "developer_email": "[email protected]"
            },
            "app_store_data": {
                "link": "https://apps.apple.com/app/example-app/id123456789",
                "last_updated": "2024-01-01",
                "developer_email": "[email protected]"
            },
            "fallback_data": null
        }
    ]
}
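Since the tool also offers CSV export, a response like the one above could be flattened into CSV rows roughly as follows. This is only a sketch: the column layout is an assumption (the actual columns of the tool's CSV output are not documented here), and csvCell and resultsToCsv are hypothetical names.

```javascript
// Quote a value for CSV: wrap it in double quotes when it contains a comma,
// quote, or line break, and double any embedded quotes.
function csvCell(value) {
  if (value === null || value === undefined) return '';
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical column layout: [column name, accessor]. Store data may be
// null when no app was found, hence the optional chaining.
const COLUMNS = [
  ['company', (r) => r.company],
  ['url', (r) => r.url],
  ['app_present', (r) => r.app_present],
  ['google_play_link', (r) => r.google_play_data?.link],
  ['google_play_downloads', (r) => r.google_play_data?.downloads],
  ['app_store_link', (r) => r.app_store_data?.link],
];

// Convert the "results" array of a /scrape response into CSV text.
function resultsToCsv(results) {
  const header = COLUMNS.map(([name]) => name).join(',');
  const rows = results.map((r) =>
    COLUMNS.map(([, get]) => csvCell(get(r))).join(',')
  );
  return [header, ...rows].join('\n');
}
```

Missing store data becomes an empty cell rather than the string "null", which keeps the file easier to filter in a spreadsheet.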

Command Line Interface

The scraper can also be run from the command line:

npm run scrape-cli

Rate Limiting

The API includes built-in rate limiting:

100 requests per IP address per 15 minutes
Maximum of 10 URLs per request
2-second delay between batches
Up to 3 URLs processed concurrently
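The batching limits above (3 URLs at a time, 2 seconds between batches) can be pictured with a sketch like the one below. It is not taken from the project's source; processInBatches and its worker callback are hypothetical, and the project may schedule work differently.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process urls in batches of `batchSize`. URLs within a batch run
// concurrently; the function pauses `delayMs` between batches.
// `worker` is a hypothetical async function that scrapes a single URL.
async function processInBatches(urls, worker, { batchSize = 3, delayMs = 2000 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // allSettled keeps one failing URL from discarding the rest of its batch.
    const settled = await Promise.allSettled(batch.map(async (url) => worker(url)));
    results.push(...settled);
    // No delay is needed after the final batch.
    if (i + batchSize < urls.length) await sleep(delayMs);
  }
  return results;
}
```

Each entry in the returned array has the shape produced by Promise.allSettled: either { status: 'fulfilled', value } or { status: 'rejected', reason }, in the same order as the input URLs.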

Error Handling

Invalid URLs are rejected with a 400 status
Exceeding the rate limit returns a 429 status
Server errors return a 500 status
All error responses include a descriptive message
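To illustrate the 400 case, request validation might look something like this sketch. The function names and error messages are hypothetical, and treating an oversized URL list as a 400 is an assumption rather than documented behavior.

```javascript
// Return null when the request body is acceptable, or an error descriptor
// with an HTTP status and a descriptive message otherwise.
function validateScrapeBody(body, maxUrls = 10) {
  const urls = body && body.urls;
  if (!Array.isArray(urls) || urls.length === 0) {
    return { status: 400, message: '"urls" must be a non-empty array' };
  }
  // Assumption: exceeding the per-request URL cap is reported as a 400.
  if (urls.length > maxUrls) {
    return { status: 400, message: `At most ${maxUrls} URLs are allowed per request` };
  }
  for (const candidate of urls) {
    if (!isHttpUrl(candidate)) {
      return { status: 400, message: `Invalid URL: ${candidate}` };
    }
  }
  return null;
}

// True only for strings that parse as absolute http(s) URLs.
function isHttpUrl(candidate) {
  if (typeof candidate !== 'string') return false;
  try {
    const { protocol } = new URL(candidate);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // new URL() throws on unparseable input
  }
}
```

Restricting the protocol to http and https means inputs such as ftp://example.com are rejected up front instead of failing later during scraping.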

License

MIT

divyanshgandhilm/lm-scrapebot
