A Node.js-based scraping tool and API that analyzes company websites to detect their mobile applications and extract information about them from both the Google Play Store and the Apple App Store.

Features:
- Website analysis for app store presence
- Google Play Store metadata extraction
- Apple App Store metadata extraction
- Fallback company information gathering
- Rate limiting and proxy support
- Concurrent URL processing
- User agent rotation
- RESTful API
- CSV export
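Website analysis for app store presence comes down to finding store links in a company's pages. The sketch below is a minimal illustration, not this project's implementation: the function name `findAppStoreLinks` and both regular expressions are hypothetical, and they only recognize the common `play.google.com/store/apps/details?id=` and `apps.apple.com/.../id<digits>` URL shapes.

```javascript
// Hypothetical patterns: a Play Store "details?id=" link, and an Apple link whose
// last path segment is "id" followed by digits.
const PLAY_RE = /https?:\/\/play\.google\.com\/store\/apps\/details\?id=[A-Za-z0-9_.]+/;
const APPLE_RE = /https?:\/\/(?:apps|itunes)\.apple\.com\/(?:[^\s"'<>]*\/)?id\d+/;

// Return the first link of each kind found in the HTML (or null), plus a presence flag.
function findAppStoreLinks(html) {
  const play = html.match(PLAY_RE);
  const apple = html.match(APPLE_RE);
  return {
    app_present: Boolean(play || apple),
    google_play_link: play ? play[0] : null,
    app_store_link: apple ? apple[0] : null,
  };
}

// Usage (pageHtml would come from fetching the company website):
// const { app_present, google_play_link, app_store_link } = findAppStoreLinks(pageHtml);
```

A regex keeps the sketch dependency-free; parsing the page with an HTML parser and reading `href` attributes would be more robust against unusual markup.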
Prerequisites:
- Node.js (v14 or higher)
- npm (Node Package Manager)
Installation:
- Clone the repository:
  ```bash
  git clone <repository-url>
  cd app-store-scraper-bot
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Create a `.env` file based on `.env.example`.
Start the API server:
```bash
npm start
```
For development with auto-reload:
```bash
npm run dev
```
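API endpoints:

Health check:
```http
GET /health
```

Scrape one or more websites (the request body is a JSON object with a `urls` array):
```http
POST /scrape
Content-Type: application/json

{
  "urls": [
    "https://example1.com",
    "https://example2.com"
  ]
}
```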
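Calling `POST /scrape` from Node.js might look like the following sketch. It uses only the built-in `http` module so it also runs on Node.js v14; the `localhost:3000` default and the names `buildScrapeRequest` and `scrape` are assumptions, since the README does not state the server's host or port.

```javascript
const http = require("http");

// Build request options and the JSON body for POST /scrape.
// host/port defaults are assumptions; point them at wherever the server runs.
function buildScrapeRequest(urls, { host = "localhost", port = 3000 } = {}) {
  const body = JSON.stringify({ urls });
  return {
    options: {
      host,
      port,
      path: "/scrape",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
    },
    body,
  };
}

// Send the request; resolve with the HTTP status and the parsed JSON response.
function scrape(urls, target) {
  const { options, body } = buildScrapeRequest(urls, target);
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => {
        try {
          resolve({ status: res.statusCode, json: JSON.parse(data) });
        } catch (err) {
          reject(err); // response body was not valid JSON
        }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}

// Usage (requires a running server):
// scrape(["https://example1.com"]).then(({ status, json }) => console.log(status, json.results));
```

Checking `status` before reading `json.results` lets a caller tell a 429 rate-limit response apart from a successful scrape.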
Response format:
```json
{
  "success": true,
  "results": [
    {
      "company": "Example Inc.",
      "url": "https://example.com",
      "app_present": true,
      "google_play_data": {
        "link": "https://play.google.com/store/apps/details?id=com.example",
        "downloads": "1M+",
        "last_updated": "2024-01-01",
        "developer_email": "developer@example.com"
      },
      "app_store_data": {
        "link": "https://apps.apple.com/app/example-app/id123456789",
        "last_updated": "2024-01-01",
        "developer_email": "developer@example.com"
      },
      "fallback_data": null
    }
  ]
}
```
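Because each result nests the store data in separate objects, exporting to CSV means flattening one result into one row. This is a hypothetical sketch: the column names and `resultsToCsv` are illustrative and may not match the tool's actual CSV export, and `fallback_data` is left out because its shape is not documented here. Fields containing commas, quotes, or line breaks are quoted as RFC 4180 describes.

```javascript
// Hypothetical column layout: [header, accessor] pairs over one result object.
const COLUMNS = [
  ["company", (r) => r.company],
  ["url", (r) => r.url],
  ["app_present", (r) => r.app_present],
  ["google_play_link", (r) => r.google_play_data?.link],
  ["google_play_downloads", (r) => r.google_play_data?.downloads],
  ["google_play_last_updated", (r) => r.google_play_data?.last_updated],
  ["google_play_email", (r) => r.google_play_data?.developer_email],
  ["app_store_link", (r) => r.app_store_data?.link],
  ["app_store_last_updated", (r) => r.app_store_data?.last_updated],
  ["app_store_email", (r) => r.app_store_data?.developer_email],
];

// Quote a field only when it contains a comma, double quote, or line break;
// embedded double quotes are doubled. null/undefined become empty fields.
function csvField(value) {
  if (value === null || value === undefined) return "";
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// One header line plus one line per result, joined with CRLF line endings.
function resultsToCsv(results) {
  const header = COLUMNS.map(([name]) => name).join(",");
  const rows = results.map((r) => COLUMNS.map(([, get]) => csvField(get(r))).join(","));
  return [header, ...rows].join("\r\n");
}
```

Optional chaining (`?.`) lets a missing `google_play_data` or `app_store_data` object produce empty cells instead of throwing; it is supported from Node.js v14.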
A command-line version is also available:
```bash
npm run scrape-cli
```
The API includes built-in rate limiting and throttling:
- 100 requests per IP address per 15-minute window
- Maximum of 10 URLs per request
- 2-second delay between batches
- Up to 3 URLs processed concurrently
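The batch limits above (3 URLs at a time, 2 seconds between batches) could be implemented along the lines of this sketch. It is an assumption about how such throttling might work, not the project's code; `chunk`, `processInBatches`, and the caller-supplied `scrapeOne` worker are hypothetical names.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `scrapeOne` on up to `concurrency` URLs at once, pausing `delayMs`
// between batches. Defaults mirror the documented limits (3 and 2000 ms).
async function processInBatches(urls, scrapeOne, { concurrency = 3, delayMs = 2000 } = {}) {
  const results = [];
  const batches = chunk(urls, concurrency);
  for (let b = 0; b < batches.length; b++) {
    // allSettled: one failing URL does not reject the whole batch.
    const settled = await Promise.allSettled(batches[b].map((url) => scrapeOne(url)));
    results.push(...settled);
    if (b < batches.length - 1) await sleep(delayMs); // no pause after the last batch
  }
  return results; // [{ status: "fulfilled", value } | { status: "rejected", reason }, ...]
}
```

Fixed batches match the documented "delay between batches" behavior; a sliding worker pool would keep three requests in flight at all times but would have no natural point for the pause.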
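Error handling:
- Invalid URLs are rejected with a 400 (Bad Request) status
- Requests that exceed the rate limit receive a 429 (Too Many Requests) status
- Server errors return a 500 (Internal Server Error) status
- All error responses include a descriptive message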
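Server-side, the 400 response for invalid input could come from a check like the following sketch. The function name `validateScrapeBody`, the error messages, and answering requests over the 10-URL limit with 400 are assumptions; the README only says that invalid URLs get a 400 and that errors carry descriptive messages.

```javascript
const MAX_URLS = 10; // documented per-request limit

// Return null if the body is acceptable, otherwise { status, error } for the response.
function validateScrapeBody(body) {
  if (!body || !Array.isArray(body.urls) || body.urls.length === 0) {
    return { status: 400, error: "Request body must contain a non-empty 'urls' array" };
  }
  if (body.urls.length > MAX_URLS) {
    return { status: 400, error: `At most ${MAX_URLS} URLs are allowed per request` };
  }
  for (const url of body.urls) {
    let parsed;
    try {
      parsed = new URL(url); // throws a TypeError for strings that are not absolute URLs
    } catch {
      return { status: 400, error: `Invalid URL: ${url}` };
    }
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      return { status: 400, error: `Unsupported protocol in URL: ${url}` };
    }
  }
  return null;
}
```

The README does not name the HTTP framework, so wiring this into a route handler is left out; the handler would send `{ success: false, error }` with the returned status whenever the result is not null.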
License: MIT