LinkedIn Scraper

This project is a LinkedIn scraper built with Python, Selenium, and Redis. It logs into LinkedIn and scrapes a profile's posts and comments, along with metadata such as like and comment counts. Profile URLs extracted along the way are stored in a Redis queue for further processing.

Note: This project is still a work in progress. Some features are not yet fully implemented, including scraping 500 profiles and the complete Redis functionality.

Architecture

Output

Features

Logs into LinkedIn using provided credentials.
Scrolls through a profile's infinite-scroll feed to load and scrape all of its posts.
Scrapes post text, post date, like and comment counts, and more.
Extracts profile URLs from the comments section of each post.
Stores profile URLs in Redis for further processing.
Saves scraped data in JSON files.
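
The infinite-scroll step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: scroll_to_end, pause_seconds, and max_rounds are hypothetical names, and the loop assumes the feed is finished once the page height stops growing. The driver argument is expected to be a Selenium WebDriver; only its execute_script method is used.

```python
import time


def scroll_to_end(driver, pause_seconds=2.0, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is assumed to be a Selenium WebDriver (anything that exposes
    execute_script works). Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so the page requests the next batch of posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause_seconds)  # give the feed time to load more posts
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end of the feed
        last_height = new_height
    return rounds
```

The max_rounds cap keeps the loop from running forever on very long feeds, and a longer pause_seconds trades speed for a lower chance of stopping before slow content has loaded.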

Limitations and Future Work

LinkedIn Account Restrictions: The scraper has not been tested on 500 profiles, to avoid triggering LinkedIn's security mechanisms and getting my account blocked.
Google Captcha / Network Blocking: The scraper may encounter Google captchas, which can stop further requests from your IP. This is a potential roadblock when scaling up the number of profiles scraped.
IP Rotation: Scaling this project effectively would require rotating IPs or a proxy setup to avoid IP blocks and scraping limits. This is not implemented yet, but I plan to add it.
Redis Queue Functionality: Redis queue management is only partially implemented. URLs are already stored in Redis, but queue processing, tracking, and logging for profile URLs are incomplete and need further work.
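
One possible shape for the missing queue processing is sketched below. It is an assumption about how the gap could be filled, not the project's implementation: the key names profiles:seen and profiles:queue and the helper functions are hypothetical. The client argument is expected to be a redis-py client (redis.Redis), of which only the sadd, lpush, and rpop commands are used. A Redis set tracks which URLs were already seen, so each profile is queued once.

```python
from urllib.parse import urlsplit

# Hypothetical key names, chosen for this sketch only.
SEEN_KEY = "profiles:seen"
QUEUE_KEY = "profiles:queue"


def normalize_profile_url(url):
    """Return a canonical https://www.linkedin.com/in/<slug> URL, or None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "in":
        return None  # not a personal profile link
    return f"https://www.linkedin.com/in/{segments[1]}"


def enqueue_profile(client, url):
    """Queue a profile URL once; return True if it was newly queued."""
    canonical = normalize_profile_url(url)
    if canonical is None:
        return False
    # SADD returns 1 only when the member was not already in the set.
    if client.sadd(SEEN_KEY, canonical) == 1:
        client.lpush(QUEUE_KEY, canonical)
        return True
    return False


def next_profile(client):
    """Pop the oldest queued URL, or return None when the queue is empty."""
    value = client.rpop(QUEUE_KEY)
    if isinstance(value, bytes):  # redis-py returns bytes unless decoding is on
        value = value.decode("utf-8")
    return value
```

Pushing with LPUSH and popping with RPOP gives first-in, first-out order. The SADD and LPUSH calls are two separate commands, so a crash between them would mark a URL as seen without queueing it; a pipeline or Lua script would close that gap if it matters.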

Installation

Prerequisites

Python 3.9+
Docker
Redis

Steps to Run

Clone the repository:

git clone https://github.com/Ansumanbhujabal/Linkedin_Scraper.git

Build the Docker image:
```
docker build -t linkedin-scraper .
```
Run the Docker container:
```
docker run -d linkedin-scraper
```
This launches the scraper inside a Docker container in detached mode.
To stop the container, look up its ID with docker ps and run:
```
docker stop <container_id>
```

Requirements

All Python dependencies are listed in requirements.txt and are installed automatically during the Docker build.

Selenium
WebDriver Manager
Redis
Other dependencies listed in requirements.txt

Redis

To start the Redis server locally:

redis-server
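
To confirm from Python that the server is reachable before starting a scrape, a small check like the one below may help. It is a sketch under assumptions: redis_is_up is a hypothetical helper, the default host and port are Redis's usual localhost:6379, and the server is assumed to run without a password. It uses only the standard library and sends Redis's inline PING command, which a healthy server answers with +PONG.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=2.0):
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command form of PING
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False
    return reply.startswith(b"+PONG")
```

With the redis-py client installed, calling ping() on a redis.Redis instance serves the same purpose.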

Future Improvements

Rotating IP Support: Implement IP rotation using proxy services to avoid network blocks from LinkedIn.
Complete Redis Integration: Fully implement Redis for managing profile queues and retry mechanisms for failed attempts.
Handling LinkedIn Limits: Implement better handling of LinkedIn's rate limits and account restrictions.
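
As a rough idea of how proxy rotation and retries could fit together, consider the sketch below. Nothing here exists in the project yet: run_with_retries, chrome_proxy_argument, the proxy addresses, and the delay values are all hypothetical. The task argument stands for any function that scrapes one profile through a given proxy and raises an exception when it is blocked.

```python
import itertools
import time


def chrome_proxy_argument(proxy):
    """Build the Chromium command-line flag that routes traffic via `proxy`."""
    return f"--proxy-server={proxy}"


def run_with_retries(task, proxies, max_attempts=4, base_delay=5.0,
                     sleep=time.sleep):
    """Call task(proxy), rotating proxies and backing off after each failure."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = itertools.cycle(proxies)  # p1, p2, ..., then p1 again
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(rotation)
        try:
            return task(proxy)
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Inside the task, the flag from chrome_proxy_argument can be passed to Selenium's ChromeOptions with add_argument before the driver is created. The sleep parameter is injectable so the backoff can be tested without real waiting.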

Disclaimer

This project is for educational purposes only. Be aware of LinkedIn's terms and conditions regarding web scraping and automated actions. Always ensure that your use of scraping tools complies with applicable terms of service.

License

Usage Restricted to Author
