Web crawler designed to extract and process real estate information. The script utilizes the Playwright library for web scraping, Aiosqlite for SQLite database interactions, and Rich for console logging.

bambeero1/REMiner

REMiner

REMiner is a Python web crawler that extracts and processes real estate information. It uses Playwright for web scraping, aiosqlite for SQLite database access, and Rich for console logging.

Demo video: Saudi.public.real.estate.listings.data.collection.tool.1.mp4

Dependencies

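Install the required packages with the following command (argparse is part of the Python standard library and does not need to be installed separately):

pip install playwright aiosqlite playwright_stealth rich

Playwright also needs its browser binaries, which can be downloaded with:

playwright install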

Usage

Command-line Arguments

  • --sqlite: Flag to save data to an SQLite database.
  • --json: Flag to save data to a JSON file.
  • --st: Starting page number for crawling.
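The three arguments above could be declared with argparse roughly as follows. This is only a sketch, not the code in aqar.py: the function name build_parser, the help strings, and the default of 1 for --st are assumptions (the default is inferred from the usage example, where omitting --st starts at page 1).

```python
import argparse


def build_parser():
    """Build a parser for the three documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="aqar.py")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--sqlite", action="store_true",
                        help="save data to an SQLite database")
    parser.add_argument("--json", action="store_true",
                        help="save data to a JSON file")
    # Assumption: the starting page is an integer that defaults to 1.
    parser.add_argument("--st", type=int, default=1,
                        help="starting page number for crawling")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["--json", "--st", "5"])
print(args.json, args.sqlite, args.st)  # True False 5
```

With store_true flags, --sqlite and --json can be passed together, in which case both attributes are True. Whether the real script supports writing to both outputs in one run is not stated here.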

Example Usage

To start crawling from page 1 and save data to SQLite, run the following command:

python aqar.py --sqlite

To start crawling from a specific page (e.g., page 5) and save data to a JSON file, run:

python aqar.py --json --st 5
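For orientation, writing scraped records to a JSON file might look like the sketch below. The helper name save_json, the output file name, and the sample record are all made up for illustration. The actual file name and record layout used by aqar.py are not documented here.

```python
import json
import tempfile
from pathlib import Path


def save_json(records, path):
    """Write a list of ad dictionaries to a UTF-8 JSON file (hypothetical helper)."""
    # ensure_ascii=False keeps Arabic text readable instead of \u escapes.
    text = json.dumps(records, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


# Made-up sample record using a few of the field names from the schema.
sample = [{"adid": 1, "title": "شقة للإيجار", "city": "الرياض", "imgs": []}]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "aqarat.json"  # hypothetical file name
    save_json(sample, out)
    restored = json.loads(out.read_text(encoding="utf-8"))

print(restored == sample)  # True
```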

Logging

The script logs to the console through the Rich library, which adds color and formatting to make the output easier to read.

Database Schema

The SQLite database schema includes the following fields in the "Aqarat" table:

  • adid: Ad ID (Primary Key)
  • title: Ad title
  • description: Ad description
  • author_name: Author's name
  • price: Ad price
  • filters: Filters applied to the property
  • generic_values: Values corresponding to the filters
  • cat: Property category
  • author_url: URL of the author's profile
  • city: City of the property
  • citydir: City district or part
  • dist: District of the property
  • imgs: URLs of images associated with the ad
  • map_url: URL of the property on the map
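The field list above could correspond to a table definition like the one sketched here. The column names and the table name come from the list, but every column type is an assumption, as is storing list-like fields such as imgs as text. The sketch uses the standard-library sqlite3 module rather than aiosqlite so that it runs without extra packages. The SQL statement itself would be the same with either library.

```python
import sqlite3

# Column names follow the documented schema; the types are assumptions.
CREATE_AQARAT = """
CREATE TABLE IF NOT EXISTS Aqarat (
    adid           INTEGER PRIMARY KEY,
    title          TEXT,
    description    TEXT,
    author_name    TEXT,
    price          TEXT,
    filters        TEXT,
    generic_values TEXT,
    cat            TEXT,
    author_url     TEXT,
    city           TEXT,
    citydir        TEXT,
    dist           TEXT,
    imgs           TEXT,
    map_url        TEXT
)
"""


def create_schema(conn):
    """Create the Aqarat table if it does not exist yet."""
    conn.execute(CREATE_AQARAT)
    conn.commit()


def column_names(conn):
    """Return the Aqarat column names in declaration order."""
    rows = conn.execute("PRAGMA table_info(Aqarat)").fetchall()
    return [row[1] for row in rows]  # index 1 of each row is the column name


conn = sqlite3.connect(":memory:")  # in-memory database for the demonstration
create_schema(conn)
print(column_names(conn))
```

Because adid is the primary key, inserting an ad that was already stored raises an integrity error unless the insert uses a clause such as INSERT OR IGNORE or INSERT OR REPLACE. Which of these the script relies on, if any, is not documented here.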

Rights

Mohammed Alraddadi
