Online library parser

This project contains scripts for parsing tululu.org online library and create an offline library with parsed data.

Offline use

This repo already contains all needed data to run offline library with science fiction books. If you just want to use it, download repo and open index.html file in repo root folder with your web-browser. Enjoy)

Using book parser

This script is intended for downloading books from tululu.org online library alongside with their covers, genre information and user`s comments.

Installation

Python 3 should be already installed. Install dependencies by running

pip install -r requirements.txt

Running

Run the main.py script by typing

python3 main.py --start_id "id" --end_id "id"

or

python3 main.py -s "id" -e "id"

where --start_id or -s for short - starting page, and --end_id or -e for short - ending page.

For example, if you type

python3 main.py -s 1 -e 20

script will download books from page 1 to page 20. If no arguments inputed at all, default indexes will be used, which from 1 to 10 page.

Script will make directories for downloaded content inside current working directory - books for books .txt files, images for books covers and comments for book user comments, saved in .txt file.

Using sci-fi genre parser

parse_tululu_category.py - script intended for download books of science fiction genre.

To run it, type

python3 parse_tululu_category.py

Optional arguments:

--start_page - Download starts from that page
--end_page - Download ends with that page
--dest_folder - Root folder for downloaded data
--skip_txt - Boolean, enables/disables books .txt download
--skip_images - Boolean, enables/disables books covers download
--json_path - Folder for saving books JSON dump

For example, if you want to download books from pages 15 through 25, without images, type

python3 parse_tululu_category.py --start_page 15 --end_page 25 --skip_images

Generate offline website

render_website - script for render offline library website from data, gathered by parse_tululu_category.py script.

For example, after using parse_tululu_category script, type

python3 render_website.py

You can use your own json dump file by running script with --json_path parameter:

python3 render_website.py --json_path "path_to_your_dump"

Use included json/books_dump.json file as a reference for making your own dump. If no path is specified, default json/books_dump.json dump will be used.

Script will download all needed data, and render site pages for offline use. You can now open index.html file in repository root directory with your web-browser and start using the library. This script runs local server as well, you can go to 127.0.0.1:5500 and start using generated site.

You can find example of site at GitHub Pages

Project goals

The code is written for educational purposes on online-course for web-developers dvmn.org

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Online library parser

Offline use

Using book parser

Installation

Running

Using sci-fi genre parser

Generate offline website

Project goals

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
books		books
images		images
json		json
pages		pages
static		static
.gitignore		.gitignore
README.MD		README.MD
index.html		index.html
main.py		main.py
parse_tululu_category.py		parse_tululu_category.py
render_website.py		render_website.py
requirements.txt		requirements.txt
template.html		template.html

MrSabin/parsing_online_library

Folders and files

Latest commit

History

Repository files navigation

Online library parser

Offline use

Using book parser

Installation

Running

Using sci-fi genre parser

Generate offline website

Project goals

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages