derfenix/webarchive

Own Webarchive

Aims to be a simple, fast, and easy-to-use web archive for personal or home-network use.

Supported store formats

  • headers — save all headers from the response
  • pdf — save the page as a PDF
  • single_file — save the HTML and all its resources (CSS, JS, images) into one HTML file

Requirements

  • Go 1.19 or higher
  • wkhtmltopdf binary in $PATH (to save pages as PDF)

Configuration

The service is configured via environment variables. The available variables are:

  • DB
    • DB_PATH — path for the database files (default ./db)
  • LOGGING
    • LOGGING_DEBUG — enable debug logs (default false)
  • API
    • API_ADDRESS — address the API server listens on (default 0.0.0.0:5001)
  • UI
    • UI_ENABLED — enable the built-in web UI (default true)
    • UI_PREFIX — prefix for the web UI (default /)
    • UI_THEME — UI theme name (default basic; no other values are available yet)
  • PDF
    • PDF_LANDSCAPE — use landscape page orientation instead of portrait (default false)
    • PDF_GRAYSCALE — use grayscale filter for the output pdf (default false)
    • PDF_MEDIA_PRINT — use media type print for the request (default true)
    • PDF_ZOOM — page zoom factor (default 1.0, i.e. no zoom)
    • PDF_VIEWPORT — use specified viewport value (default 1280x720)
    • PDF_DPI — use specified DPI value for the output pdf (default 150)
    • PDF_FILENAME — use specified name for output pdf file (default page.pdf)

Note: The prefix WEBARCHIVE_ can be added to the environment variable names to avoid conflicts with other variables.
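As an illustration of the variables above, here is a minimal sketch of a launch script that uses the prefixed names. The chosen values are examples only, the boolean spelling `true` is an assumption based on the documented defaults, and `start_server` is a hypothetical helper name, not part of the project.

```shell
#!/bin/sh
# Example settings using the optional WEBARCHIVE_ prefix.
# All values here are illustrative, not recommendations.
export WEBARCHIVE_DB_PATH="./db"
export WEBARCHIVE_API_ADDRESS="127.0.0.1:3001"
export WEBARCHIVE_PDF_LANDSCAPE="true"   # assumed boolean spelling
export WEBARCHIVE_PDF_DPI="300"

# Hypothetical helper: starts the server with the environment set above.
start_server() {
    go run ./cmd/server/main.go
}
```

Calling `start_server` after sourcing this file should start the service with the exported settings; unset variables keep their defaults.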

Usage

1. Start the server

Start without Docker

go run ./cmd/server/main.go

Change API address

API_ADDRESS=127.0.0.1:3001 go run ./cmd/server/main.go

Start in Docker

docker compose up -d webarchive

2. Add a page

curl -X POST --location "http://localhost:5001/api/v1/pages" \
    -H "Content-Type: application/json" \
    -d '{
          "url": "https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937",
          "formats": ["pdf", "headers"]
        }' | jq .

or

curl -X POST --location \
  "http://localhost:5001/api/v1/pages?url=https%3A%2F%2Fgithub.com%2Fwkhtmltopdf%2Fwkhtmltopdf%2Fissues%2F1937&formats=pdf%2Cheaders&description=Foo+Bar"
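The query-string form requires the page URL and the other values to be percent-encoded. The following is a sketch of one way to build that request URL in POSIX shell; `urlencode` and `add_page_url` are hypothetical helper names, and the host and port assume the default API_ADDRESS.

```shell
#!/bin/sh
# Percent-encode every byte outside the RFC 3986 unreserved set
# (A-Z, a-z, 0-9, '-', '.', '_', '~').
urlencode() {
    # od prints the input as hex bytes; tr puts one byte on each line.
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                # Unreserved byte: print it unchanged via an octal escape.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # Any other byte: print %XX with uppercase hex digits.
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
}

# Build the "add a page" URL.
# $1 = page URL, $2 = comma-separated formats, $3 = optional description.
add_page_url() {
    query="url=$(urlencode "$1")&formats=$(urlencode "$2")"
    if [ -n "${3:-}" ]; then
        query="$query&description=$(urlencode "$3")"
    fi
    printf 'http://localhost:5001/api/v1/pages?%s\n' "$query"
}
```

It could then be used as `curl -X POST --location "$(add_page_url "$url" "pdf,headers")"`. Note that this sketch encodes spaces as `%20` rather than `+`.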

3. Get the page's info

curl -X GET --location "http://localhost:5001/api/v1/pages/$page_id" | jq .

Here $page_id is the value of the id field from the previous command's response. If the status field in the response is success (or with_errors), the results field will contain all processed formats with the ids of the stored files.
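Because the results only appear once the status is success or with_errors, a script may need to poll the endpoint. This is a rough sketch under some assumptions: any other status value is treated as "still processing", the retry count and one-second delay are arbitrary, and `is_finished` and `wait_for_page` are hypothetical helper names. It relies on curl and jq, as the commands above do.

```shell
#!/bin/sh
# Returns 0 when the status value means processing has finished.
is_finished() {
    case "$1" in
        success|with_errors) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the page info until processing finishes or the attempts run out.
# $1 = page id, $2 = maximum number of attempts (default 30).
wait_for_page() {
    page_id="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        status=$(curl -s --location "http://localhost:5001/api/v1/pages/$page_id" | jq -r '.status')
        if is_finished "$status"; then
            printf '%s\n' "$status"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

If the API reports a distinct failure status, this loop would simply run out of attempts and return 1; handling that case explicitly would be a reasonable extension.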

4. Open file in browser

xdg-open "http://localhost:5001/api/v1/pages/$page_id/file/$file_id"

Here $page_id is the value of the id field from the earlier responses, and $file_id is the id of the file you want to open, taken from the results field.
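On a machine without a desktop browser, the same URL can be fetched with curl instead. A small sketch, assuming the default API address; `file_url` and `download_file` are hypothetical helper names.

```shell
#!/bin/sh
# Print the URL of a stored file. $1 = page id, $2 = file id.
file_url() {
    printf 'http://localhost:5001/api/v1/pages/%s/file/%s\n' "$1" "$2"
}

# Save a stored file to disk. $1 = page id, $2 = file id, $3 = output path.
download_file() {
    curl -s --location -o "$3" "$(file_url "$1" "$2")"
}
```

For example, `download_file "$page_id" "$file_id" page.pdf` would write the stored file to page.pdf in the current directory.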

5. List all stored pages

curl -X GET --location "http://localhost:5001/api/v1/pages" | jq .

Roadmap

  • Save page to pdf
  • Save URL headers
  • Save page to the single-page html
  • Save page to html with separate resource files (?)
  • Basic web UI
  • Optional authentication
  • Multi-user access
  • Support SQL database with or without separate files storage
  • Tags/Categories
  • Save page to markdown