UKOMAL/chunkr
Chunkr

Chunkr is a self-hostable API for converting PDF, PPTX, DOCX, and Excel files into RAG/LLM-ready data
11 semantic tags for layout analysis | OCR + Bounding Boxes | Structured HTML and Markdown

Try it out! · Report Bug · Contact

Watch our 1-minute demo video

Docs

https://docs.chunkr.ai

(Super) Quick Start

  1. Go to chunkr.ai
  2. Make an account and copy your API key
  3. Create a task:
    curl -X POST https://api.chunkr.ai/api/v1/task \
       -H "Content-Type: multipart/form-data" \
       -H "Authorization: ${YOUR_API_KEY}" \
       -F "file=@/path/to/your/file" \
       -F "model=HighQuality" \
       -F "target_chunk_length=512" \
       -F "ocr_strategy=Auto"
  4. Poll your created task:
    curl -X GET https://api.chunkr.ai/api/v1/task/${TASK_ID} \
      -H "Authorization: ${YOUR_API_KEY}"
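The same create-then-poll flow can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint, headers, and form fields are copied from the curl commands above, while the helper names (`build_multipart`, `create_task`, `get_task`, `poll_until_done`) are hypothetical. The shape of the JSON response is not described here, so the caller supplies the "is this task finished?" check instead of the sketch guessing at field names.

```python
import json
import time
import urllib.request
import uuid

API_URL = "https://api.chunkr.ai/api/v1/task"  # endpoint from the curl example


def build_multipart(fields, file_name, file_bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def create_task(api_key, file_name, file_bytes):
    """POST a file with the same form fields as the curl example."""
    fields = {
        "model": "HighQuality",
        "target_chunk_length": "512",
        "ocr_strategy": "Auto",
    }
    body, content_type = build_multipart(fields, file_name, file_bytes)
    req = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_task(api_key, task_id):
    """GET the current state of a task, as in the second curl command."""
    req = urllib.request.Request(
        f"{API_URL}/{task_id}", headers={"Authorization": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_done(fetch, is_done, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)


# Usage sketch. The "task_id" and "status" keys and the status values are
# ASSUMPTIONS about the response; check the docs for the real field names.
#
#   task = create_task(key, "report.pdf", open("report.pdf", "rb").read())
#   final = poll_until_done(
#       lambda: get_task(key, task["task_id"]),
#       lambda t: t.get("status") in ("Succeeded", "Failed"),
#   )
```

Passing `fetch` and `is_done` as callables keeps the polling loop independent of the response format, so only the two lambdas need to change once the real field names are known.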

Self-Hosted Deployment Options

Quick Start with Docker Compose

  1. Prerequisites: Docker with Docker Compose, and an NVIDIA CUDA GPU.

  2. Deploy:

    git clone https://github.com/lumina-ai-inc/chunkr
    cd chunkr
    cp .env.example .env
    docker compose up -d
  3. Access the services:

    • Web UI: http://localhost:5173
    • API: http://localhost:8000

Note: Requires an NVIDIA CUDA GPU to run.
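After `docker compose up -d`, it can help to confirm that the two services are accepting connections before opening the browser. The following is a minimal sketch using only the standard library; the ports come from the list above, and the helper names (`is_listening`, `check_services`) are hypothetical. It only tests that a TCP connection succeeds, not that the service behind the port is healthy.

```python
import socket

# Ports taken from the service list above; adjust if your .env changes them.
SERVICES = {"Web UI": ("localhost", 5173), "API": ("localhost", 8000)}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {
        name: is_listening(host, port) for name, (host, port) in services.items()
    }


# Usage sketch:
#   for name, up in check_services().items():
#       print(f"{name}: {'up' if up else 'not reachable'}")
```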

Production Deployment with Kubernetes

For production environments, we provide Kubernetes manifests and deployment instructions:

  1. See our detailed guide at self-deployment.md
  2. The guide includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

Licensing

This project is dual-licensed:

  1. GNU Affero General Public License v3.0 (AGPL-3.0)
  2. Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms, contact us or visit our website about the commercial license.
