GitHub - UKOMAL/chunkr: Vision model based PDF chunking

Chunkr

Chunkr is a self-hostable API for converting PDF, PPTX, DOCX, and Excel files into RAG/LLM-ready data.
11 semantic tags for layout analysis | OCR + bounding boxes | Structured HTML and Markdown


Docs

https://docs.chunkr.ai

(Super) Quick Start

- Go to chunkr.ai
- Make an account and copy your API key

Create a task:

curl -X POST https://api.chunkr.ai/api/v1/task \
   -H "Content-Type: multipart/form-data" \
   -H "Authorization: ${YOUR_API_KEY}" \
   -F "file=@/path/to/your/file" \
   -F "model=HighQuality" \
   -F "target_chunk_length=512" \
   -F "ocr_strategy=Auto"

Poll your created task:

curl -X GET https://api.chunkr.ai/api/v1/task/${TASK_ID} \
  -H "Authorization: ${YOUR_API_KEY}"
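Processing is asynchronous, so in practice you repeat the poll request until the task finishes. The sketch below wraps the GET call above in a loop. It assumes the task JSON carries a top-level `"status"` string whose terminal values are `Succeeded` and `Failed`; check the docs for the actual schema. `wait_for_task` and `parse_status` are hypothetical helper names, and the `sed`/`grep` parsing is a dependency-free stand-in for a proper JSON tool such as `jq`.

```shell
#!/usr/bin/env bash
# Sketch: poll a Chunkr task until it reaches a terminal status.
# ASSUMPTION: the response has a top-level "status" field whose terminal
# values are "Succeeded" and "Failed"; confirm against the API docs.

# Print the first "status" string value found in the JSON read from stdin.
parse_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# wait_for_task TASK_ID [INTERVAL_SECONDS] [MAX_TRIES]  (hypothetical helper)
# Prints the final task JSON and returns 0 on "Succeeded"; returns 1 on
# "Failed", on a curl error, or after MAX_TRIES polls without a final status.
wait_for_task() {
  local task_id="$1" interval="${2:-5}" max_tries="${3:-120}" tries=0
  local body state
  while [ "$tries" -lt "$max_tries" ]; do
    body=$(curl -s -X GET "https://api.chunkr.ai/api/v1/task/${task_id}" \
      -H "Authorization: ${YOUR_API_KEY}") || return 1
    state=$(printf '%s\n' "$body" | parse_status)
    echo "task ${task_id}: ${state:-unknown}" >&2
    case "$state" in
      Succeeded) printf '%s\n' "$body"; return 0 ;;
      Failed) printf '%s\n' "$body" >&2; return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "$interval"
  done
  echo "gave up after ${max_tries} polls" >&2
  return 1
}
```

Usage, with `TASK_ID` presumably taken from the create-task response: `wait_for_task "$TASK_ID" > result.json`. The `MAX_TRIES` cap keeps a bad key or an unexpected response shape from looping forever.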

Self-Hosted Deployment Options

Quick Start with Docker Compose

Prerequisites:
- Docker and Docker Compose
- NVIDIA Container Toolkit (for GPU support)
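Before deploying, a quick pre-flight check can confirm the prerequisites are on `PATH`. This is a minimal sketch, not part of the project. It assumes that `nvidia-smi` (installed with the NVIDIA driver) and either `nvidia-ctk` or `nvidia-container-cli` (shipped with the NVIDIA Container Toolkit) are reasonable evidence of GPU support. `have` and `check_prereqs` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the Docker Compose prerequisites.
# ASSUMPTION: the presence of these binaries on PATH is treated as evidence
# that the corresponding component is installed; it does not test the GPU.

# Return 0 if the named command exists on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Report each missing prerequisite on stderr; return 1 if any are missing.
check_prereqs() {
  local missing=0
  if ! have docker; then
    echo "missing: docker" >&2; missing=1
  elif ! docker compose version >/dev/null 2>&1; then
    echo "missing: Docker Compose plugin (docker compose)" >&2; missing=1
  fi
  have nvidia-smi || { echo "missing: nvidia-smi (NVIDIA driver)" >&2; missing=1; }
  if ! have nvidia-ctk && ! have nvidia-container-cli; then
    echo "missing: NVIDIA Container Toolkit" >&2; missing=1
  fi
  return "$missing"
}
```

Usage: `check_prereqs && echo "prerequisites look OK"`.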

Deploy:

git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr
cp .env.example .env
docker compose up -d

Access the services:
- Web UI: http://localhost:5173
- API: http://localhost:8000

Note: Requires an NVIDIA CUDA GPU to run.
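`docker compose up -d` returns as soon as the containers start, and the services may need more time before they answer. The sketch below waits for each service with a retry loop. It treats any HTTP response (even a 404) as "up" because it does not pass `--fail` to `curl`. `wait_for_url` is a hypothetical helper, and the ports come from the defaults listed above.

```shell
#!/usr/bin/env bash
# wait_for_url URL [TIMEOUT_SECONDS]  (hypothetical helper)
# Retries every 2 seconds until URL returns any HTTP response or the
# timeout elapses; prints "up: URL" on success, returns 1 on timeout.
wait_for_url() {
  local url="$1" timeout="${2:-300}" elapsed=0
  # Without --fail, curl exits 0 for any HTTP status, non-zero if unreachable.
  until curl -s -o /dev/null --max-time 5 "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for ${url}" >&2
      return 1
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "up: ${url}"
}

# Example with the default ports from compose:
# wait_for_url http://localhost:8000 && wait_for_url http://localhost:5173
```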

Production Deployment with Kubernetes

For production environments, we provide Kubernetes manifests and deployment instructions:

- See our detailed guide in self-deployment.md
- The guide includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

Licensing

This project is dual-licensed:

- GNU Affero General Public License v3.0 (AGPL-3.0)
- Commercial License

To use Chunkr without complying with the terms of the AGPL-3.0, contact us or visit our website.

Connect With Us

📧 Email: [email protected]
📅 Schedule a call: Book a 30-minute meeting
🌐 Visit our website: chunkr.ai
