dissorial / doc-chatbot

Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.

chat typescript reactjs mongoose nextjs chatbot openai vectorization pinecone document-embedding tailwindcss pdf-processing gpt-3 openai-api gpt-4 langchain

Updated Jul 21, 2023
TypeScript

allenai / papermage

Library supporting NLP and CV research on scientific papers.

python machine-learning natural-language-processing computer-vision scientific-papers multimodal pdf-processing

Updated Nov 8, 2024
Python

ahmedkhemiri95 / PDFs-TextExtract

Multiple and Large PDF Documents Text Extraction.

python pdf parser data-science pdf-document text-analytics pdfs pypdf2 extract-text pdfminer pdf-processing pdfs-textextract

Updated Feb 10, 2025
Python

aws-samples / document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

Updated Oct 25, 2021
Python

Govind-S-B / pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. A separate script queries the Chroma DB for similarity search based on user input.

text-extraction similarity-search pdf-processing vector-embeddings chromadb

Updated Oct 23, 2023
Python

ManasMadan / pdf-actions

An NPM package built on top of pdf-lib that provides functionality such as merging, rotating, and splitting PDFs, downloading PDFs to disk, and more.

react javascript pdf npm reactjs react-component pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download pdf-free pdf-online

Updated Oct 31, 2023
JavaScript

ManasMadan / PDFActions

Built with pdf-actions NPM package.

react pdf reactjs react-component react-components pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download

Updated May 27, 2024
JavaScript

ranguy9304 / LangGraphRAG

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

python natural-language-processing information-retrieval chatbot web-scraping nlp-machine-learning rag terminal-application pdf-processing vector-database openai-api langgraph

Updated Jul 13, 2024
Python

enesmanan / paper-bold

AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy

academic-paper gemini-api rag pdf-processing academic-research langchain

Updated Mar 20, 2025
HTML

Inc44 / MaTools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

python rust productivity application gui qt ocr image-processing video-processing speech-recognition youtube-downloader file-management audio-processing pdf-processing code-formatting

Updated Mar 15, 2025
Python

allanninal / document-summarizer

The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backend, it supports PDF and text files, offering real-time summarization for researchers, students, and professionals.

nlp flask reactjs text-summarization vite huggingface pdf-processing document-summarizer ai-tools open-source-cods

Updated Dec 7, 2024
JavaScript

Yardenrsk / PsychometryReceiverCV

A side project to easily get and annotate questions and answers to the PsychometryBot project DB using computer vision and PDF parsing.

pandas opencv-python pdf-processing

Updated Sep 18, 2022
Python

Aleptonic / PdfSnipper

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

utilities pdf-processing nlp-tools

Updated Feb 3, 2025
Python

thinhuos0913 / python_useful_mini_projects

These are some useful mini projects that I worked on while teaching myself Python programming.

python opencv ocr image-processing pdf-processing

Updated May 20, 2024
Python

Al-shwaib / Book-Preparation-for-Printing

A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.

flask-application pymupdf pdf-processing rtl-support offset-printing book-preparation arabic-books commercial-printing a3-printing order-to-print

Updated Jan 6, 2025
Python

arsath-eng / RAG1-NVIDIA-GENAI

A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.

embeddings question-answering document-analysis faiss rag pdf-processing streamlit llm langchain vector-store nvidia-ai-faundry llama-models

Updated Oct 31, 2024
Python

Farhaj499 / RAG_with_Weaviate_DB

This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on a PDF document. It uses Weaviate as a vector database for efficient retrieval of relevant information and Gemini to generate natural language responses.

python embeddings semantic-search rag weaviate pdf-processing vector-database huggingface-transformers langchain retrieval-augmented-generation agentic-ai

Updated Jan 12, 2025
Jupyter Notebook

ydvrahul19 / Invoice-Manager

A modern, intelligent invoice processing system with advanced multi-format data extraction capabilities. Process invoices from PDFs, Excel files, and images with smart data recognition.

react firebase material-ui data-extraction invoice-management pdf-processing framer-motion redux-toolkit invoice-processing

Updated Nov 23, 2024
JavaScript

dsckiet / covid-tracker-android-app

A statistical data display and notifier app for the COVID-19 pandemic.

statistics mvvm dagger2 pdf-processing

Updated May 15, 2022
Kotlin

rithulkamesh / docproc

Opinionated and Sophisticated Document Region Analyzer.

python machine-learning ocr text-classification text-extraction data-extraction region-detection content-extraction document-analysis layout-analysis pdf-processing pdf-text-extraction document-parsing equation-detection mathematical-symbols

Updated Mar 4, 2025
Python

pdf-processing

69 public repositories match this topic; 20 of them are listed above.
