Docker image for unstructured+langchain PDF document loading with OCR

joshuasundance-swca/unstructured_pdf

unstructured_pdf

Docker image for unstructured+langchain PDF document loading with OCR

Build the image and start a Python session in the container:

docker build --target ready -t unstructured_pdf .
docker run --rm -it unstructured_pdf python

Then, at the Python prompt, load the test PDF inside the image:

from langchain_community.document_loaders import UnstructuredFileLoader
loader = UnstructuredFileLoader("/home/appuser/.dockerinit/test.pdf")
docs = loader.load()
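
Once `loader.load()` returns, it can help to check what was extracted. The helper below is a hypothetical sketch and not part of this repository. It assumes each item in `docs` exposes `page_content` and `metadata`, as LangChain `Document` objects do, and that `metadata` may carry a `source` key.

```python
from typing import Any, Iterable


def summarize_docs(docs: Iterable[Any], preview_chars: int = 80) -> list[dict]:
    """Return a small summary for each loaded document.

    Hypothetical helper: relies only on the `page_content` and `metadata`
    attributes, so it works with any object shaped like a LangChain Document.
    """
    summaries = []
    for doc in docs:
        text = doc.page_content or ""
        summaries.append(
            {
                # `source` is assumed to be set by the loader; None if absent.
                "source": doc.metadata.get("source"),
                # Character count of the raw extracted text.
                "chars": len(text),
                # Collapse runs of whitespace so the preview fits on one line.
                "preview": " ".join(text.split())[:preview_chars],
            }
        )
    return summaries
```

In the container session above, `summarize_docs(docs)` can be called directly on the list returned by the loader. An empty preview would suggest that no text was extracted from the PDF.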

Links

TODO

  • Add OCR
  • More robust layout support
  • Table support
  • Markdown conversion with LLM support
  • Reduce image size
