PDF loading uses Poppler to create image previews; however, for certain models (e.g. LayoutLMv1) these previews are unnecessary if the file has embedded text. We should make sure that the default scenario, where Poppler is not available, still works.
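One way to picture the proposed behaviour is the sketch below. It is not docquery's actual implementation: `poppler_available` and `optional_previews` are hypothetical names, and the sketch assumes that "Poppler is available" can be approximated by finding the `pdfinfo` and `pdftoppm` executables on `PATH`.

```python
import shutil
from typing import Callable, List, Optional


def poppler_available() -> bool:
    """Return True if Poppler's pdfinfo and pdftoppm executables are on PATH."""
    return all(shutil.which(tool) is not None for tool in ("pdfinfo", "pdftoppm"))


def optional_previews(
    render: Callable[[], List],
    available: Optional[bool] = None,
) -> Optional[List]:
    """Call render() only when Poppler is usable; otherwise return None.

    `render` is a hypothetical zero-argument callable that produces the page
    images. A caller that only needs embedded text can treat None as
    "no previews" instead of failing.
    """
    if available is None:
        available = poppler_available()
    if not available:
        return None
    return render()
```

With this shape, a text-only model could proceed when `optional_previews(...)` returns `None`, while a model that needs images could raise a clear error at that point.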
I am getting an error from the pdf2image library telling me to install Poppler and add it to PATH.
This is my code:
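from docquery import document, pipeline

def doc_type(temp_path):
    # Build the document question-answering pipeline
    p = pipeline('document-question-answering')
    # Load the PDF; doc.context triggers image rendering via pdf2image
    doc = document.load_document(temp_path)
    response = p("What type of document is this?", **doc.context)
    return response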
The error I receive is:
    response = p("What type of document is this?", **doc.context)
                                                     ^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\AppData\Local\Programs\Python\Python311\Lib\functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\Documents\GitHub\Document-Processing-BE\venv\Lib\site-packages\docquery\document.py", line 117, in context
    images = self._images
             ^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\AppData\Local\Programs\Python\Python311\Lib\functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\Documents\GitHub\Document-Processing-BE\venv\Lib\site-packages\docquery\document.py", line 156, in _images
    return [x.convert("RGB") for x in pdf2image.convert_from_bytes(self.b)]
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\Documents\GitHub\Document-Processing-BE\venv\Lib\site-packages\pdf2image\pdf2image.py", line 358, in convert_from_bytes
    return convert_from_path(
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\Documents\GitHub\Document-Processing-BE\venv\Lib\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
    page_count = pdfinfo_from_path(
                 ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Cirruslabs\Documents\GitHub\Document-Processing-BE\venv\Lib\site-packages\pdf2image\pdf2image.py", line 594, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Is there any workaround for this? I've tried installing poppler-utils and pdf2image, but it still doesn't work.
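A possible workaround, sketched under assumptions: pdf2image shells out to Poppler's command-line tools (the traceback fails while looking for `pdfinfo`), so the Poppler binaries themselves must be findable on `PATH`, not just the Python package. Since docquery calls pdf2image internally, one option is to prepend the Poppler `bin` directory to `PATH` inside the Python process before running the pipeline. The helper name `add_to_path` and the directory below are placeholders, not real locations; substitute wherever Poppler is actually unpacked.

```python
import os
import shutil


def add_to_path(directory: str) -> None:
    """Prepend `directory` to this process's PATH unless it is already listed."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)


if __name__ == "__main__":
    # Placeholder path (an assumption): point this at the folder that
    # actually contains pdfinfo.exe and pdftoppm.exe on your machine.
    add_to_path(r"C:\path\to\poppler\bin")
    # Prints the resolved executable path, or None if it is still not found.
    print(shutil.which("pdfinfo"))
```

If `shutil.which("pdfinfo")` still prints `None` after this, the directory most likely does not contain the Poppler executables, and the same `PDFInfoNotInstalledError` would be expected.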