Reproduction pdf, free to download: https://bmjopensem.bmj.com/content/bmjosem/1/1/e000050.full.pdf
When converting it to markdown with no special tricks, I notice most text with a colored background is fully missing. This is not the case for Table 2, but all other tables suffer from this issue. The text is definitely present in the pdf.
I suspect this may be related to the following code being commented out, but I'm not sure:
pymupdf4llm/pymupdf4llm/helpers/multi_column.py
# for i in range(len(new_rects) - 1, 0, -1):
# r = +new_rects[i]
# if in_bbox(r, path_rects): # text with shaded background
# shadow_rects.insert(0, r) # put in front to keep sequence
# del new_rects[i]
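To make clear what I think the commented-out loop did, here is a minimal pure-Python sketch of the same logic. It is not the library's code: rectangles are plain `(x0, y0, x1, y1)` tuples, `in_bbox` is a hypothetical stand-in that I assume means "fully contained in any of the given rectangles", and `split_shaded` is a name I made up for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


def in_bbox(rect: Rect, bboxes: List[Rect]) -> bool:
    """Hypothetical stand-in: assumed to test containment in any bbox."""
    return any(contains(b, rect) for b in bboxes)


def split_shaded(
    new_rects: List[Rect], path_rects: List[Rect]
) -> Tuple[List[Rect], List[Rect]]:
    """Move text rects that sit on a drawn (shaded) area into their own list."""
    text_rects = list(new_rects)  # work on a copy
    shadow_rects: List[Rect] = []
    # Walk backwards so deleting does not shift indices still to be visited.
    # As in the commented-out snippet, index 0 is never examined.
    for i in range(len(text_rects) - 1, 0, -1):
        r = text_rects[i]
        if in_bbox(r, path_rects):  # text with shaded background
            shadow_rects.insert(0, r)  # put in front to keep sequence
            del text_rects[i]
    return text_rects, shadow_rects


new_rects = [(0, 0, 100, 20), (10, 30, 90, 50), (0, 60, 100, 80), (10, 90, 90, 110)]
path_rects = [(5, 25, 95, 55), (5, 85, 95, 115)]
text_rects, shadow_rects = split_shaded(new_rects, path_rects)
```

If the real code separated shaded text blocks like this and the shaded list is no longer merged back in, that could explain why text on colored backgrounds goes missing. That is only a guess on my part.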
I couldn't reliably gauge whether any other open issues are related to this, but it didn't seem like it. I'll try to debug it myself if no one comes to the rescue.