[FEA]: Add nightly accuracy testing for text, image, table, and chart extraction #19

Open
2 tasks
drobison00 opened this issue Aug 29, 2024 · 0 comments
Labels
feature request New feature or request

Comments

@drobison00
Collaborator

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request?

Currently preventing usage

Please provide a clear description of the problem this feature solves

Implement a nightly testing pipeline that evaluates the accuracy and performance of our extraction processes across document types. The tests should cover extraction of text, images, tables, and charts from multiple formats, including jpeg, jpg, png, svg, docx, pptx, txt, pdf, and subsets of HTML.
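A rough sketch of how a nightly run might enumerate its test matrix, assuming the sample documents live in a single corpus directory and are grouped by file extension. The directory layout, the `FORMATS`/`TARGETS` lists, and both helper functions are hypothetical; HTML subsets are represented here simply by the `html` extension.

```python
from itertools import product
from pathlib import Path

# Hypothetical lists taken from the formats and extraction types named in this issue.
FORMATS = ["jpeg", "jpg", "png", "svg", "docx", "pptx", "txt", "pdf", "html"]
TARGETS = ["text", "image", "table", "chart"]


def discover_documents(corpus_dir: Path) -> dict[str, list[Path]]:
    """Group files under corpus_dir by lowercase extension, keeping only FORMATS."""
    by_format: dict[str, list[Path]] = {fmt: [] for fmt in FORMATS}
    for path in sorted(corpus_dir.rglob("*")):
        ext = path.suffix.lower().lstrip(".")
        if path.is_file() and ext in by_format:
            by_format[ext].append(path)
    return by_format


def build_test_matrix(by_format: dict[str, list[Path]]) -> list[tuple[str, str]]:
    """Return one (format, target) cell for every format that has sample documents.

    A real pipeline would likely skip combinations that cannot occur,
    such as image extraction from txt files.
    """
    return [
        (fmt, target)
        for fmt, target in product(FORMATS, TARGETS)
        if by_format.get(fmt)
    ]
```

For example, `build_test_matrix(discover_documents(Path("nightly_corpus")))` would list the cells to run, where `nightly_corpus` is a placeholder path.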

Describe the feature, and optionally a solution or implementation and any alternatives

  • Capture Performance Metrics

    • Measure and capture time to completion for each document type.
    • Record pages per second throughput for each document type.
  • Generate Accuracy Metrics

    • Calculate F1 scores for text, image, table, and chart extraction for each document type.
    • Calculate Recall for text, image, table, and chart extraction for each document type.
    • Calculate Precision for text, image, table, and chart extraction for each document type.
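A minimal sketch of the per-document-type metrics listed above, assuming each extractor returns a list of comparable items (for example, normalized text chunks or table identifiers) and that ground truth is available in the same form. `extract_fn`, `PerfResult`, and the exact-match scoring rule are assumptions for illustration; table and chart extraction would likely need a fuzzier matching rule.

```python
import time
from collections.abc import Callable, Hashable, Iterable
from dataclasses import dataclass
from typing import Any


@dataclass
class PerfResult:
    """Wall-clock timing for one document (hypothetical record type)."""
    seconds: float
    pages: int

    @property
    def pages_per_second(self) -> float:
        return self.pages / self.seconds if self.seconds > 0 else 0.0


def timed_extraction(extract_fn: Callable[[Any], Any], document: Any, page_count: int):
    """Run a hypothetical extract_fn on one document and time it to completion."""
    start = time.perf_counter()
    result = extract_fn(document)
    elapsed = time.perf_counter() - start
    return result, PerfResult(seconds=elapsed, pages=page_count)


def aggregate_throughput(results: Iterable[PerfResult]) -> float:
    """Overall pages per second for a document type: total pages / total seconds."""
    results = list(results)
    total_seconds = sum(r.seconds for r in results)
    total_pages = sum(r.pages for r in results)
    return total_pages / total_seconds if total_seconds > 0 else 0.0


def precision_recall_f1(predicted: Iterable[Hashable], expected: Iterable[Hashable]):
    """Set-based precision, recall, and F1; only exact matches count (assumption)."""
    pred, gold = set(predicted), set(expected)
    if not pred and not gold:
        # Nothing expected and nothing extracted: treat as a perfect score.
        return 1.0, 1.0, 1.0
    tp = len(pred & gold)  # true positives: items both extracted and expected
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Summing pages and seconds before dividing gives throughput weighted by document size; averaging per-document rates instead would let many short documents dominate the nightly number.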

Additional context

No response

drobison00 added the feature request label Aug 29, 2024
Projects
None yet
Development

No branches or pull requests

1 participant