[FEA]: Add nightly accuracy testing for text, image, table, and chart extraction #19

drobison00 · 2024-08-29T20:56:50Z

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Currently preventing usage

Please provide a clear description of problem this feature solves

Implement a nightly testing pipeline that evaluates the accuracy and performance of our extraction processes across document types. The tests should cover text, image, table, and chart extraction, and should span multiple input formats: jpeg, png, svg, jpg, docx, pptx, txt, pdf, and subsets of HTML.

Describe the feature, and optionally a solution or implementation and any alternatives

Capture Performance Metrics:
- Measure and capture time to completion for each document type.
- Record pages per second throughput for each document type.

Generate Accuracy Metrics:
- Calculate F1 scores for text, image, table, and chart extraction for each document type.
- Calculate Recall for text, image, table, and chart extraction for each document type.
- Calculate Precision for text, image, table, and chart extraction for each document type.
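
To make the requested metrics concrete, here is a minimal sketch of how one nightly run could score a single document. It is an illustration only, not the project's implementation: the names `score_extraction`, `time_extraction`, and `AccuracyMetrics` are hypothetical, and it assumes extracted items can be compared to ground truth by exact match as sets. The issue does not say how a "match" should be defined for images, tables, or charts, so that comparison would likely need to be replaced with something fuzzier.

```python
import time
from dataclasses import dataclass


@dataclass
class AccuracyMetrics:
    precision: float
    recall: float
    f1: float


def score_extraction(expected: set, extracted: set) -> AccuracyMetrics:
    """Score one extraction result against ground truth.

    Assumption: items are hashable and compared by exact equality.
    """
    true_positives = len(expected & extracted)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of expected items that were found.
    recall = true_positives / len(expected) if expected else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return AccuracyMetrics(precision, recall, f1)


def time_extraction(extract_fn, document, page_count: int):
    """Run a (hypothetical) extraction callable and capture timing.

    Returns the extraction result, time to completion in seconds,
    and pages-per-second throughput.
    """
    start = time.perf_counter()
    result = extract_fn(document)
    elapsed = time.perf_counter() - start
    pages_per_second = page_count / elapsed if elapsed > 0 else float("inf")
    return result, elapsed, pages_per_second
```

A nightly job could call both helpers once per document and per extraction type (text, image, table, chart), then aggregate the results by format.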

Additional context

No response

drobison00 added the "feature request" label (New feature or request) on Aug 29, 2024
