Note
Form Recognizer is now Azure AI Document Intelligence!
The sample code in this repo stopped receiving updates on April 26, 2024. Please use the new repo: Document Intelligence Code Samples.
This repository contains example code snippets showing how Azure AI Document Intelligence can be used to get insights from documents.
Refer to this repo for the samples using the latest SDK version.
The Layout model provides building blocks such as tables, paragraphs, and section headings that enable different semantic chunking strategies for a document. In Retrieval Augmented Generation (RAG), semantic chunking makes storage and retrieval more efficient and improves relevance and interpretability. The following samples show how to use the Layout model for semantic chunking and how to use the resulting chunks for RAG.
File Name | Description |
---|---|
sample_rag_langchain.ipynb | Sample RAG notebook using Azure AI Document Intelligence as the document loader, MarkdownHeaderTextSplitter as the splitter, and Azure AI Search as the retriever in LangChain |
sample_identify_and_merge_cross_page_tables.ipynb and sample_identify_and_merge_cross_page_tables.py | Sample postprocessing script to identify and merge cross-page tables based on business rules. |
sample_figure_understanding.ipynb | Sample notebook showing how to crop figures and send each figure (with its caption) to the Azure OpenAI GPT-4V model to understand its semantics. The resulting figure description is used to update the markdown output, which can then be used for semantic chunking. |
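To make the idea of heading-based semantic chunking concrete, here is a minimal sketch in plain Python. It assumes the Layout output is already available as a markdown string and is not the implementation used in the notebooks above; the function name `split_markdown_by_headers` and the chunk dictionary shape are hypothetical, chosen only for illustration.

```python
import re

# Matches ATX-style markdown headings such as "# Title" or "## Section".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_headers(markdown, max_level=3):
    """Split markdown into chunks, starting a new chunk at each heading
    whose level is <= max_level. Each chunk keeps its heading path.

    Hypothetical helper for illustration; it does not skip fenced code
    blocks, so a line starting with '#' inside a fence is treated as a heading.
    """
    chunks = []
    path = {}   # heading level -> heading text currently in scope
    body = []   # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            headers = [path[level] for level in sorted(path)]
            chunks.append({"headers": headers, "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            level = len(match.group(1))
            # Headings at the same or a deeper level go out of scope.
            for stale in [k for k in path if k >= level]:
                del path[stale]
            path[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk carries the headings it sits under, so a retriever can index the heading path as metadata alongside the text. Tables and paragraphs stay inside the section they belong to instead of being cut at an arbitrary character offset.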
Getting the best results from the Document Intelligence models usually requires some pre- and post-processing steps. These steps are not part of the Document Intelligence service itself. The following samples show how to implement them.
File Name | Description |
---|---|
sample_disambiguate_similar_characters.ipynb and sample_disambiguate_similar_characters.py | Sample postprocessing script to disambiguate similar characters based on business rules. |
sample_identify_cross_page_tables.ipynb and sample_identify_cross_page_tables.py | Sample postprocessing script to identify cross-page tables based on business rules. |
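As an illustration of what a rule-based post-processing step can look like, the sketch below flags tables that may continue across a page break. The rule used here (the next table starts on the following page and has the same number of columns) is an assumption made for this example, not the business rules from the sample scripts, and `TableInfo` is a hypothetical, simplified stand-in for the table objects in a Layout result.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical, simplified view of one table in a Layout result."""
    index: int          # position of the table in the result
    first_page: int     # page where the table starts
    last_page: int      # page where the table ends
    column_count: int   # number of columns


def find_cross_page_candidates(tables):
    """Return (index_a, index_b) pairs where table b may continue table a.

    Assumed rule: b is the first table on the page right after a's last
    page, a is the last table on its page, and both have the same
    column count.
    """
    ordered = sorted(tables, key=lambda t: (t.first_page, t.index))
    pairs = []
    # Adjacent tables in page order: if their pages differ, prev is the
    # last table on its page and curr is the first on its page.
    for prev, curr in zip(ordered, ordered[1:]):
        on_next_page = curr.first_page == prev.last_page + 1
        same_width = curr.column_count == prev.column_count
        if on_next_page and same_width:
            pairs.append((prev.index, curr.index))
    return pairs
```

A real rule set would likely also compare header rows or the vertical position of each table on its page; those checks are left out to keep the sketch short.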