🚀 Extract custom information from unstructured documents with Generative AI
Tabulate is a CDK stack solution with the following features:
- extract well-defined entities (e.g., name), numeric scores (e.g., sentiment) and free-form content (e.g., summary)
- describe the list of attributes to be extracted from your docs without costly data annotation or model training
- use the Python API or the web app UI to analyze PDF, Office or image docs
Click here to see a 1-minute demo recording.
Refer to the demo notebook for the implementation and usage examples.
Note: do not use the name "Tabulate" when presenting the solution in external customer engagements.
Example API call
docs = ['doc1', 'doc2']
features = [
    {"name": "delay", "description": "delay of the shipment in days"},
    {"name": "shipment_id", "description": "unique shipment identifier"},
    {"name": "summary", "description": "one-sentence summary of the text"},
]
run_tabulate_api(
    documents=docs,
    features=features,
)
# [{'delay': 2, 'shipment_id': '123890', 'summary': 'summary1'},
# {'delay': 3, 'shipment_id': '678623', 'summary': 'summary2'}]
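Since each feature is a plain dictionary with a name and a description, it can help to sanity-check the list before submitting a job. The sketch below is only an illustration: `validate_features` is a hypothetical helper that is not part of the solution, and the rules it enforces (non-empty strings, unique names) are assumptions rather than documented API requirements.

```python
from typing import Dict, List


def validate_features(features: List[Dict[str, str]]) -> None:
    """Raise ValueError if a feature entry looks malformed.

    Hypothetical helper: the checks below are assumptions, not rules
    enforced by the solution itself.
    """
    seen = set()
    for i, feature in enumerate(features):
        # Every feature needs a non-empty "name" and "description" string.
        for key in ("name", "description"):
            value = feature.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"feature #{i} needs a non-empty '{key}' string")
        # Names become keys in the output rows, so duplicates would collide.
        if feature["name"] in seen:
            raise ValueError(f"duplicate feature name: {feature['name']}")
        seen.add(feature["name"])


features = [
    {"name": "delay", "description": "delay of the shipment in days"},
    {"name": "shipment_id", "description": "unique shipment identifier"},
    {"name": "summary", "description": "one-sentence summary of the text"},
]
validate_features(features)  # passes silently for the example above
```

Running such a check locally catches typos in the feature list before any documents are processed.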
Example Web UI
Make sure you have the following tools and languages installed, and that you have access to the target AWS account:
- AWS CLI
- AWS Account and User: we suggest configuring an AWS account with a profile
$ aws configure --profile [profile-name]
- Node.js
- IDE for your programming language
- AWS CDK Toolkit
- Python
Clone the repo to a location of your choice:
git clone [email protected]:genaiic-reusable-assets/demo-artifacts/tabulate.git
Navigate to the project folder and run the following commands to create a virtualenv (on macOS and Linux) and install the dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install
Open and modify the config.yml file to specify your project name and the modules you would like to deploy (e.g., whether to deploy a web app):
stack_name: tabulate # Name of your demo, will be used as stack name and prefix for resources
...
streamlit:
  deploy_streamlit: True
Bootstrap CDK in your account, ideally using the profile name you used in the aws configure step. You can configure multiple accounts, then bootstrap and deploy the framework to each of them.
cdk bootstrap --profile [PROFILE_NAME]
Make sure the Docker daemon is running if you deploy the Streamlit frontend (on macOS, you can simply open Docker Desktop).
You can now deploy the framework stack:
cdk deploy --profile [PROFILE_NAME]
Follow the steps in this notebook to run a job via an API call. You will need to:
- provide input document text(s)
- provide a list of features to be extracted
- Open the Cognito console, choose the created user pool, and click "Create user"
- Provide a user name and either a temporary password or an email address to receive an auto-generated password
- Users will be able to log into the frontend using Cognito credentials
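The user-creation steps above can also be scripted instead of clicking through the console. The following is a minimal sketch, assuming boto3 is installed and that your profile is allowed to call Cognito's `admin_create_user`; the helper names `build_create_user_kwargs` and `create_user` are hypothetical, and the user pool ID must be looked up in your own account.

```python
from typing import Any, Dict, Optional


def build_create_user_kwargs(
    user_pool_id: str,
    username: str,
    temporary_password: Optional[str] = None,
    email: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble arguments for cognito-idp admin_create_user (hypothetical helper)."""
    if not temporary_password and not email:
        raise ValueError("provide a temporary password or an email address")
    kwargs: Dict[str, Any] = {"UserPoolId": user_pool_id, "Username": username}
    if temporary_password:
        kwargs["TemporaryPassword"] = temporary_password
    if email:
        # Without a temporary password, Cognito generates one and emails it.
        kwargs["UserAttributes"] = [{"Name": "email", "Value": email}]
        kwargs["DesiredDeliveryMediums"] = ["EMAIL"]
    return kwargs


def create_user(profile_name: str, **user: Any) -> Dict[str, Any]:
    """Create the user with boto3; requires AWS credentials for the profile."""
    import boto3  # imported lazily so the helper above works without boto3

    session = boto3.Session(profile_name=profile_name)
    client = session.client("cognito-idp")
    return client.admin_create_user(**build_create_user_kwargs(**user))
```

Whether a given temporary password is accepted depends on the password policy of the deployed user pool, so treat this as a starting point rather than a drop-in script.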
- The URL to access the frontend appears as output at the end of the CDK deployment under "CloudfrontDistributionName"
or
- Open the AWS console, and go to CloudFront
- Copy the Domain name of the created distribution
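If you prefer to look up the frontend URL programmatically, the stack outputs can be read from CloudFormation. This is a sketch under assumptions: it presumes boto3 is installed, that the CloudFormation stack name matches the one you deployed, and that the output key is exactly "CloudfrontDistributionName" as shown in the CDK output. The function names are hypothetical, and the sample response uses a made-up domain.

```python
from typing import Any, Dict, Optional


def find_stack_output(response: Dict[str, Any], output_key: str) -> Optional[str]:
    """Return the value of an output from a describe_stacks response, or None."""
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


def fetch_frontend_domain(
    stack_name: str,
    profile_name: str,
    output_key: str = "CloudfrontDistributionName",  # key as named in the CDK output
) -> Optional[str]:
    """Query CloudFormation for the frontend domain; needs AWS credentials."""
    import boto3  # imported lazily so find_stack_output works without boto3

    session = boto3.Session(profile_name=profile_name)
    cfn = session.client("cloudformation")
    return find_stack_output(cfn.describe_stacks(StackName=stack_name), output_key)


# Offline illustration with a made-up response in the describe_stacks shape.
sample_response = {
    "Stacks": [
        {
            "Outputs": [
                {
                    "OutputKey": "CloudfrontDistributionName",
                    "OutputValue": "d111111abcdef8.cloudfront.net",
                }
            ]
        }
    ]
}
print(find_stack_output(sample_response, "CloudfrontDistributionName"))
```

If the lookup returns `None`, check the exact output key in the CloudFormation console, since the deployed key may differ from the label assumed here.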
Core team:
Nikita Kozodoi (Owner & Maintainer) | Nuno Castro (Science Manager)
Contributors:
Romain Besombes | Zainab Afolabi | Ivan Sosnovik | Huong Vu |
Acknowledgements: