epequeno/genaiic-insurance-claims

Tabulate

🚀 Extract custom information from unstructured documents with Generative AI

🔥 Overview

Tabulate is a CDK stack solution with the following features:

  • extract well-defined entities (e.g., name), numeric scores (e.g., sentiment) and free-form content (e.g., summary)
  • describe the list of attributes to be extracted from your docs without costly data annotation or model training
  • use the Python API or the web app UI to analyze PDF, Office or image docs

screenshots/diagram.png

Click here to see a 1-minute demo recording.

Refer to the demo notebook for the implementation and usage examples.

Note: do not use the name "Tabulate" when presenting the solution in external customer engagements.

Example API call

docs = ['doc1', 'doc2']

features = [
    {"name": "delay", "description": "delay of the shipment in days"},
    {"name": "shipment_id", "description": "unique shipment identifier"},
    {"name": "summary", "description": "one-sentence summary of the text"},
]

run_tabulate_api(
    documents=docs,
    features=features,
)
# [{'delay': 2, 'shipment_id': '123890', 'summary': 'summary1'},
# {'delay': 3, 'shipment_id': '678623', 'summary': 'summary2'}]
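The call above returns one dictionary per document, keyed by feature name. As a minimal sketch of turning that output into a table, the snippet below writes the rows to CSV text with the standard library. The helper name `results_to_csv` is hypothetical and not part of the Tabulate API; the sample rows are copied from the example output above.

```python
import csv
import io


def results_to_csv(rows, columns):
    """Render a list of per-document result dicts as CSV text.

    Hypothetical helper for illustration; not part of the Tabulate API.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip keys that are not requested columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Features missing from a row are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


# Sample rows taken from the example output shown above.
results = [
    {"delay": 2, "shipment_id": "123890", "summary": "summary1"},
    {"delay": 3, "shipment_id": "678623", "summary": "summary2"},
]

table = results_to_csv(results, ["shipment_id", "delay", "summary"])
print(table)
```

The column order is passed explicitly, so the table layout does not depend on the key order of the returned dictionaries.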

Example Web UI

🔧 Deploy the App

Prerequisites

Make sure you have installed the following tools and languages, and that you have access to the target AWS account:

Clone the Repo

Clone the repo to a location of your choice:

git clone [email protected]:genaiic-reusable-assets/demo-artifacts/tabulate.git

Activate Environment

Navigate to the project folder and run the following commands to create a virtualenv (macOS and Linux) and install the dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install

Configure the Stack

Open the config.yml file and specify your project name and the modules you would like to deploy (e.g., whether to deploy a web app):

stack_name: tabulate   # Name of your demo, will be used as stack name and prefix for resources

...

streamlit:
  deploy_streamlit: True

CDK Bootstrap & Deploy

Bootstrap CDK in your account, ideally with the profile name you set up in the aws configure step. You can configure multiple profiles to bootstrap and deploy the framework to different accounts.

cdk bootstrap --profile [PROFILE_NAME]

Make sure the Docker daemon is running if you deploy the Streamlit frontend (on macOS, opening Docker Desktop is enough).

Then deploy the framework stack:

cdk deploy --profile [PROFILE_NAME]
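If you deploy to several accounts, the two commands above can be generated per profile. The following is a minimal sketch that only builds and prints the command lines; the helper name `build_cdk_commands` and the profile name `my-profile` are hypothetical placeholders.

```python
import shlex


def build_cdk_commands(profile):
    """Return the bootstrap and deploy command lines for one AWS CLI profile.

    Hypothetical helper for illustration; it only builds the argument
    lists and does not run the CDK CLI.
    """
    if not profile or not profile.strip():
        raise ValueError("profile must be a non-empty AWS CLI profile name")
    return [
        ["cdk", "bootstrap", "--profile", profile],
        ["cdk", "deploy", "--profile", profile],
    ]


# "my-profile" is a placeholder; use the profile from your aws configure step.
for argv in build_cdk_commands("my-profile"):
    print(shlex.join(argv))
```

Each argument list can be passed to `subprocess.run(argv, check=True)` from the project folder once the printed commands look right.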

💻 Use the App

Option 1: Run API with Python

Follow steps in this notebook to run a job via an API call. You will need to:

  • provide input document text(s)
  • provide a list of features to be extracted
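Before submitting a job, it can help to check that the feature list is well formed. The sketch below assumes each feature is a dictionary with non-empty `name` and `description` strings, as in the example API call; the uniqueness check and the helper name `validate_features` are assumptions for illustration, not documented requirements of the API.

```python
def validate_features(features):
    """Check a feature list before sending it to the API.

    Hypothetical helper: assumes each feature is a dict with non-empty
    'name' and 'description' strings, and that names should be unique.
    """
    seen = set()
    for index, feature in enumerate(features):
        for key in ("name", "description"):
            value = feature.get(key) if isinstance(feature, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"feature #{index} needs a non-empty '{key}'")
        if feature["name"] in seen:
            raise ValueError(f"duplicate feature name: {feature['name']}")
        seen.add(feature["name"])
    return features


features = validate_features([
    {"name": "delay", "description": "delay of the shipment in days"},
    {"name": "shipment_id", "description": "unique shipment identifier"},
    {"name": "summary", "description": "one-sentence summary of the text"},
])
print([feature["name"] for feature in features])
```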

Option 2: Run web app

Add Cognito Users

  • Open the Cognito console, choose the created user pool, and click "Create user"
  • Provide a user name and either a temporary password or an email address for an auto-generated password
    • Users will be able to log in to the frontend using their Cognito credentials

Access the Frontend

  • The URL to access the frontend appears as output at the end of the CDK deployment under "CloudfrontDistributionName"

or

  • Open the AWS console, and go to CloudFront
  • Copy the Domain name of the created distribution
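The frontend URL can also be read from the stack outputs programmatically. The sketch below searches a response shaped like the CloudFormation `DescribeStacks` result for the "CloudfrontDistributionName" output. The helper name, the sample response, and the domain value are hypothetical placeholders, and the exact output key in your deployment may differ (for example, it may carry a generated suffix).

```python
def find_stack_output(response, output_key):
    """Return the value of a stack output, or None if it is absent.

    `response` is expected to have the shape of a CloudFormation
    DescribeStacks result: {"Stacks": [{"Outputs": [{"OutputKey": ...,
    "OutputValue": ...}]}]}.
    """
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


# Hypothetical sample response; in practice it would come from, e.g.,
# boto3.client("cloudformation").describe_stacks(StackName="tabulate").
sample_response = {
    "Stacks": [
        {
            "StackName": "tabulate",
            "Outputs": [
                {
                    "OutputKey": "CloudfrontDistributionName",
                    "OutputValue": "d111111abcdef8.cloudfront.net",  # placeholder domain
                }
            ],
        }
    ]
}

frontend = find_stack_output(sample_response, "CloudfrontDistributionName")
print(frontend)
```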

👥 Team

Core team:

  • Nikita Kozodoi (Owner & Maintainer)
  • Nuno Castro (Science Manager)

Contributors:

  • Romain Besombes
  • Zainab Afolabi
  • Ivan Sosnovik
  • Huong Vu

Acknowledgements:
