SEC Insights Backend

Live at https://secinsights.ai/

Setup Dev Workspace

  1. Install pyenv and then use it to install the Python version in .python-version.
    1. Install pyenv with curl https://pyenv.run | bash
  2. Install docker
  3. Run poetry shell
  4. Run poetry install to install dependencies for the project
  5. Create the .env file and source it. The .env.development file is a good template.
    1. cp .env.development .env
    2. Modify the .env by adding your own API keys
      • You can leave AWS_KEY & AWS_SECRET with dummy values
    3. set -a
    4. source .env
  6. If this is your first time spinning up the service on your machine, run docker compose up
    • This will spin up the DB, LocalStack, and the FastAPI server, each in its own container but networked together.
    • After all services in the compose stack have started running, exit with Ctrl+C.
  7. Run make run to start the server locally
    • This spins up the Postgres 15 DB & LocalStack in their own Docker containers.
    • The server will not run in a container but will instead run directly on your OS.
      • This is to allow for use of debugging tools like pdb.
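If you want a quick way to confirm that the .env from step 5 actually took effect before running make run, a tiny check like the one below can help. It is not part of the repo, and the variable names are only examples pulled from this README and .env.development; adjust them to whatever your .env defines.

# Optional sanity check (not part of the repo): confirm the shell has the
# variables from .env before `make run`. The names below are examples from
# this README / .env.development; edit the list to match your own .env.
import os

required = ["AWS_KEY", "AWS_SECRET", "S3_ASSET_BUCKET_NAME"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing env vars (did you run `set -a` and `source .env`?): {missing}")
print("Environment looks good.")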

Scripts

The scripts/ folder contains several scripts that are useful for both operations and development.

Chat 🦙

The script at scripts/chat_llama.py spins up a REPL interface for starting a chat in your terminal by interacting with the API directly. This is useful for debugging issues without having to interact with a full frontend.

The script takes an optional --base_url argument that defaults to http://localhost:8000 but can be specified to make the script point to the prod or preview servers. The Makefile contains chat & chat_prod commands that specify this arg for you.

Usage is as follows:

$ poetry shell  # if you aren't already in your poetry shell
$ make chat
poetry run python -m scripts.chat_llama
(Chat🦙) create
Created conversation with ID 8371bbc8-a7fd-4b1f-889b-d0bc882df2a5
(Chat🦙) detail
{
    "id": "8371bbc8-a7fd-4b1f-889b-d0bc882df2a5",
    "created_at": "2023-06-29T20:50:21.330170",
    "updated_at": "2023-06-29T20:50:21.330170",
    "messages": []
}
(Chat🦙) message Hi


=== Message 0 ===
{'id': '05db08be-bbd5-4908-bd68-664d041806f6', 'created_at': None, 'updated_at': None, 'conversation_id': '8371bbc8-a7fd-4b1f-889b-d0bc882df2a5', 'content': 'Hello! How can I assist you today?', 'role': 'assistant', 'status': 'PENDING', 'sub_processes': [{'id': None, 'created_at': None, 'updated_at': None, 'message_id': '05db08be-bbd5-4908-bd68-664d041806f6', 'content': 'Starting to process user message', 'source': 'constructed_query_engine'}]}


=== Message 1 ===
{'id': '05db08be-bbd5-4908-bd68-664d041806f6', 'created_at': '2023-06-29T20:50:36.659499', 'updated_at': '2023-06-29T20:50:36.659499', 'conversation_id': '8371bbc8-a7fd-4b1f-889b-d0bc882df2a5', 'content': 'Hello! How can I assist you today?', 'role': 'assistant', 'status': 'SUCCESS', 'sub_processes': [{'id': '75ace83c-1ebd-4756-898f-1957a69eeb7e', 'created_at': '2023-06-29T20:50:36.659499', 'updated_at': '2023-06-29T20:50:36.659499', 'message_id': '05db08be-bbd5-4908-bd68-664d041806f6', 'content': 'Starting to process user message', 'source': 'constructed_query_engine'}]}


====== Final Message ======
Hello! How can I assist you today?
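If you'd rather hit the API without the REPL, the same flow can be reproduced with plain HTTP calls. The sketch below is illustrative only: the endpoint paths and payloads are assumptions inferred from the transcript above, so check the interactive docs your running FastAPI server exposes for the real routes.

# Illustrative only: the endpoint paths and payloads below are assumptions,
# not the documented API. Check your running FastAPI server's /docs page
# (e.g. http://localhost:8000/docs) for the actual routes.
import requests

BASE_URL = "http://localhost:8000"  # same default as the script's --base_url

# Hypothetical equivalent of the REPL's `create` command
resp = requests.post(f"{BASE_URL}/api/conversation/", json={"document_ids": []})
resp.raise_for_status()
conversation_id = resp.json()["id"]

# Hypothetical equivalent of the REPL's `detail` command
print(requests.get(f"{BASE_URL}/api/conversation/{conversation_id}").json())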

SEC Document Downloader 📃

We have a script to easily download SEC 10-K & 10-Q files!

No API keys are needed to use this; it calls the SEC's free-to-use EDGAR API.

The instructions below explain a process for using the script to download the SEC filings, convert them to PDFs, and store them in an S3 bucket.
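As a standalone illustration of that last point (and not the project's script itself), here is roughly what a key-less call to EDGAR looks like; the SEC only asks that requests carry a descriptive User-Agent header.

# Standalone illustration, not scripts/download_sec_pdf.py: list a company's
# recent 10-K / 10-Q filings straight from the SEC's free EDGAR API.
# No API key is needed; the SEC just asks for a descriptive User-Agent.
import requests

CIK = "0000320193"  # Apple, zero-padded to 10 digits
headers = {"User-Agent": "your-name your-email@example.com"}

data = requests.get(f"https://data.sec.gov/submissions/CIK{CIK}.json", headers=headers).json()
recent = data["filings"]["recent"]
for form, date, accession in zip(recent["form"], recent["filingDate"], recent["accessionNumber"]):
    if form in ("10-K", "10-Q"):
        print(form, date, accession)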

Setup / Usage Instructions

These are the pre-requisite setup steps for using the downloader script to load the SEC PDFs directly into an S3 bucket.

These steps assume you've already followed the steps above for setting up your dev workspace!

  1. Setup AWS CLI
    1. Install AWS CLI
      • curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
      • unzip awscliv2.zip
      • sudo ./aws/install
    2. Configure AWS CLI
      • This is mainly to set the AWS credentials that will later be used by s3fs
      • Run aws configure and enter the access key & secret key for an AWS IAM user that has access to the S3 bucket where you want to store the SEC files.
        • Set the default AWS region to us-east-1 (what we're primarily using).
  2. Setup s3fs
    1. Install s3fs
      • sudo apt install s3fs
    2. Set up an s3fs mounted folder
      • Create the mounted folder locally: mkdir ~/mounted_folder
      • s3fs llama-app-web-assets-preview ~/mounted_folder
        • You can replace llama-app-web-assets-preview with the name of the S3 bucket you want to upload the files to.
  3. Install wkhtmltopdf (used to convert the downloaded HTML filings to PDF; see the short example after this list)
    • sudo apt-get update
    • sudo apt-get install wkhtmltopdf
  4. Get into your poetry shell with poetry shell from the project's root directory.
  5. Run the script! python scripts/download_sec_pdf.py -o ~/mounted_folder --file-types="['10-Q','10-K']"
    • Take a 🚽 break while it's running; it'll take a while!
  6. Go to AWS Console and verify you're seeing the SEC files in the S3 bucket.
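For context on step 3: wkhtmltopdf is what performs the HTML-to-PDF conversion. Whether the script drives it through the pdfkit wrapper exactly as below is an assumption; the snippet is only meant to show the kind of conversion that happens.

# Minimal example of the HTML -> PDF conversion that wkhtmltopdf enables.
# Whether scripts/download_sec_pdf.py uses the pdfkit wrapper exactly like
# this is an assumption; it is shown only to explain why step 3 is needed.
import pdfkit

# Convert a filing saved locally as HTML into a PDF next to it.
pdfkit.from_file("filing.html", "filing.pdf")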

Seed DB Script 🌱

We have a collection of scripts for seeding the database with a set of documents. The script in scripts/seed_db.py is an attempt at consolidating those disparate scripts into one unified command.

This script will:

  1. Download a set of SEC 10-K & 10-Q documents to a local temp directory
  2. Upload those SEC documents to the S3 folder specified by $S3_ASSET_BUCKET_NAME
  3. Crawl through all the PDF files in the S3 folder and upsert a database row into the Document table based on the path of the file within the bucket
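A rough sketch of what step 3 amounts to is below. It is not the project's actual code: it assumes a document table with an auto-generated id and a unique url column, plus DATABASE_URL and S3_ENDPOINT_URL variables for the DB connection and the LocalStack endpoint; the real script goes through the app's own models and settings.

# Rough sketch of step 3, not the project's actual code. Assumes a `document`
# table with an auto-generated id and a unique `url` column, and DATABASE_URL /
# S3_ENDPOINT_URL env vars; the real script uses the app's own models/settings.
import os
import boto3
from sqlalchemy import create_engine, text

s3 = boto3.client("s3", endpoint_url=os.getenv("S3_ENDPOINT_URL"))  # LocalStack locally, AWS otherwise
bucket = os.environ["S3_ASSET_BUCKET_NAME"]
engine = create_engine(os.environ["DATABASE_URL"])

with engine.begin() as conn:
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if not obj["Key"].lower().endswith(".pdf"):
                continue
            url = f"https://{bucket}.s3.amazonaws.com/{obj['Key']}"
            # Upsert keyed on the file's path within the bucket
            conn.execute(
                text("INSERT INTO document (url) VALUES (:url) ON CONFLICT (url) DO NOTHING"),
                {"url": url},
            )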

Use Cases

This is useful for times when:

  1. You want to set up a local environment where your local Postgres DB has a set of documents in its documents table
    1. When running locally, this will use LocalStack to store the documents in a local S3 bucket instead of a real one.
  2. You want to update the documents present in either Prod or Preview DBs

Usage

To run the script, make sure you've:

  1. Activated your Python virtual environment using poetry shell
  2. Installed all the pre-requisite dependencies for the SEC Document Downloader script.
  3. Defined all the environment variables from .env.development in your shell environment, set according to the environment you want to run the seed script against (e.g. local, preview, or prod)

After that you can run python scripts/seed_db.py to start the seed process.

To make things easier, the Makefile has some shorthand commands.

  1. make seed_db
    • Runs the seed_db.py script with no CLI args, so its behavior is based entirely on the env vars you've set
  2. make seed_db_preview
    • Same as make seed_db but only loads SEC documents from Amazon & Meta
    • We don't need to load that many company documents for Preview environments.
  3. make seed_db_local
    • To be used for local database seeding
    • Runs seed_db.py just for $AMZN & $META documents
    • Sets up the LocalStack bucket to actually serve the documents locally as well, so you can load them in your local browser.
  4. make seed_db_based_on_env
    • Automatically calls one of the above shorthands based on the RENDER & IS_PREVIEW_ENV environment variables
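The dispatch that make seed_db_based_on_env performs is roughly the following; the exact mapping of RENDER / IS_PREVIEW_ENV values to targets is an assumption here, so treat the Makefile as the source of truth.

# Rough sketch of make seed_db_based_on_env's dispatch. The exact mapping of
# RENDER / IS_PREVIEW_ENV to targets is an assumption; the Makefile is the
# source of truth.
import os
import subprocess

if os.getenv("RENDER", "").lower() == "true":
    target = "seed_db_preview" if os.getenv("IS_PREVIEW_ENV", "").lower() == "true" else "seed_db"
else:
    target = "seed_db_local"

subprocess.run(["make", target], check=True)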