Live at https://secinsights.ai/
- Install pyenv and then use it to install the Python version in
.python-version
.- install pyenv with
curl https://pyenv.run | bash
- install pyenv with
- Install docker
- Run
poetry shell
- Run
poetry install
to install dependencies for the project - Create the
.env
file and source it. The.env.development
file is a good template.cp .env.development .env
- Modify the
.env
by adding your own API keys- You can leave
AWS_KEY
&AWS_SECRET
with dummy values
- You can leave
set -a
source .env
- If this is your first time spinning up the service on your machine, run
docker compose up
- This will spin up the DB, LocalStack, and the FastAPI server all in their own containers but interconnected.
- After all services in the compose stack have started running, exit with ctrl+c
- Run
make run
to start the server locally- This spins up the Postgres 15 DB & Localstack in their own docker containers.
- The server will not run in a container but will instead run directly on your OS.
- This is to allow for use of debugging tools like
pdb
- This is to allow for use of debugging tools like
The scripts/
folder contains several scripts that are useful for both operations and development.
The script at scripts/chat_llama.py
spins up a repl interface to start a chat within your terminal by interacting with the API directly. This is useful for debugging issues without having to interact with a full frontend.
The script takes an optional --base_url
argument that defaults to http://localhost:8000
but can be specified to make the script point to the prod or preview servers. The Makefile
contains chat
& chat_prod
commands that specify this arg for you.
Usage is as follows:
$ poetry shell # if you aren't already in your poetry shell
$ make chat
poetry run python -m scripts.chat_llama
(Chat🦙) create
Created conversation with ID 8371bbc8-a7fd-4b1f-889b-d0bc882df2a5
(Chat🦙) detail
{
"id": "8371bbc8-a7fd-4b1f-889b-d0bc882df2a5",
"created_at": "2023-06-29T20:50:21.330170",
"updated_at": "2023-06-29T20:50:21.330170",
"messages": []
}
(Chat🦙) message Hi
=== Message 0 ===
{'id': '05db08be-bbd5-4908-bd68-664d041806f6', 'created_at': None, 'updated_at': None, 'conversation_id': '8371bbc8-a7fd-4b1f-889b-d0bc882df2a5', 'content': 'Hello! How can I assist you today?', 'role': 'assistant', 'status': 'PENDING', 'sub_processes': [{'id': None, 'created_at': None, 'updated_at': None, 'message_id': '05db08be-bbd5-4908-bd68-664d041806f6', 'content': 'Starting to process user message', 'source': 'constructed_query_engine'}]}
=== Message 1 ===
{'id': '05db08be-bbd5-4908-bd68-664d041806f6', 'created_at': '2023-06-29T20:50:36.659499', 'updated_at': '2023-06-29T20:50:36.659499', 'conversation_id': '8371bbc8-a7fd-4b1f-889b-d0bc882df2a5', 'content': 'Hello! How can I assist you today?', 'role': 'assistant', 'status': 'SUCCESS', 'sub_processes': [{'id': '75ace83c-1ebd-4756-898f-1957a69eeb7e', 'created_at': '2023-06-29T20:50:36.659499', 'updated_at': '2023-06-29T20:50:36.659499', 'message_id': '05db08be-bbd5-4908-bd68-664d041806f6', 'content': 'Starting to process user message', 'source': 'constructed_query_engine'}]}
====== Final Message ======
Hello! How can I assist you today?
We have a script to easily download SEC 10-K & 10-Q files!
No API keys are needed to use this, it calls the SEC's free to use Edgar API.
The instructions below explain a process to use the script to download the SEC filings, convert the to PDFs, and store them in an S3 bucket.
Pre-requisite setup steps to use the downloader script to load the SEC PDFs directly into an S3 bucket.
These steps assume you've already followed the steps above for setting up your dev workspace!
- Setup AWS CLI
- Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
- Configure AWS CLI
- This is mainly to set the AWS credentials that will later be used by s3fs
- Run
aws configure
and enter the access key & secret key for a AWS IAM user that has access to the PDFs where you want to store the SEC files.- set the default AWS region to
us-east-1
(what we're primarily using).
- set the default AWS region to
- Install AWS CLI
- Setup
s3fs
- Install s3fs
sudo apt install s3fs
- Setup a s3fs mounted folder
- Create the mounted folder locally
mkdir ~/mounted_folder
s3fs llama-app-web-assets-preview ~/mounted_folder
- You can replace
llama-app-web-assets-preview
with the name of the S3 bucket you want to upload the files to.
- You can replace
- Create the mounted folder locally
- Install s3fs
- Install
wkhtmltopdf
sudo apt-get update
sudo apt-get install wkhtmltopdf
- Get into your poetry shell with
poetry shell
from the project's root directory. - Run the script!
python scripts/download_sec_pdf.py -o ~/mounted_bucket --file-types="['10-Q','10-K']"
- Take a 🚽 break while it's running, it'll take a while!
- Go to AWS Console and verify you're seeing the SEC files in the S3 bucket.
There are a collection of scripts we have for seeding the database with a set of documents.
The script in scripts/seed_db.py
is an attempt at consolidating those disparate scripts into one unified command.
This script will:
- Download a set of SEC 10-K & 10-Q documents to a local temp directory
- Upload those SEC documents to the S3 folder specified by
$S3_ASSET_BUCKET_NAME
- Crawl through all the PDF files in the S3 folder and upsert a database row into the Document table based on the path of the file within the bucket
This is useful for times when:
- You want to setup a local environment with your local Postgres DB to have a set of documents in the
documents
table- When running locally, this will use
localstack
to store the documents into a local S3 bucket instead of a real one.
- When running locally, this will use
- You want to update the documents present in either Prod or Preview DBs
To run the script, make sure you've:
- Activated your Python virtual environment using
poetry shell
- Installed all the pre-requisite dependencies for the
SEC Document Downloader
script. - Defined all the environment variables from
.env.development
within your shell environment according to the environment you want to execute the seed script (e.g. local, preview, prod environments)
After that you can run python scripts/seed_db.py
to start the seed process.
To make things easier, the Makefile has some shorthand commands.
make seed_db
- Just runs the
seed_db.py
script with no CLI args, so just based on what env vars you've set
- Just runs the
make seed_db_preview
- Same as
make seed_db
but only loads SEC documents from Amazon & Meta - We don't need to load that many company documents for Preview environments.
- Same as
make seed_db_local
- To be used for local database seeding
- Runs
seed_db.py
just for $AMZN & $META documents - Sets up the localstack bucket to actually serve the documents locally as well, so you can load them in your local browser.
make seed_db_based_on_env
- Automatically calls one of the above shorthands based on the
RENDER
&IS_PREVIEW_ENV
environment variables
- Automatically calls one of the above shorthands based on the