A Python package for working with SEC filings at scale. Also includes MuleBot, an open-source chatbot for SEC data that requires no local storage, integrated with datamule's APIs and datasets.
Articles: How to deploy a financial chatbot to the internet in 5 minutes
- Monitor EDGAR for new filings
- Parse textual filings into simplified HTML, interactive HTML, or structured JSON
- Download SEC filings quickly and easily
- Access datasets such as every MD&A from 2024 or every 2024 10-K converted to structured JSON
- Interact with SEC data using MuleBot
- Features
- Installation
- Quick Start
- Usage
- Examples
- Known Issues
- Roadmap
- Contributing
- License
- Change Log
- Other Useful SEC Packages
Basic installation:

```bash
pip install datamule
```

Installation with additional features:

```bash
pip install datamule[filing_viewer]  # Install with filing viewer module
pip install datamule[mulebot]        # Install with MuleBot (coming soon)
pip install datamule[all]            # Install all extras
```
Available extras:
- `filing_viewer`: dependencies for the filing viewer module
- `mulebot`: MuleBot for interacting with SEC data
- `mulebot_server`: Flask server for running MuleBot
- `all`: installs all available extras
```python
import datamule as dm

downloader = dm.Downloader()
downloader.download(form='10-K', ticker='AAPL')
```
Download speed is close to the theoretical maximum set by SEC rate limits.
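The SEC's fair-access guidelines cap automated traffic at roughly 10 requests per second, which is the ceiling the downloader approaches. A minimal sketch of a client-side throttle under that limit (the `RateLimiter` class here is illustrative, not datamule's internal implementation):

```python
import time

class RateLimiter:
    """Simple client-side throttle: at most max_per_sec calls per second."""

    def __init__(self, max_per_sec=10):
        self.min_interval = 1.0 / max_per_sec
        self.last_call = 0.0

    def wait(self):
        # Sleep just long enough to keep calls min_interval apart
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(max_per_sec=10)
# Before each request to sec.gov:
# limiter.wait(); response = fetch(url)
```

Batching requests right up to this ceiling is what keeps bulk downloads close to the theoretical maximum.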
```python
downloader = dm.Downloader()
```

Uses the EFTS API to retrieve filings. I am considering renaming `download` to `download_filings`.

```python
download(self, output_dir='filings', return_urls=False, cik=None, ticker=None, form=None, date=None)
```
```python
# Download all 10-K filings for Tesla using CIK
downloader.download(form='10-K', cik='1318605', output_dir='filings')

# Download 10-K filings for multiple companies using tickers
downloader.download(form='10-K', ticker=['TSLA', 'META'], output_dir='filings')

# Download every Form 3 for a specific date
downloader.download(form='3', date='2024-05-21', output_dir='filings')
```
Uses the Company Concepts API to retrieve XBRL.
```python
download_company_concepts(self, output_dir='company_concepts', cik=None, ticker=None)
```
Available datasets:
- `10K`: 2024 10-K filings converted to JSON
- `MDA`: Management's Discussion and Analysis (MD&A) sections extracted from 2024 10-K filings
- `XBRL`: every Company Concepts XBRL
Also available on Dropbox
```python
# Download all 2024 10-K filings converted to JSON
downloader.download_dataset('10K')
```
```python
print("Monitoring SEC EDGAR for changes...")
changed_bool = downloader.watch(1, silent=False, cik=['0001267602', '0001318605'], form=['3', 'S-8 POS'])
if changed_bool:
    print("New filing detected!")
```
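`watch` reports whether a change was detected, so continuous monitoring means calling it in a loop. A generic polling sketch (the `monitor` helper and its parameters are illustrative, not part of datamule):

```python
import time

def monitor(check_fn, on_change, interval=60, max_polls=None):
    """Repeatedly call check_fn (e.g. a lambda wrapping downloader.watch)
    and invoke on_change whenever it reports a new filing."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if check_fn():
            on_change()
        polls += 1
        time.sleep(interval)

# Usage (illustrative):
# monitor(lambda: downloader.watch(1, silent=True, form=['3']),
#         lambda: print("New filing detected!"), interval=60)
```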
Parses SEC XBRL in JSON format into tables. See Parse every SEC XBRL to csv in ten minutes.
```python
from datamule import parse_company_concepts

table_dict_list = parse_company_concepts(company_concepts)  # Returns a list of tables with labels
```
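In the SEC's companyconcept JSON, each concept nests its facts under `units`, keyed by unit of measure, with one dict per fact. A rough sketch of the kind of flattening a table build performs, run on a hand-made fragment in that shape (not datamule's actual implementation; the values are made up for illustration):

```python
import csv, io

def concept_to_rows(concept):
    """Flatten one company-concept dict into (unit, end, val, form) rows."""
    rows = []
    for unit, facts in concept.get('units', {}).items():
        for fact in facts:
            rows.append((unit, fact['end'], fact['val'], fact.get('form')))
    return rows

# Hand-made fragment mimicking the SEC companyconcept response shape
concept = {
    'tag': 'AccountsPayableCurrent',
    'units': {'USD': [
        {'end': '2022-12-31', 'val': 100, 'form': '10-K'},
        {'end': '2023-12-31', 'val': 120, 'form': '10-K'},
    ]},
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['unit', 'end', 'val', 'form'])
writer.writerows(concept_to_rows(concept))
print(buf.getvalue())
```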
Parses textual filings into different formats using the datamule parser endpoint. If it is too slow for your use case, let me know; a faster endpoint is coming soon.
```python
# Simplified HTML
simplified_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='simplify')

# Interactive HTML
interactive_html = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='interactive')

# JSON
json_data = dm.parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
```
Parses HTML tables into a useful form. This exists mostly as a placeholder. Links: Visual Example
```python
from datamule import TableParser  # import path may differ by version

table_parser = TableParser()
```
Convert parsed filing JSON into HTML with features like a table of contents sidebar:
```python
from datamule import parse_textual_filing
from datamule.filing_viewer import create_interactive_filing

data = parse_textual_filing(url='https://www.sec.gov/Archives/edgar/data/1318605/000095017022000796/tsla-20211231.htm', return_type='json')
create_interactive_filing(data)
```
Interact with SEC data using MuleBot. MuleBot uses tool calling to interface with SEC and datamule endpoints.
```python
from datamule.mulebot import MuleBot

mulebot = MuleBot(openai_api_key)
mulebot.run()
```
To use MuleBot you will need an OpenAI API key.
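A common pattern is loading the key from an environment variable rather than hard-coding it in scripts. A small sketch (the `load_openai_key` helper is illustrative, not part of datamule):

```python
import os

def load_openai_key(env=None):
    """Fetch the OpenAI API key from the environment (never hard-code it)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable")
    return key

# Usage (illustrative):
# mulebot = MuleBot(load_openai_key())
```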
MuleBot server is a customizable front-end for MuleBot. Example
Artifacts:
- Filing Viewer
- Company Facts Viewer
- List Viewer
Features (coming soon):
- Display XBRL tables with download / copy button.
- Download XBRL tables for a specific company in ZIP.
- Display sections of filings, like MD&A with links to filing viewer and original.
Quickstart
```python
from datamule.mulebot.mulebot_server import server

def main():
    # Your OpenAI API key
    api_key = "sk-<YOUR_API_KEY>"
    server.set_api_key(api_key)

    # Run the server
    print("Starting MuleBotServer...")
    server.run(debug=True, host='0.0.0.0', port=5000)

if __name__ == "__main__":
    main()
```
- Some SEC files are malformed, which can cause parsing errors. For example, this Tesla Form D HTML from 2009 is missing a closing `</meta>` tag.

  Workaround:

  ```python
  from lxml import etree

  with open('filings/000131860509000005primary_doc.xml', 'r', encoding='utf-8') as file:
      html = etree.parse(file, etree.HTMLParser())
  ```
- SEC endpoints have issues. For example, the EFTS search returns the primary document URL for https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/ as https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/0001.txt, when it should send you to https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/0000950116-01-000004.txt. This is currently a low-priority issue; let me know if you need the data and I'll move it up the priority list.
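Until this is fixed upstream, the full-submission URL can be rebuilt from the accession number itself, since EDGAR's archive layout is deterministic: the directory name drops the dashes from the accession number, while the full-submission file keeps them and adds a `.txt` suffix. A sketch of that workaround (the helper name is illustrative):

```python
def full_submission_url(cik, accession):
    """Rebuild the full-submission .txt URL from a CIK and accession number,
    bypassing the primary-doc URL returned by EFTS."""
    cik = str(int(cik))                  # strip leading zeros
    folder = accession.replace('-', '')  # directory name drops the dashes
    return (f"https://www.sec.gov/Archives/edgar/data/"
            f"{cik}/{folder}/{accession}.txt")

print(full_submission_url('1036804', '0000950116-01-000004'))
# → https://www.sec.gov/Archives/edgar/data/1036804/000095011601000004/0000950116-01-000004.txt
```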
- Fix code debt:
  - Refactor the downloader and set up the retriever, probably using urllib or requests for one-time requests.
  - Band-aid fix for the filing viewer; will wait until the mule parser update to devote more effort.
- Paths may be broken on non-Windows devices; needs verification.
- Consider switching from function calling to ell.
- Analytics?
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.