This project examines whether differences in the information companies report in their Form SD disclosures can be used to do something interesting. What exactly is still an open question.
Current idea:
- Create a dataset of every smelter or refiner (SOR) in 2023 using EX-1.01 of Form SD.
- Identify SORs shared between companies (e.g. Tesla and Apple).
- Exploit differences in information reported to do something cool.
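As a rough sketch of the shared-SOR step, assuming the structured data has already been reduced to (company, SOR name) pairs, shared SORs can be found by inverting that mapping. The function name `shared_sors` and the example rows are hypothetical placeholders, not data from any filing.

```python
from collections import defaultdict


def shared_sors(pairs):
    """Return {sor: sorted companies} for SORs reported by two or more companies."""
    companies_by_sor = defaultdict(set)
    for company, sor in pairs:
        # Light normalization so case/whitespace differences do not split one SOR.
        companies_by_sor[sor.strip().lower()].add(company)
    return {
        sor: sorted(companies)
        for sor, companies in companies_by_sor.items()
        if len(companies) >= 2
    }


# Hypothetical example rows; not taken from real disclosures.
example_pairs = [
    ("Company A", "Example Smelter One"),
    ("Company B", "example smelter one "),
    ("Company A", "Example Refiner Two"),
]
print(shared_sors(example_pairs))
```

Real filings will likely spell the same SOR in different ways, so lowercasing and trimming is only a starting point; stronger name matching may be needed before the overlaps can be trusted.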
Implementation:
- Download all Form SD filings using datamule
- Create a CSV with columns 'filing_date', 'cik' (Central Index Key, the SEC's unique company identifier), and 'text' (text of the document)
- Convert text to a structured dataset using txt2dataset
- Explore data
- Force-directed network graph of SORs and companies, coloring shared SORs differently
- Cluster the graph to identify patterns in shared SORs (it turns out almost every refinery is connected, so this was not needed)
- Treemap of SORs by country
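A minimal sketch of the CSV step and of the data behind the two exploration views, using only the Python standard library. It assumes the filings are already downloaded and the SOR rows already extracted; the datamule and txt2dataset calls are left out on purpose because their APIs are not described here. The helper names (`write_filings_csv`, `graph_inputs`, `sor_counts_by_country`), the record fields `sor` and `country`, and all sample values are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

FIELDS = ["filing_date", "cik", "text"]


def write_filings_csv(rows, handle):
    """Write one row per filing with the three columns from the plan."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def graph_inputs(records):
    """Return (edges, shared) for a company-SOR graph.

    edges:  sorted unique (cik, sor) pairs
    shared: SORs linked to more than one company, i.e. the nodes to color differently
    """
    edges = sorted({(r["cik"], r["sor"]) for r in records})
    ciks_by_sor = defaultdict(set)
    for cik, sor in edges:
        ciks_by_sor[sor].add(cik)
    shared = {sor for sor, ciks in ciks_by_sor.items() if len(ciks) > 1}
    return edges, shared


def sor_counts_by_country(records):
    """Count distinct SORs per country, the sizes a treemap would need."""
    unique = {(r["sor"], r["country"]) for r in records}
    return Counter(country for _, country in unique)


# Hypothetical sample data; not taken from real filings.
sample_filings = [
    {"filing_date": "2023-05-31", "cik": "0000000001", "text": "Exhibit text, with a comma"},
]
sample_records = [
    {"cik": "0000000001", "sor": "Smelter X", "country": "Country P"},
    {"cik": "0000000002", "sor": "Smelter X", "country": "Country P"},
    {"cik": "0000000002", "sor": "Refiner Y", "country": "Country Q"},
]

buffer = io.StringIO()
write_filings_csv(sample_filings, buffer)
edges, shared = graph_inputs(sample_records)
counts = sor_counts_by_country(sample_records)
```

When writing to a real file instead of an in-memory buffer, open it with `newline=""` so the `csv` module controls line endings. The `edges` and `shared` values could then be handed to whichever graph library draws the force-directed layout, and `counts` to whichever plotting library draws the treemap.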
Share exploration results
Ask for help connecting with people doing research in this space, ideally an expert who knows what to look for and which questions would be interesting to answer.