We generated a list of all the annotations in our universe; the scripts in this folder were used to (interactively) map them onto the GBIF and iNat taxonomies.
Creating the taxonomy CSV file requires running three scripts, in order:
- Generate a spreadsheet of the class names within each desired dataset by querying MegaDB. These class names are the names provided directly by our partner organizations and may include abbreviations, e.g., "wtd" meaning "white-tailed deer."
  This is done by running the taxonomy_mapping/species_by_dataset.py script. The first run of this step may take a while, but intermediate outputs are cached in JSON files, so later runs are much faster.
- Because each partner organization uses its own naming scheme, we need to map the class names onto a common taxonomy. We use a combination of the iNaturalist taxonomy and the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy.
  This is done by running the taxonomy_mapping/process_species_by_dataset.py script. Note that this script is not meant to be run as a normal Python script; it is intended to be run interactively.
- Once the taxonomy CSV is generated, check for errors by running:
    python taxonomy_mapping/taxonomy_csv_checker.py /path/to/taxonomy.csv
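
The JSON caching mentioned in the first step follows a common load-or-compute pattern. The sketch below illustrates that pattern only and is not the code in species_by_dataset.py. The names load_or_compute and query_class_names, the cache path, and the shape of the cached data are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_or_compute(
    cache_path: Path, compute: Callable[[], Dict[str, List[str]]]
) -> Dict[str, List[str]]:
    """Return the cached JSON result if present; otherwise compute and cache it."""
    if cache_path.exists():
        # Fast path: reuse the output of an earlier run.
        with cache_path.open(encoding="utf-8") as f:
            return json.load(f)

    # Slow path: run the expensive step, then store its result for next time.
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", encoding="utf-8") as f:
        json.dump(result, f, indent=2)
    return result


def query_class_names() -> Dict[str, List[str]]:
    """Hypothetical stand-in for the slow MegaDB query (assumed data shape)."""
    return {"example_dataset": ["wtd", "coyote"]}
```

Calling load_or_compute(Path("cache/class_names.json"), query_class_names) twice would run the query only on the first call. Deleting the JSON file forces a fresh query, which matters whenever the underlying datasets change.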
The visualize_taxonomy.ipynb notebook demonstrates how to visualize the taxonomy hierarchy. It requires the networkx and graphviz Python packages.
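
To give a rough idea of what a taxonomy hierarchy looks like as a graph, the sketch below writes Graphviz DOT text by hand from a child-to-parent mapping. It is a dependency-free illustration and not the notebook's code, which uses networkx and graphviz. The function taxonomy_to_dot and the EXAMPLE mapping are hypothetical, and the mapping format is an assumption rather than the layout of the taxonomy CSV.

```python
from typing import Dict


def taxonomy_to_dot(child_to_parent: Dict[str, str]) -> str:
    """Render a child -> parent mapping as Graphviz DOT text, one edge per line."""
    lines = ["digraph taxonomy {", "  rankdir=LR;"]
    for child, parent in sorted(child_to_parent.items()):
        # Edges point from the broader taxon to the narrower one.
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative fragment: order > family > genus > species for white-tailed deer.
EXAMPLE = {
    "cervidae": "artiodactyla",
    "odocoileus": "cervidae",
    "odocoileus virginianus": "odocoileus",
}
```

Saving the result of taxonomy_to_dot(EXAMPLE) to a .dot file lets the Graphviz dot command-line tool render it as an image.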