CameraTraps/taxonomy_mapping at main · amitkayal/CameraTraps

History

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
lila_to_wi_supplementary_mapping_file.csv		lila_to_wi_supplementary_mapping_file.csv
map_lila_categories.py		map_lila_categories.py
map_lila_taxonomy_to_wi_taxonomy.py		map_lila_taxonomy_to_wi_taxonomy.py
map_new_lila_datasets.py		map_new_lila_datasets.py
megadb_taxonomy_to_lila_taxonomy.py		megadb_taxonomy_to_lila_taxonomy.py
prepare_lila_taxonomy_release.py		prepare_lila_taxonomy_release.py
preview_lila_taxonomy.py		preview_lila_taxonomy.py
process_species_by_dataset.py		process_species_by_dataset.py
retrieve_sample_image.py		retrieve_sample_image.py
simple_image_download.py		simple_image_download.py
species_by_dataset.py		species_by_dataset.py
species_lookup.py		species_lookup.py
taxonomy_csv_checker.py		taxonomy_csv_checker.py
taxonomy_graph.py		taxonomy_graph.py
visualize_taxonomy.ipynb		visualize_taxonomy.ipynb

README.md

Mapping labels to a standard taxonomy (usually for new LILA datasets)

When a new .json file comes in and needs to be mapped to scientific names...

Assuming this is a LILA dataset, edit the LILA metadata file to include the new .json and dataset name.
Assuming this is a LILA dataset, use get_lila_category_list.py to download the .json files for every LILA dataset. This will produced a .json-formatted dictionary mapping each dataset to all of the categories it contains.
Use map_new_lila_datasets.py to create a .csv file mapping each of those categories to a scientific name and taxonomy. This will eventually become a subset of rows in the "master" .csv file. This is a semi-automated process; it will look up common names against the iNat and GBIF taxonomies, with some heuristics to avoid simple problems (like making sure that "greater_kudu" matches "greater kudu", or that "black backed jackal" matches "black-backed jackal"), but you will need to fill in a few gaps manually. I do this with three windows open: a .csv editor, Spyder (with the cell called "manual lookup" from this script open), and a browser. Once you generate this .csv file, it's considered permanent, i.e., the cell that wrote it won't re-write it, so manually edit to your heart's content.
Use preview_lila_taxonomy.py to produce an HTML file full of images that you can use to make sure that the matches were sensible; be particularly suspicious of anything that doesn't look like a mammal, bird, or reptile. Go back and fix things in the .csv file. This script/notebook also does a bunch of other consistency checking.
When you are totally satisfied with that .csv file, manually append it to the "master" .csv file (lila-taxonomy-mapping.csv), which is currently in a private repository. preview_lila_taxonomy can also be run against the master file.
Check for errors (one more time) (this should be redundant with what's now included in preview_lila_taxonomy.py, but it can't hurt) by running:
```
python taxonomy_mapping/taxonomy_csv_checker.py /path/to/taxonomy.csv
```
Prepare the "release" taxonomy file (which removes a couple columns and removes unused rows) using prepare_lila_taxonomy_release.py .
Use map_lila_categories.py to get a mapping of every LILA data set to the common taxonomy.
The visualize_taxonomy.ipynb notebook demonstrates how to visualize the taxonomy hierarchy. It requires the networkx and graphviz Python packages.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

taxonomy_mapping

taxonomy_mapping

README.md

Mapping labels to a standard taxonomy (usually for new LILA datasets)

Files

taxonomy_mapping

Directory actions

More options

Directory actions

More options

Latest commit

History

taxonomy_mapping

Folders and files

parent directory

README.md

Mapping labels to a standard taxonomy (usually for new LILA datasets)