Questions about evaluating recognizers #1079
-
Hi all, I am creating a recognizer using OpenSoundscape for one species of interest and would like to compare the performance of several versions of my recognizer that incorporate different parameters. I've been treating the Validation score that is output after training as an F1 score, but I want to confirm that this interpretation is correct. I'm also curious about whether anyone has written code to evaluate recognizer performance using a confusion matrix as opposed to the score distribution histograms; I would like to see how many true and false positives/negatives my recognizer identifies. I see some mention of a confusion matrix in the metrics.py file but am unsure of how to go about making one. Apologies if I've missed the answers to these questions in the tutorials/documentation. Thanks for the help!
Charlotte
-
Hi Charlotte,
The default reported score metric is not the F1 score; it is the mean average precision. Note that the F1 score depends on choosing a specific score threshold. You can use your model to predict on the validation set, threshold the predictions at a chosen score, then compute the F1 score at that threshold using sklearn.metrics.precision_recall_fscore_support.
As for the confusion matrix, the idea of a confusion matrix suits single-target classification, but not multi-target problems where a sample can have 0, 1, or more than 1 correct labels, and most bioacoustics problems are framed as multi-target classification of audio clips. If you are just interested in true and false positives/negatives for a single class, you can count these directly from your thresholded predictions and labels; see the sketches below. Let me know if you need assistance computing the scores you're looking for.
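For anyone landing here later, a rough sketch of that first step might look like the following. Names such as `model`, `val_labels`, and `"my_species"` are placeholders I've made up, and the exact `predict` call may differ between OpenSoundscape versions (check the prediction tutorial for your version), so treat this as a starting point rather than a drop-in solution:

```python
# Sketch: F1 at a single score threshold for one class.
# Assumptions (not from the original post): `model` is a trained OpenSoundscape
# CNN, `val_labels` is a 0/1 validation label dataframe with one column per
# class, and `model.predict` returns a score dataframe aligned with the same
# clip index.
from sklearn.metrics import precision_recall_fscore_support

species = "my_species"   # hypothetical class/column name
threshold = 0.5          # must be on the same scale as the scores
                         # (e.g. sigmoid-activated scores vs raw logits)

# continuous scores per clip, one column per class
scores = model.predict(val_labels)

# binarize the scores for the class of interest
binary_preds = (scores[species] >= threshold).astype(int)

precision, recall, f1, _ = precision_recall_fscore_support(
    val_labels[species].astype(int), binary_preds, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```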
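And for the per-class true/false positive and negative counts asked about in the question, a 2x2 confusion matrix for the single class can be built from the same thresholded predictions (again just a sketch, reusing the placeholder names above):

```python
# Sketch: count true/false positives and negatives for one class from
# thresholded predictions, reusing `val_labels`, `binary_preds`, and
# `species` from the sketch above.
from sklearn.metrics import confusion_matrix

y_true = val_labels[species].astype(int)

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, binary_preds, labels=[0, 1]).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

Sweeping `threshold` over a range of values and recomputing these counts is one simple way to compare different recognizer versions at the operating point you care about.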