Questions about evaluating recognizers #1079
-
Hi all, I am creating a recognizer using OpenSoundscape for one species of interest and would like to compare the performance of several versions of my recognizer that incorporate different parameters. I've been treating the Validation score that is output after training as an F1 score, but I want to confirm that this interpretation is correct. I'm also curious about whether anyone has written code to evaluate recognizer performance using a confusion matrix as opposed to the score distribution histograms; I would like to see how many true and false positives/negatives my recognizer identifies. I see some mention of a confusion matrix in the metrics.py file but am unsure of how to go about making one. Apologies if I've missed the answers to these questions in the tutorials/documentation. Thanks for the help!
Charlotte
-
Hi Charlotte,
The default reported score metric is not the F1 score; it is the mean average precision. Note that the F1 score depends on choosing a specific score threshold. You can use your model to predict on the validation set, threshold the predictions at a chosen score, then compute the F1 score at that threshold using sklearn.metrics.precision_recall_fscore_support.
As for the confusion matrix, the idea of a confusion matrix suits single-target classification, but not multi-target problems where a sample can have 0, 1, or more than 1 correct labels, and most bioacoustics problems are framed as multi-target classification of audio clips. If you are just interested in true and false positives/negatives for a single class, you can count these directly from your thresholded predictions and labels; see the sketches below. Let me know if you need assistance computing the scores you're looking for.
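For anyone landing here later, a rough sketch of that first step might look like the following. Names such as `model`, `val_labels`, and `"my_species"` are placeholders I've made up, and the exact `predict` call may differ between OpenSoundscape versions (check the prediction tutorial for your version), so treat this as a starting point rather than a drop-in solution:

```python
# Sketch: F1 at a single score threshold for one class.
# Assumptions (not from the original post): `model` is a trained OpenSoundscape
# CNN, `val_labels` is a 0/1 validation label dataframe with one column per
# class, and `model.predict` returns a score dataframe aligned with the same
# clip index.
from sklearn.metrics import precision_recall_fscore_support

species = "my_species"   # hypothetical class/column name
threshold = 0.5          # must be on the same scale as the scores
                         # (e.g. sigmoid-activated scores vs raw logits)

# continuous scores per clip, one column per class
scores = model.predict(val_labels)

# binarize the scores for the class of interest
binary_preds = (scores[species] >= threshold).astype(int)

precision, recall, f1, _ = precision_recall_fscore_support(
    val_labels[species].astype(int), binary_preds, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```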
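And for the per-class true/false positive and negative counts asked about in the question, a 2x2 confusion matrix for the single class can be built from the same thresholded predictions (again just a sketch, reusing the placeholder names above):

```python
# Sketch: count true/false positives and negatives for one class from
# thresholded predictions, reusing `val_labels`, `binary_preds`, and
# `species` from the sketch above.
from sklearn.metrics import confusion_matrix

y_true = val_labels[species].astype(int)

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, binary_preds, labels=[0, 1]).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

Sweeping `threshold` over a range of values and recomputing these counts is one simple way to compare different recognizer versions at the operating point you care about.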