- Write docstrings for functions and modules
- Make a guide for managing training data and training new models; provide a way for other researchers to share training data.
- Change framelength to be a number of samples, not a time (simplifies some operations)
- The warmup for analysis has been taking forever lately. We're running it on all the data files the lab has; perhaps the merge_chunks() function has an inefficiency that bogs it down when there are many files? Profile the code (see the sketch below).
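  A quick way to check with the stdlib profiler; `run_analysis` and `data_dir` are placeholders for whatever entry point the warmup actually calls:

  ```python
  import cProfile
  import pstats

  # Profile one warmup run and list the functions with the highest
  # cumulative time; if merge_chunks() scales badly with file count,
  # it should show up near the top.
  cProfile.run("run_analysis(data_dir)", "warmup.prof")
  stats = pstats.Stats("warmup.prof")
  stats.sort_stats("cumulative").print_stats(20)
  ```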
- Restore loss from when patience triggered, not best weights
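  If the training loop is Keras (an assumption about the stack), this maps onto the stock EarlyStopping callback with `restore_best_weights=False`, which keeps the weights from the epoch where patience ran out instead of rolling back to the best epoch:

  ```python
  from tensorflow.keras.callbacks import EarlyStopping

  early_stop = EarlyStopping(
      monitor="val_loss",
      patience=10,                  # placeholder patience
      restore_best_weights=False,   # keep weights from when patience triggered
  )
  # model.fit(..., callbacks=[early_stop])
  ```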
- Implement data augmentation (see the sketch after this list)
  - Overlap buzzes with ambient and other sounds
  - Add Gaussian noise to buzzes? Might generalize better than agricultural ambient noise
  - Vary the amplitude of buzzes
  - Smaller framehop for buzzes?
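  A minimal numpy sketch of the first three ideas (overlap at a target SNR, Gaussian noise, amplitude variation); the function name, defaults, and SNR handling are all assumptions, not the actual pipeline:

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  def augment_buzz(buzz, ambient, snr_db=10.0, gain_range=(0.5, 1.5),
                   noise_std=0.01):
      # Vary the buzz amplitude
      buzz = buzz * rng.uniform(*gain_range)
      # Overlap with ambient sound, scaled to the requested SNR
      ambient = ambient[: len(buzz)]
      scale = np.sqrt(np.mean(buzz**2) /
                      (np.mean(ambient**2) * 10 ** (snr_db / 10)))
      mixed = buzz + scale * ambient
      # Gaussian noise on top of (or instead of) ambient
      return mixed + rng.normal(0.0, noise_std, size=mixed.shape)
  ```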
- Add pre-embedding of entire audio files
  - Shouldn't need too much RAM allotment: 260,000 frames ≈ 1,080 MB, and 1 h of audio at a 0.5 s framehop is ~7,500 frames, so under 32 MB of data (arithmetic check below).
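  A back-of-envelope check, assuming the 1,024-dim float32 embeddings that YAMNet produces (4 KB per frame):

  ```python
  bytes_per_frame = 1024 * 4                 # one float32 embedding per frame
  print(260_000 * bytes_per_frame / 2**20)   # ~1016 MB, near the 1,080 MB figure
  print(7_500 * bytes_per_frame / 2**20)     # ~29 MB, under the 32 MB estimate
  ```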
- Redefine chunks to be a number of audio samples (for memory estimation)
- Work in frame numbers instead of timestamps? Where frames 1, 2, 3, 4 cover ~2.5 s of audio because of overlap (sketch below)
  - Timestamps are getting messy in the CSV (e.g., '4923.999999999'), which may cause problems when picking an analysis back up
  - Would also facilitate pre-embedding long files
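  A sketch of the integer-indexing idea, with assumed frame constants (0-based frame indices; timestamps derived only at export time):

  ```python
  SAMPLERATE = 16000    # assumed analysis samplerate
  FRAMEHOP = 8000       # 0.5 s hop, in samples (assumed)

  def frame_to_sample(frame_idx: int) -> int:
      """Exact first sample of a frame; integers never drift."""
      return frame_idx * FRAMEHOP

  def frame_to_seconds(frame_idx: int) -> float:
      """Derive a timestamp only for display or CSV export."""
      return frame_to_sample(frame_idx) / SAMPLERATE
  ```

  Storing `frame_idx` in the CSV makes resuming an integer comparison, so residue like '4923.999999999' can't appear.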
- Use full paths (not clipped) at verbosity 2
- Parallel processing: can I account for the edge case where chunk size is too large? E.g., with 8 cores each taking chunks of 3600 seconds but only a single 3600-second file, all the work lands on one core; optimal would be reducing the chunk size so the file splits across all cores (see the sketch below).
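  One way to handle it, as a hypothetical helper: cap the chunk at an even share of the total duration, then round so the chunks divide evenly:

  ```python
  import math

  def pick_chunk_seconds(total_seconds, n_cores, max_chunk_seconds=3600.0):
      """With 8 cores and a single 3600 s file, returns 450 s
      instead of one 3600 s chunk on one core."""
      chunk = min(max_chunk_seconds, total_seconds / n_cores)
      n_chunks = max(1, math.ceil(total_seconds / chunk))
      return total_seconds / n_chunks
  ```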
- Memory allotment: if reading in files of a higher samplerate, memory usage will exceed what's anticipated until resampling completes (e.g., a 48 kHz file holds 3x as many samples as the same duration at 16 kHz while it's being read).
- Add progress bar/estimated time to completion (dependent on verbosity?)
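  tqdm is one low-effort option (its bar includes an ETA); the loop and names below are placeholders:

  ```python
  from tqdm import tqdm

  def analyze_files(files, verbosity=1):
      # Show the progress bar only when the user asked for output
      iterator = tqdm(files, unit="file") if verbosity >= 1 else files
      for path in iterator:
          analyze(path)  # hypothetical per-file analysis
  ```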
- Pare down the YAMNet design; we've been using a pretty pre-packaged solution that does some internal framing itself, so there's likely overhead since I'm pre-framing. Check out the GitHub repo and see if we can use the raw model.
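  For reference, the TF Hub release is called directly on a raw waveform and does its own framing (0.96 s windows, 0.48 s hop), which is where the duplicated work would come from:

  ```python
  import numpy as np
  import tensorflow_hub as hub

  yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

  # The model expects mono float32 audio at 16 kHz; 1 s of silence
  # here just as a stand-in input.
  waveform = np.zeros(16000, dtype=np.float32)
  scores, embeddings, log_mel_spectrogram = yamnet(waveform)
  print(embeddings.shape)  # (n_internal_frames, 1024)
  ```

  The repo version (tensorflow/models, research/audioset) exposes the model-building code, so that is likely where the internal framing could be stripped out.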
- Random forest model
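  As a baseline on top of precomputed embeddings, sklearn is nearly drop-in; the data here is random just to make the sketch runnable:

  ```python
  import numpy as np
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  # Stand-ins: X would be (n_frames, 1024) embeddings, y buzz/no-buzz labels
  rng = np.random.default_rng(0)
  X = rng.normal(size=(500, 1024))
  y = rng.integers(0, 2, size=500)

  clf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
  print(cross_val_score(clf, X, y, cv=5).mean())
  ```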
- Phan et al. (2019) have a wicked cool model; arguably more suited to our kind of analysis.
- Rewrite training to use audio at its native samplerate; try 44.1 kHz audio instead of 16 kHz and just don't worry that it stretches out the sound
- Retrain final layers of YAMNet during model training