Training, evaluation, and inference of neural pitch and periodicity estimators in PyTorch. Includes the original code for the paper "Cross-domain Neural Pitch and Periodicity Estimation".
If you want to perform pitch estimation using a pretrained FCNF0++ model, run

```
pip install penn
```

If you want to train or use your own models, clone this repo and run

```
pip install -r requirements.txt
```
Perform inference using FCNF0++

```python
import penn

# Load audio at the correct sample rate
audio = penn.load.audio('test/assets/gershwin.wav')

# Here we'll use a 10 millisecond hopsize
hopsize = .01

# Provide a sensible frequency range given your domain and model
fmin = 30.
fmax = 1000.

# Choose a gpu index to use for inference. Set to None to use cpu.
gpu = 0

# If you are using a gpu, pick a batch size that doesn't cause memory errors
# on your gpu
batch_size = 2048

# Select a checkpoint to use for inference. The default checkpoint will
# download and use FCNF0++ pretrained on MDB-stem-synth and PTDB
checkpoint = penn.DEFAULT_CHECKPOINT

# Infer pitch and periodicity
pitch, periodicity = penn.from_audio(
    audio,
    penn.SAMPLE_RATE,
    hopsize=hopsize,
    fmin=fmin,
    fmax=fmax,
    checkpoint=checkpoint,
    batch_size=batch_size,
    gpu=gpu)
```
"""Perform pitch and periodicity estimation
Args:
audio: The audio to extract pitch and periodicity from
sample_rate: The audio sample rate
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
gpu: The index of the gpu to run inference on
Returns:
pitch: torch.tensor(
shape=(1, int(samples // penn.seconds_to_sample(hopsize))))
periodicity: torch.tensor(
shape=(1, int(samples // penn.seconds_to_sample(hopsize))))
"""
"""Perform pitch and periodicity estimation from audio on disk
Args:
file: The audio file
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
gpu: The index of the gpu to run inference on
Returns:
pitch: torch.tensor(shape=(1, int(samples // hopsize)))
periodicity: torch.tensor(shape=(1, int(samples // hopsize)))
"""
"""Perform pitch and periodicity estimation from audio on disk and save
Args:
file: The audio file
output_prefix: The file to save pitch and periodicity without extension
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
gpu: The index of the gpu to run inference on
"""
"""Perform pitch and periodicity estimation from files on disk and save
Args:
files: The audio files
output_prefixes: Files to save pitch and periodicity without extension
hopsize: The hopsize in seconds
fmin: The minimum allowable frequency in Hz
fmax: The maximum allowable frequency in Hz
checkpoint: The checkpoint file
batch_size: The number of frames per batch
gpu: The index of the gpu to run inference on
"""
```
python -m penn
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    [-h]
    [--config CONFIG]
    [--output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]]
    [--hopsize HOPSIZE]
    [--fmin FMIN]
    [--fmax FMAX]
    [--checkpoint CHECKPOINT]
    [--batch_size BATCH_SIZE]
    [--gpu GPU]

required arguments:
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The audio files to process

optional arguments:
    -h, --help
        show this help message and exit
    --config CONFIG
        The configuration file. Defaults to using FCNF0++.
    --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]
        The files to save pitch and periodicity without extension.
        Defaults to audio_files without extensions.
    --hopsize HOPSIZE
        The hopsize in seconds. Defaults to 0.01 seconds.
    --fmin FMIN
        The minimum frequency allowed in Hz. Defaults to 31.0 Hz.
    --fmax FMAX
        The maximum frequency allowed in Hz. Defaults to 1984.0 Hz.
    --checkpoint CHECKPOINT
        The model checkpoint file. Defaults to ./penn/assets/checkpoints/fcnf0++.pt.
    --batch_size BATCH_SIZE
        The number of frames per batch. Defaults to 2048.
    --gpu GPU
        The index of the gpu to perform inference on. Defaults to CPU.
```
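For example, the following invocation (with placeholder filenames) runs the default FCNF0++ model on two files using GPU 0.

```
python -m penn \
    --audio_files speech1.wav speech2.wav \
    --output_prefixes output/speech1 output/speech2 \
    --gpu 0
```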
```
python -m penn.data.download
```

Downloads and uncompresses the `mdb` and `ptdb` datasets used for training.
```
python -m penn.data.preprocess --config <config>
```

Converts each dataset to a common format on disk ready for training. You can optionally pass a configuration file to override the default configuration.
```
python -m penn.partition
```

Generates `train`, `valid`, and `test` partitions for `mdb` and `ptdb`. Partitioning is deterministic given the same random seed. You do not need to run this step, as the original partitions are saved in `penn/assets/partitions`.
```
python -m penn.train --config <config> --gpus <gpus>
```

Trains a model according to a given configuration on the `mdb` and `ptdb` datasets. Uses a list of GPU indices as an argument, and uses distributed data parallelism (DDP) if more than one index is given. For example, `--gpus 0 3` will train using DDP on GPUs `0` and `3`, as shown in the invocation below.
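For example, assuming a configuration file at the hypothetical path `config/fcnf0++.py`:

```
python -m penn.train --config config/fcnf0++.py --gpus 0 3
```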
Run `tensorboard --logdir runs/`. If you are running training remotely, you must create an SSH connection with port forwarding to view Tensorboard. This can be done with `ssh -L 6006:localhost:6006 <user>@<server-ip-address>`. Then, open `localhost:6006` in your browser.
```
python -m penn.evaluate \
    --config <config> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```

Evaluate a model. `<checkpoint>` is the checkpoint file to evaluate and `<gpu>` is the GPU index.
```
python -m penn.plot.density \
    --config <config> \
    --true_datasets <true_datasets> \
    --inference_datasets <inference_datasets> \
    --output_file <output_file> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```

Plot the data distribution and inferred distribution for a given dataset and save to a jpg file.
```
python -m penn.plot.logits \
    --config <config> \
    --audio_file <audio_file> \
    --output_file <output_file> \
    --checkpoint <checkpoint> \
    --gpu <gpu>
```

Plot the pitch posteriorgram of an audio file and save to a jpg file.
```
python -m penn.plot.thresholds \
    --names <names> \
    --evaluations <evaluations> \
    --output_file <output_file>
```

Plot the periodicity performance (voiced/unvoiced F1) over `mdb` and `ptdb` as a function of the voiced/unvoiced threshold. `<names>` are the plot labels to give each evaluation. `<evaluations>` are the names of the evaluations to plot.
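These plots inform the choice of voiced/unvoiced threshold at inference time. Below is a minimal sketch of applying such a threshold to the outputs of `penn.from_audio`; the 0.065 value is an illustrative assumption, not a penn default.

```python
import torch

# Frames whose periodicity falls below the threshold are treated as
# unvoiced; mask their pitch values with NaN
threshold = 0.065  # illustrative value, not a penn default
voiced = periodicity > threshold
pitch = torch.where(voiced, pitch, torch.full_like(pitch, float('nan')))
```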
M. Morrison, C. Hsieh, N. Pruyne, and B. Pardo, "Cross-domain Neural Pitch and Periodicity Estimation," IEEE Transactions on Speech and Audio Processing, <TODO - month> 2023.
```
@article{morrison2023cross,
    title={Cross-domain Neural Pitch and Periodicity Estimation},
    author={Morrison, Max and Hsieh, Caedon and Pruyne, Nathan and Pardo, Bryan},
    journal={IEEE Transactions on Speech and Audio Processing},
    month={TODO},
    year={2023}
}
```