torchaudio: an audio library for PyTorch

Support audio I/O (Load files, Save files)
- Load the following formats into a torch Tensor
  - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms,
  - aiff, au, amr, mp2, mp4, ac3, avi, wmv,
  - mpeg, ircam and any other format supported by libsox.
  - Kaldi (ark/scp)
Dataloaders for common audio datasets (VCTK, YesNo)
Common audio transforms
- Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample
Compliance interfaces: Run code using PyTorch that aligns with other libraries
- Kaldi: fbank, spectrogram, resample_waveform

Dependencies

pytorch (nightly version needed for development)
libsox v14.3.2 or above
[optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above

Quick install on OSX (Homebrew):

brew install sox

Linux (Ubuntu):

sudo apt-get install sox libsox-dev libsox-fmt-all

Anaconda

conda install -c conda-forge sox

Installation

# Linux
python setup.py install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

Quick Usage

import torchaudio
waveform, sample_rate = torchaudio.load('foo.mp3')  # loads file into a tensor
torchaudio.save('foo_save.mp3', waveform, sample_rate)  # saves tensor to file

API Reference

API Reference is located here: http://pytorch.org/audio/

Conventions

Torchaudio is standardized around the following naming conventions.

waveform: a tensor of audio samples with dimensions (channel, time)
sample_rate: the sampling rate of the audio (samples per second)
specgram: a tensor of spectrogram with dimensions (channel, freq, time)
mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
hop_length: the number of samples between the starts of consecutive frames
n_fft: the number of Fourier bins
n_mel, n_mfcc: the number of mel and MFCC bins
n_freq: the number of bins in a linear spectrogram
min_freq: the lowest frequency of the lowest band in a spectrogram
max_freq: the highest frequency of the highest band in a spectrogram
win_length: the length of the STFT window
window_fn: a function that creates a window, e.g. torch.hann_window
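
To make the relationship between n_fft, hop_length, n_freq and the time dimension concrete, here is a minimal pure-Python sketch of the shape arithmetic. The helper name `spectrogram_shape` is hypothetical and not part of torchaudio. It assumes a one-sided STFT and, when `center=True`, padding of n_fft // 2 samples on each side of the signal; the default values are illustrative only.

```python
def spectrogram_shape(n_channels, n_samples, n_fft=400, hop_length=200, center=True):
    """Return the (channel, freq, time) shape of an STFT-based spectrogram.

    Hypothetical helper for illustration; not a torchaudio function.
    """
    # A one-sided spectrum of a real signal has n_fft // 2 + 1 frequency bins.
    n_freq = n_fft // 2 + 1

    # Assumption: centered framing pads n_fft // 2 samples on both sides.
    padded = n_samples + 2 * (n_fft // 2) if center else n_samples
    if padded < n_fft:
        raise ValueError("signal is shorter than one analysis window")

    # Frames start every hop_length samples; count the ones that fit.
    n_frames = 1 + (padded - n_fft) // hop_length
    return (n_channels, n_freq, n_frames)


# One second of stereo audio at 16 kHz.
print(spectrogram_shape(n_channels=2, n_samples=16000))  # (2, 201, 81)
```

Under these assumptions, halving hop_length roughly doubles the number of frames, while n_freq depends only on n_fft.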

Transforms expect the following dimensions. In particular, all transforms and functions assume channel-first input.

Spectrogram: (channel, time) -> (channel, freq, time)
AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
MelScale: (channel, freq, time) -> (channel, mel, time)
MelSpectrogram: (channel, time) -> (channel, mel, time)
MFCC: (channel, time) -> (channel, mfcc, time)
MuLawEncoding: (channel, time) -> (channel, time)
MuLawDecoding: (channel, time) -> (channel, time)
Resample: (channel, time) -> (channel, time)
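
As an illustration of a shape-preserving transform, the following is a minimal pure-Python sketch of mu-law companding applied to a channel-first waveform held in nested lists. The function names `mu_law_encode` and `mu_law_decode` are hypothetical stand-ins and not the torchaudio API; the sketch assumes samples normalized to [-1, 1] and 256 quantization channels.

```python
import math


def mu_law_encode(x, quantization_channels=256):
    """Map one sample in [-1, 1] to an integer in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clamp to the assumed input range
    # Logarithmic compression that keeps the sign of the sample.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code, quantization_channels=256):
    """Approximately invert mu_law_encode, returning a sample in [-1, 1]."""
    mu = quantization_channels - 1
    compressed = 2 * code / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


# A toy waveform with dimensions (channel, time) = (2, 4).
waveform = [[0.0, 0.5, -0.5, 1.0], [-1.0, 0.25, -0.25, 0.0]]
encoded = [[mu_law_encode(s) for s in channel] for channel in waveform]
decoded = [[mu_law_decode(c) for c in channel] for channel in encoded]
print(encoded[0][0], encoded[0][3], encoded[1][0])  # 128 255 0
```

Both steps operate sample by sample, so the (channel, time) layout is unchanged. Decoding is lossy because encoding quantizes each sample to one of 256 levels.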
