This repository contains a script to synthesize noisy speech data from clean speech and noise files. The script allows you to specify the number of hours of data to generate and the range of Signal-to-Noise Ratio (SNR) values.
- Python 3.x
- Required Python packages (install using pip install -r requirements.txt)
- Clone the repository:
  git clone <repository-url>
  cd <repository-directory>
- Install the required packages:
  pip install -r requirements.txt
The script uses a configuration file (noisyspeech_synthesizer.cfg) to set various parameters. Update the configuration file as needed before running the script.
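As a rough illustration of how a section of such a file can be inspected, the sketch below parses INI-style text with Python's standard configparser module. It assumes the configuration file is INI-formatted with a noisy_speech section (the default section name given for --cfg_str); the option names snr_lower and snr_upper are hypothetical placeholders, not the script's actual settings.

```python
import configparser

# Hypothetical excerpt: the option names below are placeholders for
# illustration only, not the options the real script reads.
SAMPLE_CFG = """
[noisy_speech]
snr_lower = 0
snr_upper = 40
"""


def load_section(cfg_text, section="noisy_speech"):
    """Parse INI-style text and return one section as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        raise KeyError(f"section {section!r} not found")
    return dict(parser[section])


params = load_section(SAMPLE_CFG)
print(params)
```

configparser returns every value as a string, so numeric options need an explicit float() or int() conversion before use.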
To generate noisy speech data, run the following command:
python noisyspeech_synthesizer.py --cfg noisyspeech_synthesizer.cfg --total_hours <number_of_hours>
- --cfg: Path to the configuration file (default is noisyspeech_synthesizer.cfg).
- --cfg_str: Section in the configuration file to use (default is noisy_speech).
- --total_hours: Total hours of data to be created.
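The three options above could be declared with argparse roughly as follows. This is a sketch of an equivalent command-line interface, not the script's actual parser; in particular, treating --total_hours as a float is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the three documented options."""
    parser = argparse.ArgumentParser(
        description="Synthesize noisy speech from clean speech and noise files."
    )
    parser.add_argument(
        "--cfg",
        default="noisyspeech_synthesizer.cfg",
        help="Path to the configuration file.",
    )
    parser.add_argument(
        "--cfg_str",
        default="noisy_speech",
        help="Section in the configuration file to use.",
    )
    # Assumption: fractional hours are allowed, hence type=float.
    parser.add_argument(
        "--total_hours",
        type=float,
        help="Total hours of data to be created.",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--total_hours", "100"])
print(args.cfg, args.cfg_str, args.total_hours)
```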
python noisyspeech_synthesizer.py --cfg noisyspeech_synthesizer.cfg --total_hours 100
This command will generate 100 hours of noisy speech data.
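For intuition about what the SNR range controls, the sketch below mixes a clean signal with noise at a chosen SNR using only the standard library. It is a generic illustration of RMS-based SNR scaling, not necessarily the method the script uses; the function names and the synthetic test signals are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Scale noise so the clean-to-noise RMS ratio equals snr_db, then add it.

    Returns the noisy mixture and the scaled noise.
    """
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # SNR in dB is 20 * log10(clean_rms / noise_rms), so solve for the noise level.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Synthetic one-second signals at 16 kHz: a 440 Hz tone and deterministic pseudo-noise.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [((t * 7919) % 200 - 100) / 100 for t in range(16000)]

noisy, scaled = mix_at_snr(clean, noise, snr_db=10)
achieved = 20 * math.log10(rms(clean) / rms(scaled))
print(f"achieved SNR: {achieved:.2f} dB")
```

A lower SNR value means louder noise relative to the speech, so widening the SNR range toward 0 dB produces harder training examples.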
To upload the generated noisy speech data to a Hugging Face dataset, save the following script as upload_to_huggingface.py:
import os

import pandas as pd
from datasets import Dataset, DatasetDict, Audio


def create_dataset(noisyspeech_dir):
    # List all noisy speech files
    noisy_files = [
        os.path.join(noisyspeech_dir, f)
        for f in os.listdir(noisyspeech_dir)
        if f.endswith('.wav')
    ]
    # Create a dataframe with file paths and labels
    data = {'file': noisy_files, 'label': ['noisy_speech'] * len(noisy_files)}
    df = pd.DataFrame(data)
    # Convert dataframe to Hugging Face Dataset
    dataset = Dataset.from_pandas(df)
    # Treat the "file" column as audio so the files are decoded on access
    dataset = dataset.cast_column("file", Audio())
    return dataset


def upload_dataset(dataset, dataset_name):
    # Wrap the dataset in a DatasetDict with a single "train" split
    dataset_dict = DatasetDict({"train": dataset})
    # Push to the Hugging Face Hub
    dataset_dict.push_to_hub(dataset_name)


if __name__ == "__main__":
    # Directory containing the noisy speech files
    noisyspeech_dir = 'NoisySpeech_training'
    # Dataset name on Hugging Face
    dataset_name = 'rfhuang/audio-quality'
    # Create dataset
    dataset = create_dataset(noisyspeech_dir)
    # Upload dataset
    upload_dataset(dataset, dataset_name)
- Ensure you are logged in to your Hugging Face account:
  huggingface-cli login
- Run the script to upload the dataset:
  python upload_to_huggingface.py
Replace 'rfhuang/audio-quality' with the appropriate dataset name on Hugging Face.
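Before pushing, it may be worth checking that the output directory holds roughly the number of hours requested. The sketch below sums the durations of the .wav files in a directory with the standard wave module; the helper name total_audio_hours is hypothetical, and the approach assumes uncompressed PCM WAV files, which is the only format that module reads.

```python
import os
import tempfile
import wave


def total_audio_hours(directory):
    """Sum the durations of all .wav files in a directory, in hours."""
    seconds = 0.0
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".wav"):
            continue
        with wave.open(os.path.join(directory, name), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600


# Demo on a throwaway directory holding one second of 16 kHz mono silence.
with tempfile.TemporaryDirectory() as tmp:
    with wave.open(os.path.join(tmp, "demo.wav"), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    demo_hours = total_audio_hours(tmp)

print(f"{demo_hours * 3600:.1f} seconds of audio")
```

On real data, the call would be total_audio_hours('NoisySpeech_training'), using the directory name from the upload script.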
This project is licensed under the MIT License.