Hello, may I ask whether the dataset was preprocessed? #12

Closed
shinobuwz opened this issue Mar 14, 2022 · 2 comments

Comments

@shinobuwz

shinobuwz commented Mar 14, 2022

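In data_loader, the noise files and clean files appear to be paired one-to-one by index: for example, if clean_speech = clean_file_1, the noise that gets read is noise_file_1. Why not pick one file at random from the noise folder and one from the speech folder and mix them?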
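To make the two strategies in the question concrete, here is a minimal sketch that contrasts them. It is not the repository's data_loader: the function names `paired_by_index` and `paired_at_random` are hypothetical, and only the file-name pattern (`clean_file_1`, `noise_file_1`) is taken from the example above.

```python
import random

def paired_by_index(clean_files, noise_files):
    """Fixed pairing: the i-th clean file is always mixed with the i-th noise file."""
    return list(zip(sorted(clean_files), sorted(noise_files)))

def paired_at_random(clean_files, noise_files, rng):
    """Random pairing: every clean file draws its noise file independently."""
    return [(c, rng.choice(noise_files)) for c in sorted(clean_files)]

# Hypothetical file lists that follow the naming used in the question.
clean = [f"clean_file_{i}" for i in range(1, 4)]
noise = [f"noise_file_{i}" for i in range(1, 4)]

fixed_pairs = paired_by_index(clean, noise)
random_pairs = paired_at_random(clean, noise, random.Random(0))
```

With fixed pairing, each clean utterance only ever sees one noise recording. With random pairing, calling the function again each epoch (with a fresh or advancing `rng`) would expose every utterance to different noises over time.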

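@shinobuwz shinobuwz changed the title from "Hello, did you apply any other preprocessing to the noise and clean speech data, such as concatenation or cropping?" to "Hello, may I ask whether the dataset was preprocessed?" Mar 16, 2022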
@shinobuwz shinobuwz reopened this Mar 16, 2022
@Le-Xiaohuai-speech
Owner

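There is not much data preprocessing or augmentation, only slicing and a random signal-to-noise ratio (SNR). Augmentations worth considering include random gain, random noise-file selection, random segmentation, random channel filters, and so on; all of these can be used in data preprocessing. For details, see this paper: S. Braun and I. Tashev, “Data augmentation and loss normalization for deep noise suppression,” in Proc. Speech Comput. Springer, 2020, pp. 79–86.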
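As a rough illustration of several augmentations listed above (random noise-file selection, random segmentation, random SNR, random gain), the following sketch uses NumPy. It is neither the repository's code nor the method from the cited paper: the function names, the SNR range of -5 to 20 dB, and the gain range of -10 to 0 dB are assumptions chosen only for the example, and the random channel filter is left out.

```python
import numpy as np

def random_crop(x, length, rng):
    """Return a random segment of `length` samples, tiling x first if it is too short."""
    if len(x) < length:
        x = np.tile(x, int(np.ceil(length / len(x))))
    start = rng.integers(0, len(x) - length + 1)  # upper bound is exclusive
    return x[start:start + length]

def mix_at_snr(clean, noise, snr_db, eps=1e-12):
    """Scale the noise so the clean-to-noise power ratio equals snr_db, then add it."""
    clean_power = np.mean(clean ** 2) + eps
    noise_power = np.mean(noise ** 2) + eps
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def augment(clean, noises, length, rng,
            snr_range=(-5.0, 20.0),       # assumed range, in dB
            gain_db_range=(-10.0, 0.0)):  # assumed range, in dB
    """Build one (noisy, target) training pair with random noise, crop, SNR, and gain."""
    noise = noises[rng.integers(len(noises))]      # random noise-file selection
    clean_seg = random_crop(clean, length, rng)    # random segmentation
    noise_seg = random_crop(noise, length, rng)
    snr_db = rng.uniform(*snr_range)               # random SNR
    noisy = mix_at_snr(clean_seg, noise_seg, snr_db)
    gain = 10 ** (rng.uniform(*gain_db_range) / 20)  # random gain, same for both signals
    return gain * noisy, gain * clean_seg, snr_db
```

The same gain is applied to the noisy input and the clean target so that the SNR of the pair is unchanged and only the overall level varies.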

@shinobuwz
Author

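> There is not much data preprocessing or augmentation, only slicing and a random signal-to-noise ratio (SNR). Augmentations worth considering include random gain, random noise-file selection, random segmentation, random channel filters, and so on; all of these can be used in data preprocessing. For details, see this paper: S. Braun and I. Tashev, “Data augmentation and loss normalization for deep noise suppression,” in Proc. Speech Comput. Springer, 2020, pp. 79–86.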

Thanks for the answer!
