
linan2/TensorFlow-speech-enhancement



This is a deep-learning, mapping-based speech enhancement method.

The purpose of this project is to use DNN and CNN methods for speech enhancement. The DNN uses three hidden layers with 512 nodes per hidden layer; the CNN uses the R-CED network structure with some ResNet-style connections added to prevent overfitting. You can also choose whether to use dropout, L2 regularization, and so on.
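
As a rough illustration, here is a minimal sketch of the DNN mapping network described above, in TensorFlow 1.x (the layer names, ReLU activation, and keep_prob handling are assumptions for illustration, not taken from this repo):

```python
import tensorflow as tf  # TF 1.x, matching this repo's dependency

def dnn_enhancer(noisy_feats, out_dim, keep_prob=1.0):
    """Map a noisy log-spectrogram frame to a clean one.

    Three 512-unit hidden layers, as described above; the ReLU
    activation and dropout placement are assumptions.
    """
    h = noisy_feats
    for i in range(3):
        h = tf.layers.dense(h, 512, activation=tf.nn.relu,
                            name='hidden_%d' % i)
        h = tf.nn.dropout(h, keep_prob)  # optional dropout
    # Linear output layer predicts the clean feature frame.
    return tf.layers.dense(h, out_dim, name='output')
```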

Attention:

Before using this method, you should have clean and corresponding noisy data. If your task is speech dereverberation, you need to cut the data before running this code; the script in cut_wav will help you with that. If your task is feature enhancement, you can replace the log-spectrogram feature with another feature, e.g. MFCC.
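
For reference, a minimal sketch of the kind of cutting cut_wav performs (the segment length and the soundfile dependency are assumptions; the actual script may differ):

```python
import soundfile as sf  # assumed I/O library, not from this repo

def cut_wav(path, seg_seconds=4.0):
    """Split a long recording into fixed-length segments.

    A sketch of what the cut_wav script is for; the actual
    segmentation details may differ.
    """
    wav, sr = sf.read(path)
    seg = int(seg_seconds * sr)
    return [wav[i:i + seg] for i in range(0, len(wav) - seg + 1, seg)]
```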

To use:

Step 1. Run ex_trac.sh to prepare the data and extract log-spectrogram features (a sketch of that extraction follows these steps).

Step 2. Run train.sh to train the model and test it.

Step 3. Run ca_pesq.sh to evaluate your results with PESQ.
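
As referenced in Step 1, here is a minimal sketch of log-spectrogram feature extraction (the FFT size, hop length, and librosa dependency are assumptions, not this repo's actual parameters):

```python
import numpy as np
import librosa  # assumed here; the repo's own extraction may differ

def log_spectrogram(wav_path, n_fft=512, hop=256):
    """Compute log-magnitude spectrogram frames from a wav file."""
    wav, sr = librosa.load(wav_path, sr=None)  # keep native sample rate
    stft = librosa.stft(wav, n_fft=n_fft, hop_length=hop)
    # Small floor avoids log(0); rows are frames, columns frequency bins.
    return np.log(np.abs(stft).T + 1e-7)
```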

P.S.:

The code is not perfect; updates will continue.

I have tested it on the REVERB Challenge dataset, where it improves PESQ from about 2.0 to 2.8.

Later, we will add some GAN-based, multi-task learning, and multi-objective learning models; some attention-mechanism-based models will also be added.

In the decoding stage, you can choose the Griffin-Lim (G&L) vocoder, or you can synthesize speech using the noisy speech's original phase. I have tried the G&L method; it does not perform better than using the original phase.
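
To make the second option concrete, here is a minimal sketch of synthesis with the noisy phase (the function name and librosa dependency are assumptions; the STFT parameters must match those used for feature extraction):

```python
import numpy as np
import librosa  # assumed dependency, as in the extraction sketch

def synth_with_noisy_phase(enhanced_log_mag, noisy_wav, n_fft=512, hop=256):
    """Combine the enhanced magnitude with the noisy signal's phase.

    Assumes enhanced_log_mag covers the same frames as the noisy STFT.
    """
    noisy_stft = librosa.stft(noisy_wav, n_fft=n_fft, hop_length=hop)
    phase = np.angle(noisy_stft)
    mag = np.exp(enhanced_log_mag.T)  # undo log; back to freq x time
    enhanced_stft = mag * np.exp(1j * phase)
    return librosa.istft(enhanced_stft, hop_length=hop)
```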

Environment dependencies:

https://github.com/linan2/tensorflow-1.4.0.git

[1] Li, N., Ge, M., Wang, L., Dang, J. (2019) A Fast Convolutional Self-attention Based Speech Dereverberation Method for Robust Speech Recognition. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol. 11955. Springer, Cham.

[2] Wang, K., Zhang, J., Sun, S., Wang, Y., Xiang, F., Xie, L. (2018) Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition. Proc. Interspeech 2018, 1581-1585. DOI: 10.21437/Interspeech.2018-1780.

[3] Ge, M., Wang, L., Li, N., Shi, H., Dang, J., Li, X. (2019) Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement. Proc. Interspeech 2019, 3153-3157. DOI: 10.21437/Interspeech.2019-1477.

Email: linanvae@163.com
