Metric Learning Pipelines for Speaker-Diarization

Speaker diarization using Deep Attention Model embeddings and metric learning. The variable parts of the architecture namely the negative sampling techniques (Random, Semi-Hard, Distance Weighted Sampling), types of loss (Triplet and Quadruplet) and margins (Fixed and Adaptive) are investigated to check for performance improvements in the given pipeline.

See the diagram below for a summary of the approach.

Requirements

python 3.6
numpy >= 1.11.0
tensorflow >= 1.5.0
scikit-learn >= 0.18
matplotlib >= 2.1.0
pyannote.core >= 1.3.1
pyannote.metrics >= 1.6.1
python-speech-features >= 0.6
sox >= 1.3.2
h5py >= 2.6.0

Prerequisites

The TEDLIUM Corpus used for training is available at http://www.openslr.org/7/ The CALLHome conversational speech corpus for testing is available at https://media.talkbank.org/ca/CallHome/. Use wget command to download the data for the different languages.

The required libraries can be installed using pip install -r requirements.txt

The corpora directory needs to be in the folder

/Data

Download the vbs_demo package from https://www.voicebiometry.org/download/vbs_demo.tgz and put in current directory.

Data preprocessing

TEDLIUM Corpus

Specify processing steps in process_ted.py and execute the same to obtain the MFCC segments for every recording.

Optionally build the train subset and development set with generate_ted_subset.py by specifying the respective paths.

CALLHOME Corpus

Specify processing steps in process_callhome.py and execute the same to obtain the MFCC segments.

Model training parameters

To specify the paths, network and training configurations,sampling type, type of loss, margin and other parameters,modify:

hyperparams.py

Training

Run run_metriclearn.py which saves the models evaluated at different steps in the log dir. The code also provides the list of training losses at every global step

Testing and Obtaining Diarization Metrics

First extract the embeddings from the trained model with run_testembeddings.py evaluated at the checkpoint of the least validation loss (dev_history.csv).

To perform diarization clustering: Execute run_diarization.py which stores the Diarization Error Rates in a .csv file.

Run the demo

Download the pre-extracted embeddings for the English language corpus from the CALLHOME dataset located in Data.tar CALLHOME folder. It can be used to execute run_diarization.py to obtain the DERs.

Citations

Our papers are cited as:

@INPROCEEDINGS{narayanaswamyspd,
  author={V. S. {Narayanaswamy} and J. J. {Thiagarajan} and H. {Song} and A. {Spanias}},
  booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Designing an Effective Metric Learning Pipeline for Speaker Diarization},
  year={2019},
  volume={},
  number={},
  pages={5806-5810}
}

@inproceedings{Song2018,
  author={Huan Song and Megan Willi and Jayaraman J. Thiagarajan and Visar Berisha and Andreas Spanias},
  title={Triplet Network with Attention for Speaker Diarization},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3608--3612},
  doi={10.21437/Interspeech.2018-2305},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2305}
}

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Models		Models
vbs_demo		vbs_demo
Data.tar		Data.tar
README.md		README.md
approach.pdf		approach.pdf
approach.png		approach.png
generate_ted_subset.py		generate_ted_subset.py
hyperparams.py		hyperparams.py
process_callhome.py		process_callhome.py
process_ted.py		process_ted.py
requirements.txt		requirements.txt
run_diarization.py		run_diarization.py
run_metriclearn.py		run_metriclearn.py
run_testembeddings.py		run_testembeddings.py
utils.py		utils.py

jjayaram7/Metric-Learning-Pipelines-for-Speaker-Diarization

Folders and files

Latest commit

History

Repository files navigation

Metric Learning Pipelines for Speaker-Diarization

Requirements

Prerequisites

Data preprocessing

TEDLIUM Corpus

CALLHOME Corpus

Model training parameters

Training

Testing and Obtaining Diarization Metrics

Run the demo

Citations

About

Resources

Stars

Watchers

Forks

Languages