Visual diarization with support vector machines

What is this?

This is a Python project for visual speaker diarization. The goal is to determine who speaks, and when, in multi-speaker conversations from Greek television broadcast news, using lip motion features and Support Vector Machines.

Requirements

You need Python 3.6.9 and a conda 4.6.11 environment.

Dataset and ground truth

The gridnews directory contains the videos used for training and testing, as well as the corresponding transcription files.

Steps for running the project

Training

In the visual_diarization_svm directory, run the following scripts in order:

dlib_face_detection_svm_train.py (creates the optical flow features)
clear_opt_flow_features_train.py (filters out the incidental speakers)
- Afterwards, manually change the ID of each main speaker to a unique ID in optical_flow_features_train.txt
svm_train.py (trains an SVM classifier)
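
The manual ID step above could be scripted along these lines. This is only a sketch: the layout of optical_flow_features_train.txt is not documented here, so the code assumes whitespace-separated columns with the speaker ID in the first column. The function name `remap_speaker_ids`, the `id_map` argument and the sample rows are all hypothetical.

```python
def remap_speaker_ids(lines, id_map, id_column=0):
    """Replace detector-assigned ids with one unique id per main speaker.

    ASSUMPTION: each line holds whitespace-separated columns and the
    speaker id sits in `id_column`. Adjust this to the real file layout.
    """
    remapped = []
    for line in lines:
        fields = line.split()
        # Only rewrite ids that appear in the mapping; leave the rest alone.
        if fields and fields[id_column] in id_map:
            fields[id_column] = id_map[fields[id_column]]
        remapped.append(" ".join(fields))
    return remapped


# Hypothetical rows: ids 3 and 7 belong to the same main speaker.
sample = ["3 0.12 0.40", "7 0.10 0.38", "5 0.90 0.75"]
result = remap_speaker_ids(sample, {"3": "anchor", "7": "anchor", "5": "guest"})
print(result)
```

To apply it to the real file, read the lines from optical_flow_features_train.txt, pass them through the function, and write the result back before running svm_train.py.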

Testing

In the visual_diarization_svm directory, run the following scripts in order:

dlib_face_detection_svm_test.py (creates the optical flow features)
svm_test.py (predicts speech or non-speech for each speaker in a frame)
- Afterwards, manually change the ID of each main speaker to a unique ID in predictions_file.txt
svm_test_window_N (predicts speech or non-speech for each speaker in a window of N frames)
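
One common way to turn per-frame predictions into per-window predictions is a majority vote over each window of N frames. The sketch below illustrates that idea only. It is an assumption that the windowed script works this way, and the function name `smooth_predictions` and the 0/1 label encoding (1 = speech, 0 = non-speech) are hypothetical.

```python
from collections import Counter


def smooth_predictions(frame_labels, window):
    """Majority vote over consecutive, non-overlapping windows of frames."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for start in range(0, len(frame_labels), window):
        chunk = frame_labels[start:start + window]
        # Pick the most frequent label in this window.
        winner = Counter(chunk).most_common(1)[0][0]
        # Every frame in the window receives the winning label.
        smoothed.extend([winner] * len(chunk))
    return smoothed


# Hypothetical per-frame predictions for one speaker.
labels = [1, 1, 0, 0, 0, 1, 1, 1, 0]
smoothed = smooth_predictions(labels, 3)
print(smoothed)
```

A larger window suppresses isolated misclassified frames but blurs the boundaries between speech and silence.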

Evaluation

In the pyannote-parser-develop/tests directory, run trs_file_parse_visual_svm.py, which computes the diarization error rate. Before running it, set the video variable to the video name and pass the corresponding transcription file to the reference parser:

video = 'NET20070331_thlep_1_2'

reference = parser.read("../gridnews/NET20070331/NET20070331.trs")

In addition, in trs.py, set the video variable to the video name and set the time limits (start, end):

video = 'NET20070331_thlep_1_2'

start = 1505.931
end = 1952.862
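
The time limits presumably restrict the evaluation to the part of the recording covered by the test video. As an illustration of that idea, and not a description of what trs.py actually does, the sketch below crops a list of segments to [start, end]. The function name `crop_segments`, the (start, end, speaker) tuple format and the sample segments are hypothetical.

```python
def crop_segments(segments, start, end):
    """Keep only the parts of each (seg_start, seg_end, speaker) inside [start, end]."""
    cropped = []
    for seg_start, seg_end, speaker in segments:
        new_start = max(seg_start, start)
        new_end = min(seg_end, end)
        # Drop segments that fall entirely outside the limits.
        if new_start < new_end:
            cropped.append((new_start, new_end, speaker))
    return cropped


start = 1505.931
end = 1952.862

# Hypothetical reference segments in seconds.
segments = [
    (1500.0, 1510.0, "spk1"),   # overlaps the start limit
    (1600.0, 1620.5, "spk2"),   # fully inside
    (1950.0, 1960.0, "spk1"),   # overlaps the end limit
    (2000.0, 2010.0, "spk2"),   # fully outside
]
cropped = crop_segments(segments, start, end)
print(cropped)
```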

Time limits for the 4 testing videos:

Net20070329_thelep_1_1: start = 1705.173, end = 2455.011
Net20070330_thelep_1_1: start = 1180.084, end = 1540.335
Net20070330_thelep_1_4: start = 2246.728, end = 2615.844
Net20070331_thelep_1_2: start = 1505.931, end = 1952.862
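
For reference, the diarization error rate is conventionally defined as the sum of missed speech, false alarm and speaker confusion, divided by the total reference speech. The sketch below is a simplified frame-level version for intuition only. It assumes at most one speaker per frame and speaker labels that already match between reference and hypothesis, whereas full implementations work on time segments and search for the best label mapping. The function name `frame_level_der` and the sample labels are hypothetical.

```python
def frame_level_der(reference, hypothesis):
    """Simplified DER over per-frame labels (None means no speech).

    ASSUMPTIONS: one speaker per frame, and hypothesis labels already
    mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            total += 1                # frame counts as reference speech
            if hyp is None:
                missed += 1           # speech present but not detected
            elif hyp != ref:
                confusion += 1        # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1          # speech predicted where there is none
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Hypothetical labels: 7 reference speech frames, 1 miss, 2 false alarms, 1 confusion.
reference = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hypothesis = ["A", "A", None, "A", "B", "A", "B", "B", None, "B"]
der = frame_level_der(reference, hypothesis)
print(round(der, 3))
```

Because false alarms are counted in the numerator but not in the denominator, the rate can exceed 1.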

Author

Charalampos Vossos

linkedin
email: vossosx96@gmail.com
