- For information about the DCASE 2020 challenge, please visit the challenge website.
- You can find discussions about the DCASE challenge here: dcase-discussions.
- This task follows dcase2019-task4. More information about 2019: Turpault et al., Serizel et al.
- 9th March 2020: update scripts to get the recorded data in the download.
- 18th March 2020: update the `DESED_synth_dcase20_train_jams.tar` on DESED_synthetic and comment out reverberation since we do not use it for the baseline.
- 24th March 2020: release baseline without sound separation.
- Python >= 3.6
- pytorch >= 1.0
- cudatoolkit = 9.0
- pandas >= 0.24.1
- scipy >= 1.2.1
- pysoundfile >= 0.10.2
- scaper >= 1.3.5
- librosa >= 0.6.3
- youtube-dl >= 2019.4.30
- tqdm >= 4.31.1
- ffmpeg >= 4.1
- dcase_util >= 0.2.5
- sed-eval >= 0.2.1
- psds-eval >= 0.0.1
- desed >= 1.1.7
A simplified installation procedure is provided below for a Python 3.6 based Anaconda distribution on a Linux system:
- Install Anaconda
- Launch `conda_create_environment.sh` (running it line by line is recommended)
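As a rough outline (a sketch only — `conda_create_environment.sh` itself is the reference, and the environment name below is hypothetical), the setup amounts to:

```shell
# Create and activate a Python 3.6 environment
conda create -y -n dcase2020 python=3.6
conda activate dcase2020

# PyTorch with the CUDA toolkit version from the requirements
conda install -y pytorch cudatoolkit=9.0 -c pytorch
conda install -y -c conda-forge ffmpeg

# Remaining dependencies (quote the specifiers so the shell
# does not treat ">" as a redirection)
pip install "pandas>=0.24.1" "scipy>=1.2.1" "pysoundfile>=0.10.2" \
    "scaper>=1.3.5" "librosa>=0.6.3" "youtube-dl>=2019.4.30" \
    "tqdm>=4.31.1" "dcase_util>=0.2.5" "sed-eval>=0.2.1" \
    "psds-eval>=0.0.1" "desed>=1.1.7"
```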
This year, a sound separation model is used: see the sound-separation folder, which is the fuss_repo integrated as a git subtree.
More info in the original FUSS model repo.
More info in the baseline folder.
System performance is reported in terms of event-based F-scores [[1]] with a 200 ms collar on onsets and a collar of 200 ms or 20% of the event length (whichever is greater) on offsets.
Additionally, the PSDS [[2]] performance is reported.
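As an illustration of the collar rule (a minimal sketch, not the official sed_eval implementation), a detection matches a reference event when its onset lies within 200 ms of the reference onset and its offset within max(200 ms, 20% of the reference event length) of the reference offset:

```python
def event_matches(ref_onset, ref_offset, det_onset, det_offset,
                  t_collar=0.2, offset_ratio=0.2):
    """Return True if a detected event matches a reference event under
    an onset collar of t_collar seconds and an offset collar of
    max(t_collar, offset_ratio * reference event length)."""
    onset_ok = abs(det_onset - ref_onset) <= t_collar
    offset_collar = max(t_collar, offset_ratio * (ref_offset - ref_onset))
    offset_ok = abs(det_offset - ref_offset) <= offset_collar
    return onset_ok and offset_ok

# A 5 s reference event: the offset collar is max(0.2, 1.0) = 1.0 s
print(event_matches(0.0, 5.0, 0.1, 5.8))   # True
# Onset off by 0.3 s, more than the 0.2 s collar
print(event_matches(0.0, 5.0, 0.3, 5.0))   # False
```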
 | Baseline without sound separation | Baseline with sound separation
---|---|---
**Validation** | |
Event-based | 33.05 % |
PSDS | 0.403 |
PSDS cross-trigger | 0.234 |
PSDS macro | 0.199 |
Please refer to the PSDS paper [[2]] for more information about it. The parameters used to compute the PSDS are:
- Detection Tolerance parameter (dtc): 0.5
- Ground Truth intersection parameter (gtc): 0.5
- Cross-Trigger Tolerance parameter (cttc): 0.3
- maximum False Positive rate (e_max): 100
The differences between the three reported PSDS values:

 | alpha_ct | alpha_st
---|---|---
PSDS | 0 | 0
PSDS cross-trigger | 1 | 0
PSDS macro | 0 | 1
alpha_ct is the cost of cross-trigger, alpha_st is the cost of instability across classes.
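A minimal numpy sketch (an illustration only, not the psds-eval implementation) of how the two weights enter a single PSD-ROC operating point:

```python
import numpy as np

def effective_tpr_fpr(tpr_per_class, fpr, ctr, alpha_ct=0.0, alpha_st=0.0):
    """Combine per-class true-positive ratios into one PSD-ROC point.

    tpr_per_class: true-positive ratio for each class
    fpr: false-positive rate (per unit time)
    ctr: cross-trigger rate (per unit time)
    alpha_ct: cost of cross-triggers (1 for "PSDS cross-trigger")
    alpha_st: cost of instability across classes (1 for "PSDS macro")
    """
    tpr = np.asarray(tpr_per_class, dtype=float)
    e_tpr = tpr.mean() - alpha_st * tpr.std()  # penalize class instability
    e_fpr = fpr + alpha_ct * ctr               # penalize cross-triggers
    return e_tpr, e_fpr
```

With alpha_ct = alpha_st = 0 the point reduces to the plain mean TPR against the FPR; the full PSDS is then the normalized area under the curve traced by such points up to e_max.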
In the `scripts/` folder, you can find the different steps to:
- Download recorded data and synthetic material.
- Generate synthetic soundscapes
- Reverberate synthetic data (Not used in the baseline)
- Separate sources of recorded and synthetic mixtures
It is likely that you will have download issues with the real recordings.
At the end of the download, please send an email with the TSV files created in the `missing_files` directory to Nicolas Turpault and Romain Serizel.
However, if none of the audio files have been downloaded, it is probably due to an internet or proxy problem. See the Desed repo or the Desed_website for more info.
- The sound event detection dataset uses the DESED dataset.
- To compute the separated sources, we use the fuss_repo (included here as `sound-separation/` via a git subtree).
- Specifically, we use the FUSS baseline model and `sound-separation/models/dcase2020_fuss_baseline/inference.py`.
The dataset for sound event detection of DCASE2020 task 4 is composed of:
- Train:
- *weak (DESED, recorded, 1 578 files)
- *unlabel_in_domain (DESED, recorded, 14 412 files)
- synthetic soundbank (DESED, synthetic, 2 584 files)
- *Validation (DESED, recorded, 1 168 files):
- test2018 (288 files)
- eval2018 (880 files)
- Train:
- synthetic20/soundscapes [2584 files] (DESED)
- synthetic20/separated_sources [2584 files] (DESED)
- weak_ss/separated_sources [1578 folders] (uses fuss baseline_model and fuss_scripts)
- unlabel_in_domain_ss/separated_sources [14 412 folders] (uses fuss baseline_model and fuss_scripts)
- Validation
- validation_ss/separated_sources [1168 files] (uses fuss baseline_model and fuss_scripts)
Note: the reverberated data (see scripts) are not computed for the baseline.
- Train:
- weak
- unlabel_in_domain
- synthetic20/soundscapes (separated in train/valid-80%/20%)
- Validation:
- validation
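The 80%/20% train/valid split of the synthetic soundscapes mentioned above can be sketched as follows (a hedged illustration: the file names and the seed are hypothetical, and the baseline performs its actual split in its own code):

```python
import random

def train_valid_split(filenames, valid_ratio=0.2, seed=42):
    """Shuffle a list of soundscape files reproducibly and split it
    into train (80%) and valid (20%) subsets."""
    files = sorted(filenames)           # deterministic starting order
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_valid = int(len(files) * valid_ratio)
    return files[n_valid:], files[:n_valid]

# Hypothetical names for the 2584 synthetic soundscapes
train, valid = train_valid_split([f"soundscape_{i}.wav" for i in range(2584)])
```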
- Train:
- weak + weak_ss/separated_sources
- unlabel_in_domain + unlabel_in_domain_ss/separated_sources
- synthetic20/soundscapes + synthetic20/separated_sources
- Validation:
- validation + validation_ss/separated_sources
The weak annotations have been verified manually for a small subset of the training set. They are provided in a tab-separated file (.tsv) in the following format:
[filename (string)][tab][event_labels (strings)]
For example:
Y-BJNMHMZDcU_50.000_60.000.wav Alarm_bell_ringing,Dog
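Such a file can be parsed with the standard library (a minimal sketch assuming a `filename`/`event_labels` header row; the baseline itself reads these files with pandas):

```python
import csv
import io

# Inline stand-in for a weak-label .tsv file
weak_tsv = (
    "filename\tevent_labels\n"
    "Y-BJNMHMZDcU_50.000_60.000.wav\tAlarm_bell_ringing,Dog\n"
)

# Each row: a filename and a comma-separated list of clip-level labels
rows = list(csv.DictReader(io.StringIO(weak_tsv), delimiter="\t"))
labels = rows[0]["event_labels"].split(",")
print(labels)  # ['Alarm_bell_ringing', 'Dog']
```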
The synthetic subset and the validation set have strong annotations.
The minimum length for an event is 250 ms, and the minimum duration of the pause between two events from the same class is 150 ms. When the silence between two consecutive events from the same class was less than 150 ms, the events have been merged into a single event. The strong annotations are provided in a tab-separated file (.tsv) in the following format:
[filename (string)][tab][event onset time in seconds (float)][tab][event offset time in seconds (float)][tab][event_label (strings)]
For example:
YOTsn73eqbfc_10.000_20.000.wav 0.163 0.665 Alarm_bell_ringing
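The merging rule above can be sketched as follows (a minimal illustration, not the annotation tooling itself): two events of the same class separated by less than 150 ms are fused into one.

```python
def merge_events(events, min_gap=0.150):
    """Merge (onset, offset, label) events of the same class whose
    silence gap is shorter than min_gap seconds."""
    merged = []
    for onset, offset, label in sorted(events, key=lambda e: (e[2], e[0])):
        if merged and merged[-1][2] == label and onset - merged[-1][1] < min_gap:
            # Gap too short: extend the previous event of this class
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset), label)
        else:
            merged.append((onset, offset, label))
    return merged

events = [(0.163, 0.665, "Alarm_bell_ringing"),
          (0.700, 1.200, "Alarm_bell_ringing"),   # 35 ms gap: merged
          (2.000, 2.500, "Alarm_bell_ringing")]   # 800 ms gap: kept
print(merge_events(events))  # two events after merging
```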
The free universal sound separation (FUSS) dataset [3] contains mixtures of arbitrary sources of different types for use in training sound separation models. Each 10 second mixture contains between 1 and 4 sounds.
The source clips for the mixtures are from a prerelease of FSD50k [4], [5], which is composed of Freesound content annotated with labels from the AudioSet Ontology. Using the FSD50k labels, the sound source files have been screened such that they likely only contain a single type of sound. Labels are not provided for these sound source files, and are not considered part of the challenge, although they will become available when FSD50k is released.
Train:
- 20000 mixtures
Validation:
- 1000 mixtures
Author | Affiliation |
---|---|
Nicolas Turpault | INRIA |
Romain Serizel | University of Lorraine |
Scott Wisdom | Google Research |
John R. Hershey | Google Research |
Hakan Erdogan | Google Research |
Justin Salamon | Adobe Research |
Dan Ellis | Google Research |
Prem Seetharaman | Northwestern University |
If you have any problems, feel free to contact Nicolas (and Romain).
- [[1]] A. Mesaros, T. Heittola, & T. Virtanen, "Metrics for polyphonic sound event detection", Applied Sciences, 6(6):162, 2016
- [[2]] C. Bilen, G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulovic, "A Framework for the Robust Evaluation of Sound Event Detection".
- [[3]] Scott Wisdom, Hakan Erdogan, Daniel P. W. Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, and John R. Hershey. What's all the fuss about free universal sound separation data? In preparation. 2020.
- [[4]] E. Fonseca, J. Pons, X. Favory, F. Font, D. Bogdanov, A. Ferraro, S. Oramas, A. Porter, and X. Serra. Freesound datasets: a platform for the creation of open audio datasets. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), 486–493. Suzhou, China, 2017.
- [[5]] F. Font, G. Roma, and X. Serra. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia, 411–412. ACM, 2013.

[1]: http://dcase.community/documents/challenge2019/technical_reports/DCASE2019_Delphin_15.pdf
[2]: https://arxiv.org/pdf/1910.08440.pdf
[3]: ./
[4]: https://repositori.upf.edu/bitstream/handle/10230/33299/fonseca_ismir17_freesound.pdf
[5]: mtg.upf.edu/system/files/publications/Font-Roma-Serra-ACMM-2013.pdf