
Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data (PyTorch implementation)

Weakly labelled data means that only the presence or absence of sound events is known for an audio clip; the onset and offset times of the sound events are unknown. We propose a sound event detection and time-frequency segmentation framework trained using only weakly labelled data.
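For illustration, the minimal sketch below contrasts a clip-level weak label with the frame-level strong label that onset/offset annotations would provide. The class count, frame resolution, and event times are invented for the example.

```python
import numpy as np

# Illustrative only: 3 classes, a 10 s clip split into 100 frames (0.1 s hop).
num_classes = 3
num_frames = 100

# A strong label marks presence per frame and needs onset/offset annotations.
strong_label = np.zeros((num_frames, num_classes), dtype=np.float32)
strong_label[20:45, 0] = 1.0  # class 0 active from 2.0 s to 4.5 s (made up)

# A weak label only marks clip-level presence; this is all that is available
# for training in the weakly labelled setting.
weak_label = (strong_label.max(axis=0) > 0).astype(np.float32)
print(weak_label)  # [1. 0. 0.]
```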


Dataset

The development set of the DCASE 2018 Task 1 acoustic scene classification dataset is used as background noise: http://dcase.community/challenge2018/task-acoustic-scene-classification

The DCASE 2018 Task 2 general-purpose audio tagging dataset is used as sound events: http://dcase.community/challenge2018/task-general-purpose-audio-tagging

|                   | Number of classes | Number of samples | Duration |
|-------------------|-------------------|-------------------|----------|
| DCASE 2018 Task 1 | 10                | 8640              | 24 h     |
| DCASE 2018 Task 2 | 41                | 9473              | ~10 h    |

Method

We first mix the sound events with the background sounds to create 10-second audio clips; each clip contains 3 sound events. Only the weakly labelled data (clip-level tags) is used for training. One mixture is shown in the figure below.
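As a rough illustration of the mixing step, here is a minimal sketch. The function name and the omission of gain/SNR handling are simplifications for the example, not the repository's actual mixing code (which is driven by the yaml files created in runme.sh).

```python
import numpy as np

def mix_clip(background, events, offsets):
    """Overlay sound events on a background waveform.

    `background` and each entry of `events` are 1-D float arrays at the
    same sample rate; `offsets` gives the start sample of each event.
    """
    mixture = background.copy()
    for event, offset in zip(events, offsets):
        end = min(offset + len(event), len(mixture))
        mixture[offset:end] += event[: end - offset]
    return mixture

# Example: 3 events placed at 1 s, 4 s and 7 s in a 10 s clip at 32 kHz.
sr = 32000
background = np.random.randn(10 * sr).astype(np.float32) * 0.01
events = [np.random.randn(2 * sr).astype(np.float32) * 0.1 for _ in range(3)]
mixture = mix_clip(background, events, offsets=[1 * sr, 4 * sr, 7 * sr])
```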

The proposed method is trained only on the weakly labelled data shown in the red block in the figure below. The mapping g1 is the segmentation mapping modeled by several convolutional layers. The mapping g2 is the classification mapping modeled by a global pooling layer. The green and blue blocks indicate the sound event detection and sound event separation stages.
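A minimal PyTorch sketch of this g1/g2 factorisation is given below. The layer sizes are illustrative, and global average pooling stands in for the pooling function, so this is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SegmentationNet(nn.Module):
    """Sketch of the g1 (segmentation) / g2 (classification) factorisation.
    Layer sizes are illustrative, not the paper's exact configuration."""

    def __init__(self, num_classes):
        super().__init__()
        # g1: several convolutional layers producing one time-frequency
        # mask per sound event class, same size as the input spectrogram.
        self.g1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, x):              # x: (batch, 1, frames, freq_bins)
        masks = self.g1(x)             # (batch, classes, frames, freq_bins)
        # g2: a global pooling layer maps each mask to a clip-level
        # probability, so the network can be trained from weak labels
        # alone (e.g. with binary cross-entropy against the clip tags).
        clipwise = masks.mean(dim=(2, 3))
        return clipwise, masks

model = SegmentationNet(num_classes=41)
clipwise, masks = model(torch.randn(4, 1, 625, 64))  # made-up log-mel batch
```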

The source separation in the inference stage is shown in the figure below. Separated audio of sound events can be obtained from the segmentation masks.
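A hedged sketch of this masking step, assuming the segmentation mask has already been interpolated to the STFT resolution and that the mixture phase is reused for the inverse transform:

```python
import numpy as np
import librosa

def separate_event(mixture, mask, n_fft=1024, hop_length=512):
    """Apply a (freq_bins, frames) segmentation mask to the mixture STFT
    magnitude and invert using the mixture phase. `mask` is assumed to
    match the STFT resolution, with values in [0, 1]."""
    stft = librosa.stft(mixture, n_fft=n_fft, hop_length=hop_length)
    masked = mask * np.abs(stft) * np.exp(1j * np.angle(stft))
    return librosa.istft(masked, hop_length=hop_length)
```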

The sound event detection in the inference stage is shown in the figure below.
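One simple way to turn a class's time-frequency mask into framewise detections is to average it over frequency and threshold the result, as sketched below; the exact aggregation used in the code may differ.

```python
import numpy as np

def detect_event(mask, threshold=0.5):
    """Average a (freq_bins, frames) mask over frequency to get a
    framewise presence score, then threshold it to a binary activity."""
    framewise = mask.mean(axis=0)        # (frames,)
    return framewise > threshold         # True where the event is detected
```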

The separated audio examples can be listened to here.

Run the code

0. Prepare data. The datasets should be organized as follows:

.
├── TUT-urban-acoustic-scenes-2018-development
│    ├── audio (8640 audios)
│    │    └── ...
│    ├── meta.csv
│    └── ...
└── DCASE 2018 Task 2
     ├── audio_train (9473 audios)
     │    └── ...
     ├── train.csv
     └── ...

1. Install dependencies.

The code is implemented with Python 3 and PyTorch 0.4.0. You may need to install other dependencies with conda or pip.

2. Run.

Run the commands in runme.sh line by line, including:

(1) Modify the dataset path and workspace path.

(2) Create a yaml file describing how the sound events and background noise are mixed.

(3) Create mixed audio files.

(4) Extract features (a log-mel sketch follows this list).

(5) Train network.

(6) Inference.
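As a rough illustration of step (4), the sketch below extracts log-mel features with librosa. The filename, sample rate, and spectrogram parameters are assumptions, not necessarily the values used in the repository.

```python
import librosa

# Filename, sample rate and spectrogram parameters are assumptions.
waveform, sr = librosa.load('mixed_clip.wav', sr=32000)
mel = librosa.feature.melspectrogram(
    y=waveform, sr=sr, n_fft=1024, hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel)       # shape: (n_mels, frames)
```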

Citation

[1] Kong, Qiuqiang, Yong Xu, Iwona Sobieraj, Wenwu Wang, and Mark D. Plumbley. "Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data." arXiv preprint arXiv:1804.04715 (2018).
