Jointly-Discovering-Visual-Objects-and-Spoken-Words

Paper: https://arxiv.org/pdf/1804.01452.pdf

Requirements

Python 3.6, TensorFlow 1.8, wavio, python_speech_features

How to run:

1) Download the Flickr8k speech caption files and image files.
2) In the data folder, flickr8k.pkl provides the pairing information. Details on how to use this pickle file are in the main_SISA or MISA Python file.
3) Run: python main_SISA/MISA.py
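Before running the main scripts, it can help to check what flickr8k.pkl actually contains. The following is a minimal sketch, not code from this repository: the helper name summarize_pickle and the path data/flickr8k.pkl are assumptions based on the description above, and the real layout of the pickle is documented only in the main scripts.

```python
import pickle
from pathlib import Path


def summarize_pickle(path, max_items=3):
    """Load a pickle file and return a short description of its top-level object.

    Hypothetical helper for inspection only; it makes no assumption about
    how flickr8k.pkl is organised internally.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)  # only unpickle files from a source you trust

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show a few keys so the structure can be matched against the main scripts.
        summary["sample_keys"] = list(obj)[:max_items]
    elif isinstance(obj, (list, tuple)):
        summary["sample_items"] = list(obj[:max_items])
    return summary


if __name__ == "__main__":
    # Assumed location: the README only says the file is "in the data folder".
    pkl_path = Path("data") / "flickr8k.pkl"
    if pkl_path.exists():
        print(summarize_pickle(pkl_path))
    else:
        print(f"{pkl_path} not found; run this from the repository root.")
```

If the file stores objects from third-party packages, those packages need to be installed for the load to succeed.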

Experiment

Speech-caption-to-image retrieval on the Flickr8k dataset:

Results are reported on the test set, which is the last 1000 images and captions.

R@1: 0.027, R@5: 0.127, R@10: 0.245
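For readers unfamiliar with the metric, R@K (recall at K) is the fraction of queries whose correct item appears among the top K retrieved results. The sketch below shows one common way to compute it from a caption-by-image similarity matrix. It is an illustration under stated assumptions, not the evaluation code used for the numbers above: the function name recall_at_k, the targets argument, and the tie-handling rule are all hypothetical choices.

```python
def recall_at_k(similarity, targets=None, ks=(1, 5, 10)):
    """Compute R@K for caption-to-image retrieval.

    similarity[i][j] is the score between spoken caption i and image j
    (higher means more similar). targets[i] is the index of the correct
    image for caption i; if omitted, caption i is assumed to match image i.
    """
    n = len(similarity)
    if targets is None:
        targets = list(range(n))

    hits = {k: 0 for k in ks}
    for row, target in zip(similarity, targets):
        # Zero-based rank of the correct image: how many images score strictly higher.
        rank = sum(1 for score in row if score > row[target])
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: hits[k] / n for k in ks}


if __name__ == "__main__":
    # Toy scores for 3 captions and 3 images, made up for illustration.
    sim = [
        [0.9, 0.1, 0.3],
        [0.2, 0.4, 0.8],
        [0.5, 0.6, 0.7],
    ]
    print(recall_at_k(sim, ks=(1, 2)))
```

In the toy matrix, captions 0 and 2 rank their own image first and caption 1 ranks it second, so R@1 is 2/3 and R@2 is 1.0.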

Note: this is still a work in progress.

TODO list

1) Image-to-caption retrieval
2) ...


Repository: yijiuzai/Jointly-Discovering-Visual-Objects-and-Spoken-Words
