Text2Vis

Text2Vis is a family of neural network models that learn a mapping from short textual descriptions to visual features, so that one can retrieve images simply by providing a short textual description of what they depict.
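
As a rough illustration of the retrieval setting this enables (this is not code from the repository; `text2vis`, `image_features` and `image_ids` are hypothetical placeholders for a trained text-to-visual-feature model, a precomputed feature matrix and the matching image identifiers), searching amounts to a nearest-neighbour lookup in the visual feature space:

```python
import numpy as np

def retrieve(text2vis, description, image_features, image_ids, k=5):
    """Rank images by cosine similarity between the predicted visual
    feature vector and a matrix of precomputed image features.

    All arguments are hypothetical placeholders used only for this sketch:
    a callable mapping text to a visual vector, an array of shape
    (n_images, feature_dim), and the image ids in the same order.
    """
    query = text2vis(description)                     # (feature_dim,)
    query = query / np.linalg.norm(query)
    feats = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
    scores = feats @ query                            # cosine similarities
    top = np.argsort(-scores)[:k]
    return [(image_ids[i], float(scores[i])) for i in top]
```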

Text2Vis includes: (i) a sparse version, which takes as input a one-hot vector representing the textual description; (ii) a dense version, in which words are embedded and fed to an LSTM whose last memory state is mapped onto the visual space; and (iii) a Wide & Deep model, which combines the sparse and dense representations. We also include our reimplementation of the Word2VisualVec model trained with the MSE loss.
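
As a minimal sketch of the dense variant only (written in current tf.keras rather than the repository's original TensorFlow code, with made-up sizes for the vocabulary, embeddings, LSTM state and visual features), the caption is embedded, run through an LSTM, and its final state is regressed onto the visual space; the MSE loss shown here is the one used by the Word2VisualVec baseline, not the StochasticLoss criterion:

```python
import tensorflow as tf

VOCAB_SIZE = 20000   # hypothetical vocabulary size
EMBED_DIM = 300      # hypothetical word-embedding size
LSTM_DIM = 512       # hypothetical LSTM state size
VISUAL_DIM = 4096    # size of an fc6/fc7 feature vector

# Dense variant: word ids -> embeddings -> LSTM -> linear map to visual space.
text_in = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(text_in)
x = tf.keras.layers.LSTM(LSTM_DIM)(x)           # last state summarises the caption
visual_pred = tf.keras.layers.Dense(VISUAL_DIM)(x)

dense_model = tf.keras.Model(text_in, visual_pred)
dense_model.compile(optimizer="adam", loss="mse")  # MSE as in the Word2VisualVec baseline
```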

Note: to train the model, you need the visual features associated with the MsCOCO image collection. In our experiments, we used the fc6 and fc7 layers of the Hybrid CNN. These feature files are too large to include in the repository, but if you need them we will be very happy to share ours with you!
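
Purely as an assumption about how such precomputed features could be paired with the MsCOCO captions (the file names, array shapes and layout below are hypothetical; the repository does not ship these files), a preparation step might look like:

```python
import json
import numpy as np

# Hypothetical file layout, for illustration only: one row of fc7 features
# per image, plus the list of image ids in the same order.
features = np.load("mscoco_fc7.npy")               # (n_images, 4096)
image_ids = json.load(open("mscoco_ids.json"))     # n_images image ids

captions = json.load(open("annotations/captions_train2014.json"))
id_to_row = {img_id: row for row, img_id in enumerate(image_ids)}

# Pair each caption with the fc7 vector of the image it describes.
pairs = [(ann["caption"], features[id_to_row[ann["image_id"]]])
         for ann in captions["annotations"]
         if ann["image_id"] in id_to_row]
```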

About

A TensorFlow implementation of the Text2Vis neural network, which applies a new StochasticLoss criterion to learn a mapping from textual descriptions to images.
