An implementation inspired by the paper *Show and Tell: A Neural Image Caption Generator* by Oriol Vinyals, Alexander Toshev, Samy Bengio and Dumitru Erhan.
We approach the problem in two stages:
- A pre-trained CNN is used to extract the image features. In our case, we take a ResNet trained on ImageNet classification and detach its head; the penultimate layer gives us the features.
- A pre-trained word embedding is used to process and tokenize the captions. In our case, we use the `en_core_web_lg` model from spaCy. The embedded captions are then teacher-forced to an RNN, which predicts the next word.
The extracted features are treated as the initial hidden state of the RNN. To match the dimensionality, they are first sent through a Linear layer and reshaped.
Conditioned on this initial state, the model generates its hidden states, which are further sent through a Linear layer of dimension `vocab_size`.
Thus, at each timestep, we have a score for each possible word.
We treat this as a classification problem and use the categorical cross-entropy loss to match the scores to the desired label at each timestep.
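Put together, the decoder and its loss can be sketched as below. All sizes are illustrative assumptions, a GRU stands in for the RNN, and random tensors stand in for the CNN features, the embedded captions, and the target word indices.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
feature_dim, embed_dim, hidden_dim, vocab_size = 512, 300, 256, 1000

project = nn.Linear(feature_dim, hidden_dim)   # match feature dim to RNN hidden dim
rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
to_vocab = nn.Linear(hidden_dim, vocab_size)   # scores over the vocabulary

features = torch.randn(1, feature_dim)          # from the CNN
h0 = project(features).unsqueeze(0)             # reshaped to (num_layers, batch, hidden)
captions = torch.randn(1, 7, embed_dim)         # teacher-forced word embeddings
targets = torch.randint(0, vocab_size, (1, 7))  # next-word indices

hidden_states, _ = rnn(captions, h0)            # (batch, seq, hidden)
scores = to_vocab(hidden_states)                # (batch, seq, vocab_size)

# Categorical cross-entropy across all timesteps at once.
loss = nn.functional.cross_entropy(scores.view(-1, vocab_size), targets.view(-1))
```

Flattening the batch and time dimensions lets a single `cross_entropy` call score every timestep against its desired label.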
We use beam search to sample the most likely caption at evaluation time.
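Beam search itself is independent of the model. A minimal pure-Python sketch, where a hypothetical `step_fn` stands in for the RNN's next-word distribution:

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Keep the beam_width highest-scoring partial captions at each step.

    step_fn(sequence) returns a {token: probability} dict for the next
    word; here it stands in for the RNN's softmax output.
    """
    beams = [([start_token], 0.0)]  # (sequence, log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:          # finished caption: set it aside
                completed.append((seq, score))
            else:
                for token, p in step_fn(seq).items():
                    candidates.append((seq + [token], score + math.log(p)))
        if not candidates:                    # every beam has finished
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best = max(completed or beams, key=lambda c: c[1])
    return best[0]

# A toy next-word distribution: greedy decoding would commit to "a"
# (p = 0.6), but beam search recovers the higher-probability caption
# "the dog" (0.4 * 0.9 > 0.6 * 0.5).
toy_model = {
    ("<s>",): {"a": 0.6, "the": 0.4},
    ("<s>", "a"): {"cat": 0.5, "</s>": 0.5},
    ("<s>", "the"): {"dog": 0.9, "</s>": 0.1},
    ("<s>", "a", "cat"): {"</s>": 1.0},
    ("<s>", "the", "dog"): {"</s>": 1.0},
}
caption = beam_search(lambda seq: toy_model[tuple(seq)], "<s>", "</s>")
print(caption)  # ['<s>', 'the', 'dog', '</s>']
```

Summing log-probabilities instead of multiplying raw probabilities keeps the scores numerically stable for long captions.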
There are three ways you might want to use this project.
- Learn: To follow along with the Jupyter Notebooks, go to the `notebooks` folder.
- Apply: To execute on the command line, go to the `captioner` folder.
- Serve: To serve the model as a Flask microservice, go to the `server` folder.
- Follow the general steps in this tutorial to set up the environment and the startup file. Make sure you have MagNet installed.
- Install pycocotools in the conda environment by running
  `pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"`
- Install the `en_core_web_lg` model from spaCy by running
  `python -m spacy download en_core_web_lg`
A pre-trained model is available here.
The hyperparameters are the defaults in the repo.
Place it in the `checkpoints` directory and you're good to go.