Image Captioning

An implementation of image captioning using a CNN encoder and an LSTM decoder on the Flickr8k dataset.

Setup

Install the requirements:

pip install -r requirements.txt

Then download the pretrained Wikipedia2Vec embeddings.

Usage

For training:

python3 captioning/run_image_captioning.py -t -d <PATH_TO_DATASET> -o <OUTPUT_DIRECTORY> -p <PATH_TO_Wikipedia2Vec_EMBEDDINGS>

For testing:

python3 captioning/run_image_captioning.py -e -d <PATH_TO_DATASET> -o <OUTPUT_DIRECTORY> -p <PATH_TO_Wikipedia2Vec_EMBEDDINGS>

Description

Encoder

The encoder is a ResNet-18 pretrained on the ImageNet classification dataset. The final classification layer has been replaced with a fully connected layer, and a second fully connected layer maps its output to the required embedding size. The output of this final embedding layer is used to initialize the LSTM decoder.

During training I tried both fine-tuning and feature extraction. Fine-tuning is a bit slower than feature extraction. Because the Flickr8k dataset is much smaller than ImageNet and the images in the two datasets are similar, feature extraction works better than fine-tuning. During feature extraction, only the weights of the newly added fully connected layers are updated.

Decoder

The decoder (LSTM) takes the output of the encoder's final embedding layer as the input to the first LSTM cell. During training, teacher forcing is used for faster convergence: instead of passing the output of the previous LSTM cell to the next LSTM cell, the target word is passed as input.
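A teacher-forced decoder along these lines might look like the sketch below. It is an illustration only: the class name `Decoder`, the layer sizes, and the randomly initialized `nn.Embedding` (standing in for the pretrained Wikipedia2Vec vectors) are assumptions.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    """LSTM decoder trained with teacher forcing (sizes are assumed)."""

    def __init__(self, vocab_size, embed_size=256, hidden_size=512):
        super().__init__()
        # Stand-in for pretrained word embeddings; randomly initialized here.
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, image_embedding, captions):
        # Teacher forcing: feed the ground-truth words (all but the last) as inputs,
        # rather than the words the model predicted at earlier steps.
        words = self.word_embed(captions[:, :-1])
        # The image embedding is the input to the first LSTM step.
        inputs = torch.cat([image_embedding.unsqueeze(1), words], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)  # (batch, caption_length, vocab_size)


vocab_size = 100
decoder = Decoder(vocab_size)
image_embedding = torch.randn(4, 256)                # stand-in for encoder output
captions = torch.randint(0, vocab_size, (4, 12))     # stand-in for tokenized captions
logits = decoder(image_embedding, captions)
# Step t is trained to predict caption word t.
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), captions.reshape(-1))
```

At inference time there is no target caption, so each step would instead receive the embedding of the word predicted at the previous step.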

