
Image Captioning

Introduction

Generate captions from images using a deep learning model: given an image, the model describes in English what the image shows. To achieve this, the model consists of an encoder, which is a CNN, and a decoder, which is an RNN. The CNN encoder, pre-trained on an image classification task, maps each image to a feature vector, which is fed into the RNN decoder to produce an English sentence.
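The following is a minimal sketch of such an encoder-decoder pair in PyTorch. The class names, layer sizes, and the choice to keep the backbone frozen are illustrative and not a verbatim copy of the code in this repository:

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EncoderCNN(nn.Module):
        """Pre-trained CNN that maps an image to a fixed-size feature vector."""
        def __init__(self, embed_size):
            super().__init__()
            backbone = models.resnext101_32x8d(pretrained=True)
            self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
            self.embed = nn.Linear(backbone.fc.in_features, embed_size)

        def forward(self, images):
            with torch.no_grad():              # keep the pre-trained backbone frozen
                features = self.backbone(images).flatten(1)
            return self.embed(features)

    class DecoderRNN(nn.Module):
        """RNN that predicts the next word from the image embedding and the previous words."""
        def __init__(self, embed_size, hidden_size, vocab_size, num_layers=2):
            super().__init__()
            self.word_embed = nn.Embedding(vocab_size, embed_size)
            self.rnn = nn.GRU(embed_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, vocab_size)

        def forward(self, features, captions):
            # Prepend the image embedding as the first input of the sequence.
            inputs = torch.cat([features.unsqueeze(1), self.word_embed(captions)], dim=1)
            outputs, _ = self.rnn(inputs)
            return self.fc(outputs)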

The model and the tuning of its hyperparameters are based on ideas presented in the papers Show and Tell: A Neural Image Caption Generator and Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.

We use the Microsoft Common Objects in COntext (MS COCO) dataset for this project. It is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms. For instructions on downloading the data, see the Data section below.

Changes from the original project by Trang Nguyen

This project is forked from a git repository created by Trang Nguyen. The following points have been changed:

  • The encoder used is a pre-trained instance of ResNeXt101_32x8d.
  • The decoder used is a two-layer GRU RNN instead of a single-layer LSTM RNN.
  • Training is done in training.py. Instead of sampling the training captions into batches of equal-length captions, they are padded and packed with torch.nn.utils.rnn.pack_padded_sequence (see the sketch after this list). This improves training speed on my machine and ensures that every training sample is used in each epoch.
  • Evaluation of the model is done with the 'official' MS COCO evaluation code, and the CIDEr score is used to decide whether the model has improved.
  • Besides MS COCO, I used free captioned images from pexels.com for training.
  • I included a simple REST service, rest_service.py, which can be used to call the model from a web frontend. See this online demo on my home page.
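As referenced in the training bullet above, the captions in a batch are padded to a common length and then packed, so that padded positions do not contribute to the loss. A rough sketch of such a training step follows; the function and variable names are illustrative, not taken verbatim from training.py:

    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence

    criterion = nn.CrossEntropyLoss()

    def train_step(encoder, decoder, optimizer, images, captions, lengths):
        # captions: (B, T) padded word indices; lengths: true caption lengths,
        # sorted in descending order (or pass enforce_sorted=False below).
        features = encoder(images)                     # (B, embed_size)
        outputs = decoder(features, captions[:, :-1])  # (B, T, vocab_size)

        # Packing drops the padded positions, so only real tokens enter the loss.
        targets = pack_padded_sequence(captions, lengths, batch_first=True).data
        outputs = pack_padded_sequence(outputs, lengths, batch_first=True).data

        loss = criterion(outputs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()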

Code

TBD

Setup

  1. Install pycocoevalcap and pycocotools (a usage sketch follows this section) by running:

    pip install git+https://github.com/salaniz/pycocoevalcap
    
  2. Install PyTorch and torchvision:

    pip install torch torchvision
    
  3. Other dependencies:

  • Python 3
  • nltk
  • numpy
  • scikit-image
  • matplotlib
  • tqdm
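For reference, here is roughly how the pycocoevalcap package installed in step 1 can be used to compute the CIDEr score (among other metrics) for a file of generated captions; the result-file path is only an example:

    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    # Ground-truth annotations and a JSON file of generated captions in the
    # standard COCO result format: [{"image_id": 42, "caption": "a dog ..."}, ...]
    coco = COCO('coco/annotations/captions_val2014.json')
    coco_res = coco.loadRes('results/captions_val2014_results.json')

    coco_eval = COCOEvalCap(coco, coco_res)
    coco_eval.params['image_id'] = coco_res.getImgIds()  # score only images with results
    coco_eval.evaluate()

    print('CIDEr:', coco_eval.eval['CIDEr'])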

Data

Download the following data from the COCO website and place it, as described below, in a coco subdirectory inside this project's directory (the resulting layout is shown after the list):

  • under Annotations, download:
    • 2014 Train/Val annotations [241MB] (extract captions_train2014.json, captions_val2014.json, instances_train2014.json and instances_val2014.json, and place them in the subdirectory coco/annotations/)
    • 2014 Testing Image info [1MB] (extract image_info_test2014.json and place it in the subdirectory coco/annotations/)
  • under Images, download:
    • 2014 Train images [83K/13GB] (extract the train2014 folder and place it in the subdirectory coco/images/)
    • 2014 Val images [41K/6GB] (extract the val2014 folder and place it in the subdirectory coco/images/)
    • 2014 Test images [41K/6GB] (extract the test2014 folder and place it in the subdirectory coco/images/)
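After extracting everything, the coco directory should look roughly like this:

    coco/
    ├── annotations/
    │   ├── captions_train2014.json
    │   ├── captions_val2014.json
    │   ├── instances_train2014.json
    │   ├── instances_val2014.json
    │   └── image_info_test2014.json
    └── images/
        ├── train2014/
        ├── val2014/
        └── test2014/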

Run

To train the model, run:

python training.py

To run one of the Jupyter notebooks, use:

jupyter notebook <notebook_name.ipynb>
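For caption generation at inference time, a greedy decoding loop along the following lines can be used. The vocab object, the '<end>' token, and the attribute names refer to the decoder sketched in the introduction; they are illustrative and not the exact code in this repository's notebooks or REST service:

    import torch

    def generate_caption(encoder, decoder, image_tensor, vocab, max_len=20):
        """Greedily decode a caption for one preprocessed image tensor (3, 224, 224)."""
        encoder.eval()
        decoder.eval()
        with torch.no_grad():
            features = encoder(image_tensor.unsqueeze(0))    # (1, embed_size)
            inputs = features.unsqueeze(1)                   # (1, 1, embed_size)
            hidden = None
            words = []
            for _ in range(max_len):
                output, hidden = decoder.rnn(inputs, hidden) # one GRU step
                logits = decoder.fc(output.squeeze(1))       # (1, vocab_size)
                predicted = logits.argmax(dim=1)             # greedy word choice
                word = vocab.idx2word[predicted.item()]      # hypothetical vocabulary object
                if word == '<end>':
                    break
                words.append(word)
                inputs = decoder.word_embed(predicted).unsqueeze(1)
        return ' '.join(words)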
