ayushidutta/cnn-image-classification

Multi-label Image Classification

Contains code for training and testing CNNs for multi-label image classification with various multi-label loss functions (Softmax, Sigmoid, Pairwise Ranking, WARP, LSEP) in TensorFlow. The codebase follows TensorFlow (v1.3)'s image classification tutorial using Slim, and adds custom loss functions for multi-label training.

Requirements

  • Tensorflow 1.3
  • Tensorflow Slim
  • Python 2

Dataset

The 'data' folder contains the train/test splits of the NUS-WIDE dataset as an example. For the images and other relevant details, please refer to the individual dataset's page.

Extracting CNN features

Extract CNN features of images from models such as VGG, Inception, and ResNet, and save them to a matfile. Run the following, changing arguments as needed.

dataset_dir=/home/ayushi/Git/research/dataset/nuswide/images/Flickr
checkpoint_path=../data/pretrained/vgg_16.ckpt
eval_file_image_list=../data/nuswide/nus1_train_list.txt
eval_file_image_features=../data/nuswide/net-vgg16/nus1_test_vgg16.mat
python extract.py \
    --dataset_dir=${dataset_dir} \
    --model_name=vgg_16 \
    --checkpoint_path=${checkpoint_path} \
    --bottleneck_scope=PreLogitsFlatten \
    --checkpoint_exclude_scopes=vgg_16/fc8 \
    --eval_file_image_list=${eval_file_image_list} \
    --eval_file_image_features=${eval_file_image_features} \
    --num_classes=81 \
    --bottleneck_shape=4096 \
    --batch_size=10 

where dataset_dir is the directory containing all the dataset images, checkpoint_path is the pre-trained checkpoint file that can be downloaded from TensorFlow's checkpoint releases, eval_file_image_list is the list of image names, and eval_file_image_features is the matfile where the extracted features will be saved.
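As a quick sanity check, the saved matfile can be loaded back with SciPy. This is a minimal sketch using a dummy feature matrix in place of extract.py's real output; the variable name 'features' inside the matfile is an assumption, so check the actual key names produced by the script.

```python
# Sketch: round-trip a feature matrix through a .mat file with scipy.io.
# The key 'features' is an assumed name, not taken from extract.py.
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Dummy stand-in for extracted features: one 4096-d VGG-16 vector per image.
feats = np.random.rand(10, 4096).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "features.mat")
savemat(path, {"features": feats})

loaded = loadmat(path)["features"]
print(loaded.shape)  # (10, 4096)
```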

Train and Test CNN: with extracted CNN features

When training only the classifier, a single fully connected (fc) layer on top of the extracted CNN features is sufficient. Extract the CNN features as shown above, then run the following, changing arguments as needed.

DATASET_DIR=../data/coco/
TRAIN_DIR=../data/coco/caffe-res1-101/sigmoid_logits/
CHECKPOINT_PATH=../data/coco/caffe-res1-101/sigmoid_logits/
train_file_image_features=../data/coco/caffe-res1-101/coco_train_r101.mat
train_file_image_annotations=../data/coco/coco_train_annot.txt
eval_file_image_features=../data/coco/caffe-res1-101/coco_train_r101.mat
eval_file_image_annotations=../data/coco/coco_train_annot.txt
eval_file_image_scores=../data/coco/caffe-res1-101/sigmoid_logits/coco_train_r101_pred_1.mat
python logits.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=coco \
    --dataset_split_name=train \
    --bottleneck_shape=2048 \
    --loss=sigmoid \
    --train_file_image_features=${train_file_image_features} \
    --train_file_image_annotations=${train_file_image_annotations} \
    --eval_file_image_features=${eval_file_image_features} \
    --eval_file_image_annotations=${eval_file_image_annotations} \
    --eval_file_image_scores=${eval_file_image_scores} \
    --run_opt=extract \
    --max_number_of_epochs=20 \
    --learning_rate=0.001 \
    --weight_decay=0.0005 \
    --batch_size=100 \
    --optimizer=rmsprop \
    --topK=3

where run_opt is train or extract for the training and testing modes respectively; loss can be any of the multi-label losses (softmax/sigmoid/ranking/warp/lsep); and eval_file_image_scores is the matfile where the classifier predictions will be saved.
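For reference, the --loss=sigmoid option corresponds to independent sigmoid cross-entropy per label. A minimal NumPy sketch of that loss (illustrative only; the repository computes it with TensorFlow ops):

```python
# Sketch of multi-label sigmoid cross-entropy, averaged over all
# (image, label) pairs. Uses the numerically stable formulation
# max(x, 0) - x*z + log(1 + exp(-|x|)).
import numpy as np

def sigmoid_multilabel_loss(logits, labels):
    """logits: (batch, num_classes) raw scores.
    labels: (batch, num_classes) binary multi-label ground truth."""
    x, z = logits, labels
    per_pair = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
    return per_pair.mean()

logits = np.array([[2.0, -1.0, 0.5]])
labels = np.array([[1.0, 0.0, 1.0]])
print(round(sigmoid_multilabel_loss(logits, labels), 4))  # 0.3048
```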

Evaluation (Performance metrics like MAP and F1 per image/label)

When testing the CNN, performance metrics on the test dataset are printed. Refer to the 'eval' folder for the evaluation code and helper scripts.
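One of the standard multi-label metrics, per-image F1 at top-K, can be sketched in NumPy as below. This is an illustration of the metric itself, not the repository's implementation; see the 'eval' folder for the actual code.

```python
# Sketch: mean per-image precision/recall/F1 when each image is assigned
# its top-k scored labels (k matches the --topK flag above).
import numpy as np

def f1_per_image_topk(scores, labels, k=3):
    topk = np.argsort(-scores, axis=1)[:, :k]   # indices of the k highest scores
    pred = np.zeros_like(labels)
    np.put_along_axis(pred, topk, 1, axis=1)    # binarize top-k predictions
    tp = (pred * labels).sum(axis=1)            # true positives per image
    prec = tp / k
    rec = tp / np.maximum(labels.sum(axis=1), 1)
    f1 = np.where(tp > 0, 2 * prec * rec / np.maximum(prec + rec, 1e-12), 0.0)
    return f1.mean()

scores = np.array([[0.9, 0.8, 0.1, 0.05]])
labels = np.array([[1.0, 0.0, 1.0, 0.0]])
print(f1_per_image_topk(scores, labels, k=3))  # 0.8
```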

Train CNN(Additional Option): with end-to-end network from images

Following TensorFlow conventions, the dataset images and their corresponding labels are saved in .tfrecord format. Refer to the convert_nuswide.py script in the datasets folder for an example of how this is done for the NUS-WIDE dataset, and run:

python datasets/download_and_convert_data.py --dataset_name=nuswide --dataset_dir=./data/nuswide

To train, run the following, having changed any needed arguments.

DATASET_DIR=../data/nuswide/
TRAIN_DIR=../data/nuswide/net-incep-v4/
CHECKPOINT_PATH=../data/pretrained/inception_v4.ckpt
python train.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=nuswide \
    --dataset_split_name=train \
    --model_name=inception_v4 \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --checkpoint_exclude_scopes=InceptionV4/Logits,InceptionV4/AuxLogits \
    --trainable_scopes=InceptionV4/Logits,InceptionV4/AuxLogits \
    --batch_size=5 \
    --loss=softmax

where dataset_dir is the directory containing the tfrecord subdirectory with all the tfrecord train and test files; train_dir is the directory where the trained models will be saved; and checkpoint_path is the pre-trained checkpoint file that can be downloaded from TensorFlow's checkpoint releases. Which network nodes are fine-tuned can be controlled with trainable_scopes and checkpoint_exclude_scopes; loss can be any of the multi-label losses (softmax/sigmoid/ranking/warp/lsep).
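Among the ranking-based options, --loss=lsep uses a smooth log-sum-exp approximation of pairwise ranking, penalizing every negative label scored close to or above a positive one. A minimal NumPy sketch for a single image (illustrative only; the repository computes it with TensorFlow ops):

```python
# Sketch of the LSEP loss for one image:
# log(1 + sum over (negative u, positive v) of exp(s_u - s_v)).
import numpy as np

def lsep_loss(scores, labels):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # All negative-minus-positive score differences, shape (|neg|, |pos|).
    diffs = neg[:, None] - pos[None, :]
    return np.log1p(np.exp(diffs).sum())

scores = np.array([2.0, -1.0, 0.5, 0.0])
labels = np.array([1, 0, 1, 0])
print(round(lsep_loss(scores, labels), 4))  # 0.7005
```

The loss is zero only in the limit where every positive label outscores every negative by a wide margin, which is the ranking behavior the flag selects.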
