Code for master thesis on Zero-Shot Learning in multi-label scenarios


xuepo99/Msc_Multi_label_ZeroShot

 
 


Zero-Shot MultiLabel

Code for the master's thesis on zero-shot classification with multi-label data.

Abstract

Visual recognition systems are often limited to the object categories they were trained on and thus scale poorly. This is in part due to the difficulty of acquiring sufficient labeled images as the number of object categories grows. To address this, earlier research has presented models that use other sources, such as text data, to help classify object categories unseen during training. However, most of these models are limited to images with a single label, yet many images contain more than one object category, and therefore more than one label. This master's thesis implements a model capable of classifying unseen categories for both single- and multi-labeled images.

The architecture consists of several modules: a pre-trained neural network that generates image features for each image, a model trained on text that represents words as vectors, and a neural network that projects the image features into the dimension native to the vector representation of words. On this architecture, we compared two approaches to generating word vectors, GloVe and Word2vec, with different vector dimensions and on spaces containing different numbers of word vectors. The model was adapted to multi-label prediction by comparing three approaches for image box generation: YOLOv2, Faster R-CNN, and randomly generated boxes. Each box represents a cropped section of the image, and this approach was chosen to fit each label to one of these boxes.
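
As a rough illustration of the pipeline above (not the thesis code itself), the following NumPy sketch projects a CNN image feature into word-vector space and picks the closest label by cosine similarity. The dimensions, the label set, and the single linear projection layer are all illustrative assumptions; the thesis trains a neural network for this mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 2048-d image features (e.g. from a pre-trained CNN)
# projected down to 300-d word vectors (e.g. w2v_wiki_300D).
FEAT_DIM, WORD_DIM = 2048, 300

# Stand-in for the trained projection network (a single linear layer here).
W = rng.standard_normal((FEAT_DIM, WORD_DIM)) / np.sqrt(FEAT_DIM)

# Stand-in word-vector space: one vector per category label.
labels = ["dog", "cat", "car"]
word_vecs = rng.standard_normal((len(labels), WORD_DIM))

def predict_label(image_feature):
    """Project an image feature into word-vector space and return the
    label whose word vector is closest by cosine similarity."""
    v = image_feature @ W
    v = v / np.linalg.norm(v)
    wv = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sims = wv @ v
    return labels[int(np.argmax(sims))]

feat = rng.standard_normal(FEAT_DIM)
print(predict_label(feat))  # one of the labels above
```

Because the word vectors for unseen categories exist in the same space, the same nearest-neighbor lookup extends to categories never shown during training, which is what makes the zero-shot setting possible.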

The results showed that increasing the word vector dimension increased accuracy, with Word2vec outperforming GloVe, while adding more words to the word vector space decreased accuracy. In the single-label scenario the model achieves results similar to existing models with comparable architectures. In the multi-label scenario, the model trained on boxes generated by Faster R-CNN and evaluated on randomly generated boxes achieved the highest accuracy, but was not able to outperform comparative alternatives. The architecture gives promising results, but more investigation is needed to determine whether the results can be improved further.

Dependencies

Usage

Object detection frameworks

Downloadables

Before training and testing

  • Download the pre-trained language model vectors.
  • Use py-faster-rcnn or YOLO to compute region-of-interest boxes.
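
For the first step, here is a minimal sketch of loading pre-trained vectors from a plain-text file in GloVe's layout (one word per line followed by its components). The helper name and the demo file are illustrative assumptions, not the repository's actual loader.

```python
import os
import tempfile

import numpy as np

def load_word_vectors(path):
    """Load word vectors from a whitespace-separated text file
    (one word per line followed by its vector components)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Tiny demo file in the same plain-text layout.
demo = "dog 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(demo)
    tmp = f.name
vecs = load_word_vectors(tmp)
os.remove(tmp)
print(vecs["dog"])  # [0.1 0.2 0.3]
```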

Train Zero-Shot model

Single-label data

python tools/train_brute_force.py --imdb <dataset> --lm <language_model> (e.g. w2v_wiki_300D) --loss squared_hinge --iters 10000

Multi-label data

python tools/train_ml_brute_force.py --imdb <dataset> --lm <language_model> (e.g. w2v_wiki_300D) --loss squared_hinge --model <ZSL_model> (pre-trained on single-label data) --boxes (random, frcnn or yolo) --iters 10000

Test Zero-Shot model

Single-label data

python tools/test_brute_force.py --lm glove_wiki_300 --imdb imagenet_zs --ckpt output/train_bts/model_glove_wiki_300.hdf5 --singlelabel_predict

Multi-label data

python tools/test_brute_force.py --lm glove_wiki_300 --imdb imagenet_zs --ckpt output/train_bts/model_glove_wiki_300.hdf5 --boxes faster_rcnn
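
The multi-label setting described in the abstract fits each label to one of the generated boxes. The NumPy sketch below shows one plausible scoring rule: score each label by its best-matching box and keep labels above a similarity threshold. The threshold, shapes, and label set are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
WORD_DIM = 300

labels = ["dog", "cat", "car"]
word_vecs = rng.standard_normal((len(labels), WORD_DIM))
word_vecs /= np.linalg.norm(word_vecs, axis=1, keepdims=True)

# Projected features for each candidate box (from Faster R-CNN, YOLO, or
# random box generation), already mapped into word-vector space.
box_vecs = rng.standard_normal((5, WORD_DIM))
box_vecs /= np.linalg.norm(box_vecs, axis=1, keepdims=True)

def multilabel_predict(box_vecs, word_vecs, labels, threshold=0.0):
    """Score each label by its best-matching box (max cosine similarity)
    and return all labels whose best score exceeds the threshold."""
    sims = word_vecs @ box_vecs.T   # (num_labels, num_boxes)
    best = sims.max(axis=1)         # best box per label
    return [l for l, s in zip(labels, best) if s > threshold]

print(multilabel_predict(box_vecs, word_vecs, labels))
```

Taking the maximum over boxes lets each label attach to whichever region supports it best, so an image can yield several labels at once, one per well-matched box.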
