zero_shot_reg

Zero-shot learning for Referring Expression Generation

Master's thesis by Lilian Schröder

data

  • necessary, but not included in this repo:

    • referring expressions: refcoco_refdf.json.gz (see names_in_context/data)
    • bounding boxes: mscoco_bbdf.json.gz (see names_in_context/data)
    • visual features: mscoco_vgg19.npz
  • refcoco_splits.json: training-test splits

  • prettyraw.json: human-readable version of the referring expressions data (refcoco_refdf.json), needed by the draw_region script to look up the image ID of a region. A loading sketch for these files follows this list.
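The exact serialization of these files is not documented here; the sketch below shows one plausible way to load them, assuming the refdf/bbdf files deserialize to pandas DataFrames (as in names_in_context) and the features are a standard NumPy archive:

```python
import json

import numpy as np
import pandas as pd

# Referring expressions and bounding boxes: gzipped JSON files,
# assumed to be readable as pandas DataFrames (as in names_in_context).
refdf = pd.read_json('data/refcoco_refdf.json.gz', compression='gzip')
bbdf = pd.read_json('data/mscoco_bbdf.json.gz', compression='gzip')

# Visual features: a NumPy archive; the array keys are not documented
# here, so inspect them first.
feats = np.load('data/mscoco_vgg19.npz')
print(feats.files)

# Train/test splits: plain JSON.
with open('data/refcoco_splits.json') as f:
    splits = json.load(f)
print(splits.keys())
```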

code basis

  1. run preprocessing: prepare_refcoco.py
  2. run the model: src/experiment_refcoco.py, adapted from https://arxiv.org/abs/1708.02043

Zero-shot REG code

Requirements

Running the code requires Python 2.7, TensorFlow, and other Python machine-learning libraries (pandas, numpy).

Training the model

  • src/lstm contains all code for data preparation and training of the REG model
  • run "main.py" with the desired configuration (a sketch of the setup follows this list)
    • adjust the paths at the beginning
    • the Data class accepts parameters for configuring the train/test split (words or categories to be excluded from training)
    • the LSTM class needs information such as the vocabulary size, the output directory, the IDs of the test set (for RE generation), and a dictionary that maps words to indices
    • after training, the result directory contains the trained model as well as additional information (index-to-token list, vocabulary list, a list of the IDs in the test set, etc.)
  • content of a result directory:
    • inject_refcoco_refrnn_compositional_3_512_1: the actual TensorFlow model plus a JSON with the generated sequences (generated_captions.json)
      • Note: if this folder is moved, the "checkpoint" file needs to be adjusted (the paths can simply be edited there directly)
    • highest_prob_candidates.json: stores alternative predictions for region IDs where the "unknown" token was predicted
    • all_highest_probs_x.json: for every region and every position in the sequence, the top x predictions of the LSTM are stored for zero-shot learning (x = number of candidates stored)
    • additional_vocab.txt: words needed in the embedding space that are not in the LSTM vocabulary; e.g. if a word was left out during training, it is put on this list (used for generating the custom space)
    • baseline_frequencies_topx.json: all words that the LSTM predicted for a category (parsing for nouns yields a position in the sequence; only the predictions at that time step are relevant); x is the number of words considered for the frequencies (1: only top-1 predictions are counted, etc.)
    • index2token.json: mapping of indices to tokens used by the LSTM (needed if sequences are generated from a stored model)
    • reduced_vocab_glove...txt: custom embedding space; the name indicates the configuration (only names = the space contains only nouns)
    • refs_moved_to_test.json: list of all region IDs that originally belong to the training set but were moved to the test set (because the model is supposed to know only a subset of the categories)
    • token_freqs.json: list of all tokens of the vocabulary and their frequencies
    • vocab_list.txt: list of all words in the vocabulary of the LSTM (all words that occur with a minimum frequency in the training data)
    • words_too_rare.json: words that appear in the training data but are not in the vocabulary because they occur too rarely (used for the qualitative analysis of the unknown token)
    • zero_shot_refs_x.json: referring expressions that were processed with the zero-shot script (not present before that script has been applied)
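The class and parameter names in the following sketch are illustrative, not the actual interface; it only shows how the pieces described above fit together (see src/lstm/main.py for the real configuration):

```python
# Sketch only: module paths, class names and parameters are hypothetical.
from lstm.data import Data
from lstm.model import LSTM

# Configure the train/test split; excluded words/categories become
# the zero-shot targets.
data = Data(excluded_categories=['zebra'])   # parameter name is an assumption

# The LSTM needs the vocabulary size, an output directory, the test-set
# IDs (for RE generation) and a word-to-index dictionary.
model = LSTM(vocab_size=len(data.vocab),
             output_dir='results/zebra_run',
             test_ids=data.test_ids,
             word2index=data.word2index)

model.train()
# 'results/zebra_run' now holds the trained model plus the files
# listed above (index2token.json, vocab_list.txt, ...).
```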

Evaluation

  • src/eval contains code for the qualitative analysis and the computation of metrics (BLEU, CIDEr)
    • analyse_...py: what does the model predict for an unseen category? (for words/categories left out during training; includes visualization of regions)
    • bleu.py: interface to the COCO evaluation code (a scoring sketch follows this list)
    • cats.txt: overview of the category indices
    • evaluate.py: prepare data for COCO evaluation code
    • generatecaptionsfromstoredmodel.py: generates referring expressions with a model stored in a file
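bleu.py and evaluate.py build on the COCO caption evaluation code; the following sketch shows the underlying scorer API from the pycocoevalcap package (the example IDs and sentences are made up):

```python
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Both scorers take dicts mapping a region ID to a list of
# (tokenized) sentences.
references = {'1234': ['the zebra on the left', 'left zebra']}
generated = {'1234': ['zebra on the left']}

bleu, _ = Bleu(4).compute_score(references, generated)
cider, _ = Cider().compute_score(references, generated)
print(bleu)    # BLEU-1 to BLEU-4
print(cider)
```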

Zero-shot module

  • src/helper contains code for the application of zero-shot learning to referring expressions.
    • draw_region.py: visualizes the bounding box on a displayed image (given region ID and image ID)
    • generate_baseline.py: generates a list of the 5 most frequent predictions for a single category, based on the baseline..json files in a model folder; can be used for a comparison with another model (such as WAC)
    • noun_list_long.txt: list of nouns used for noun parsing
    • plot_embeddings.py: visualize neighbors of a word vector (also possible with two colors for two given spaces)
    • word_embeddings.py: access to word embeddings; includes methods for generating a custom embedding space and for converting GloVe files to word2vec files (a sketch follows this list)
    • zero_shot.py: apply zero-shot learning to the REs of a model, includes hit@k evaluation
    • zero_shot_all.py: apply zero-shot learning to all categories at once
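A sketch of the embedding utilities using gensim; the file names are placeholders, and the averaging step is a simplified stand-in for what zero_shot.py actually does:

```python
import numpy as np
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# Convert a GloVe text file to word2vec format so gensim can load it.
glove2word2vec('glove.6B.300d.txt', 'glove.6B.300d.w2v.txt')
space = KeyedVectors.load_word2vec_format('glove.6B.300d.w2v.txt')

# Simplified zero-shot step: average the embeddings of the LSTM's top
# predictions for an unknown region and look up the nearest neighbours
# in the (custom) space. hit@k then checks whether the gold category
# name appears among the k nearest neighbours.
candidates = ['horse', 'donkey', 'cow']        # top LSTM predictions
projected = np.mean([space[w] for w in candidates], axis=0)
neighbours = space.similar_by_vector(projected, topn=5)
print(neighbours)   # hit@5: is the gold word among these?
```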

Scripts

  • scripts for running code on the university servers
  • charts.py: visualize results

Hint

  • There is no extra parameter for applying zero-shot learning to all words (not only all categories, but all words). Instead, a few lines in the LSTM class were commented out (see the comments there).
