zero_shot_reg

Zero-shot learning for Referring Expression Generation

Master thesis by Lilian Schröder

data

necessary, but not included in this repo:
- referring expressions: refcoco_refdf.json.gz (see names_in_context/data)
- bounding boxes: mscoco_bbdf.json.gz (see names_in_context/data)
- visual features: mscoco_vgg19.npz
refcoco_splits.json: training/test splits
prettyraw.json: human-readable version of the referring expressions data (refcoco_refdf.json), needed by the draw_region script to look up the image ID of a region
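
The gzip-compressed JSON files listed above can be opened with the standard library alone. The following is a minimal sketch, not code from this repo: the helper name load_json_gz and the example path are assumptions, and the internal layout of the decoded object is not described here, so inspect it before relying on particular keys.

```python
import gzip
import json


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the decoded object."""
    with gzip.open(path, "rb") as handle:
        # Decode explicitly so the same code runs on Python 2.7 and Python 3.
        return json.loads(handle.read().decode("utf-8"))


# Example call (hypothetical location of the downloaded file):
# refs = load_json_gz("data/refcoco_refdf.json.gz")
```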

code basis

run preprocessing: prepare_refcoco.py
run model: src/experiment_refcoco.py (adapted from https://arxiv.org/abs/1708.02043)

Zero-shot REG code

Requirements

Running the code requires Python 2.7, TensorFlow, and other Python machine learning libraries (pandas, numpy).

Training the model

src/lstm contains all code for data preparation and training of the REG model.
Run "main.py" with the desired configuration:
- adjust the paths at the beginning
- the Data class accepts parameters for configuring the train/test split (words or categories to be excluded from training)
- the LSTM class needs information such as the vocabulary size, the output directory, the IDs of the test set (for RE generation), and a dictionary that maps words to indexes
- after training, the result directory contains the trained model as well as additional information (index-to-token list, vocabulary list, a list of the IDs in the test set, etc.)
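
The idea behind the configurable train/test split can be illustrated as follows. This is a sketch under stated assumptions, not the actual Data class: the function split_for_zero_shot and the keys region_id, category, and split are hypothetical. Every training item that belongs to an excluded category is moved to the test set, and its region ID is recorded (compare refs_moved_to_test.json in the result directory).

```python
def split_for_zero_shot(refs, excluded_categories):
    """Move every training item of an excluded category to the test set.

    refs: list of dicts with the hypothetical keys
          'region_id', 'category' and 'split' ('train' or 'test').
    Returns (train, test, moved_ids).
    """
    excluded = set(excluded_categories)
    train, test, moved_ids = [], [], []
    for ref in refs:
        if ref["split"] == "train" and ref["category"] in excluded:
            # The model must never see this category during training.
            moved_ids.append(ref["region_id"])
            test.append(ref)
        elif ref["split"] == "train":
            train.append(ref)
        else:
            test.append(ref)
    return train, test, moved_ids


# Toy data, invented for illustration only.
refs = [
    {"region_id": 1, "category": "horse", "split": "train"},
    {"region_id": 2, "category": "dog", "split": "train"},
    {"region_id": 3, "category": "horse", "split": "test"},
]
train, test, moved = split_for_zero_shot(refs, ["horse"])
```

With "horse" excluded, only region 2 stays in the training set, and region 1 is reported as moved.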
content of a result directory:
- inject_refcoco_refrnn_compositional_3_512_1: actual TensorFlow model plus a JSON with generated sequences (generated_captions.json)
  - Attention: if this folder is moved, the paths in the "checkpoint" file need to be adjusted (they can simply be edited there directly)
- highest_prob_candidates.json: stores alternative predictions for region IDs where the "unknown" token was predicted
- all_highest_probs_x.json: for every region, for every position in the sequence, the top x predictions of the LSTM are stored for zero-shot learning (x = number of candidates stored)
- additional_vocab.txt: words needed in the embedding space that are not in the LSTM vocabulary, e.g. a word that was left out during training is put into this list (used for generating the custom space)
- baseline_frequencies_topx.json: all words that the LSTM predicted for a category (parsing for nouns yields a position in the sequence; only the predictions at that time step are relevant); x is the number of words considered for the frequencies (1: only top-1 predictions are counted, etc.)
- index2token.json: mapping between indexes and words used by the LSTM (needed if sequences are generated by a stored model)
- reduced_vocab_glove...txt: custom embedding space, name indicates the configuration (only names = the space only contains nouns)
- refs_moved_to_test.json: list of all region IDs that originally belonged to the training set, but were moved to the test set (because the model is supposed to know only a subset of the categories)
- token_freqs.json: list of all tokens of the vocabulary and their frequencies
- vocab_list.txt: list of all words in the vocabulary of the LSTM (all words that occur with a minimum frequency in the training data)
- words_too_rare.json: words that appear in the training data, but are not in the vocabulary because they are too rare (was used for qualitative analysis of the unknown token)
- zero_shot_refs_x.json: referring expressions which were processed with the zero-shot script (not present before applying that script)
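
How the vocabulary-related files relate to each other can be sketched with a small frequency cut-off example. This is an illustration under assumptions, not the repo's implementation: the function names, the whitespace tokenization, the threshold value, and the spelling of the unknown token are all hypothetical.

```python
from collections import Counter

UNKNOWN = "<unk>"  # hypothetical spelling of the "unknown" token


def build_vocab(expressions, min_freq):
    """Count tokens and split them into vocabulary and too-rare words."""
    freqs = Counter(tok for expr in expressions for tok in expr.split())
    vocab = sorted(w for w, c in freqs.items() if c >= min_freq)
    too_rare = sorted(w for w, c in freqs.items() if c < min_freq)
    return freqs, vocab, too_rare


def encode(expression, token2index):
    """Map tokens to indexes; out-of-vocabulary tokens become UNKNOWN."""
    unk = token2index[UNKNOWN]
    return [token2index.get(tok, unk) for tok in expression.split()]


# Toy training expressions, invented for illustration only.
expressions = ["the brown horse", "the horse on the left", "brown pony"]
freqs, vocab, too_rare = build_vocab(expressions, min_freq=2)
token2index = {tok: i for i, tok in enumerate([UNKNOWN] + vocab)}
encoded = encode("the brown pony", token2index)
```

In this toy run, freqs plays the role of token_freqs.json, vocab that of vocab_list.txt, and too_rare that of words_too_rare.json; "pony" falls below the threshold and is therefore encoded as the unknown token.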

Evaluation

src/eval contains code for the qualitative analysis and the computation of metrics (BLEU, CIDEr)
- analyse_...py: what does the model predict for an unseen category? (with words/categories left out during training, includes visualization of regions)
- bleu.py: interface to COCO evaluation code
- cats.txt: overview of category indexes
- evaluate.py: prepare data for COCO evaluation code
- generatecaptionsfromstoredmodel.py: generates referring expressions with a model stored in a file

Zero-shot module

src/helper contains code for the application of zero-shot learning on referring expressions.
- draw_region.py: visualizes the bounding box on a displayed image (given region ID and image ID)
- generate_baseline.py: generates a list of the 5 most frequent predictions for a single category, based on the baseline..json files in a model folder; the list can be used for a comparison with another model (like WAC)
- noun_list_long.txt: list of nouns used for noun parsing
- plot_embeddings.py: visualize neighbors of a word vector (also possible with two colors for two given spaces)
- word_embeddings.py: access to word embeddings, includes methods for generating a custom embedding space and for converting GloVe files to word2vec files
- zero_shot.py: apply zero-shot learning to the REs of a model, includes hit@k evaluation
- zero_shot_all.py: apply zero-shot learning to all categories at once
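
The hit@k measure mentioned for zero_shot.py can be sketched as follows. This is a generic formulation and an assumption about how the score is defined, not a copy of the script: a region counts as a hit if the expected word is among the k best-ranked candidates. The function name, the input dictionaries, and the toy data are hypothetical.

```python
def hit_at_k(ranked_candidates, gold_words, k):
    """Fraction of regions whose gold word is among the top-k candidates.

    ranked_candidates: dict mapping region ID -> list of words, best first
    gold_words: dict mapping region ID -> expected word
    """
    if not gold_words:
        return 0.0
    hits = sum(
        1 for region_id, gold in gold_words.items()
        if gold in ranked_candidates.get(region_id, [])[:k]
    )
    return hits / float(len(gold_words))


# Toy candidates, invented for illustration only.
ranked = {
    "r1": ["pony", "horse", "cow"],
    "r2": ["dog", "cat", "sheep"],
}
gold = {"r1": "horse", "r2": "sheep"}
scores = [hit_at_k(ranked, gold, k) for k in (1, 2, 3)]
```

For the toy data, the score rises from 0.0 at k = 1 to 0.5 at k = 2 and 1.0 at k = 3, because the expected words appear at ranks 2 and 3.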

Scripts

scripts for running code on the university servers
charts.py: visualize results

Hint

There is no extra parameter for applying zero-shot learning to all words (not only all categories, but all words). Instead, a few lines in the LSTM class were commented out (see the comments there).
