Zero-shot learning for Referring Expression Generation
Master's thesis by Lilian Schröder
-
Required data, not included in this repo:
- referring expressions: refcoco_refdf.json.gz (see names_in_context/data)
- bounding boxes: mscoco_bbdf.json.gz (see names_in_context/data)
- visual features: mscoco_vgg19.npz
-
refcoco_splits.json: training-test splits
-
prettyraw.json: human-readable version of the referring expressions data (refcoco_refdf.json), needed by the draw_region script to look up the image ID of a region
- run preprocessing: prepare_refcoco.py
- run model: src/experiment_refcoco.py (adapted from https://arxiv.org/abs/1708.02043)
Running the code requires Python 2.7, TensorFlow, and further Python libraries (pandas, numpy).
- src/lstm contains all code for data preparation and training of the REG model
- run "main.py" with desired configuration
- adjust the paths at the beginning of the script
- the Data class accepts parameters for configuring the train/test split (words or categories to be excluded from the training)
- the LSTM class needs information like vocabulary size, output directory, IDs of the test set (for RE generation) and a dictionary that maps words to indexes
- after the training, the result directory contains the trained model as well as additional information (index to token list, vocabulary list, a list of the IDs in the test set etc.)
- content of a result directory:
- inject_refcoco_refrnn_compositional_3_512_1: actual TensorFlow model plus a JSON with generated
sequences (generated_captions.json)
- Note: if this folder is moved, the paths in the "checkpoint" file need to be adjusted (they can simply be edited there directly)
- highest_prob_candidates.json: stores alternative predictions for region IDs where the "unknown" token was predicted
- all_highest_probs_x.json: for every region, for every position in the sequence, the top x predictions of the LSTM are stored for zero-shot learning (x = number of candidates stored)
- additional_vocab.txt: words that are needed in the embedding space but are not in the LSTM vocabulary, e.g. a word that was left out during training is put into this list (used for generating the custom space)
- baseline_frequencies_topx.json: all words that the LSTM predicted for a category (parsing for nouns yields a position in the sequence, only the predictions at that time step are relevant); x is the number of words considered for the frequencies (1: only top-1 predictions are counted etc.)
- index2token.json: mapping between the indexes used by the LSTM and words (needed if sequences are generated by a stored model)
- reduced_vocab_glove...txt: custom embedding space, name indicates the configuration (only names = the space only contains nouns)
- refs_moved_to_test.json: list of all region IDs that originally belong to the training set, but were moved to the test set (because the model is supposed to know only a subset of the categories)
- token_freqs.json: list of all tokens of the vocabulary and their frequencies
- vocab_list.txt: list of all words in the vocabulary of the LSTM (all words that occur with a minimum frequency in the training data)
- words_too_rare.json: words that appear in the training data, but are not in the vocabulary because they are too rare (was used for qualitative analysis of the unknown token)
- zero_shot_refs_x.json: referring expressions which were processed with the zero-shot script (not present before applying that script)
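How the split and the vocabulary files in a result directory relate can be sketched as follows. This is only a minimal illustration, not the code of the Data class: the function names, the data layout (a dict from region ID to token list) and the cutoff value are assumptions made for this example, and the sketch only covers exclusion by word, not by category.

```python
from collections import Counter


def split_refs(refs, train_ids, excluded_words):
    """Move training expressions that mention an excluded word to the test set.

    refs: dict mapping region ID -> list of tokens (hypothetical layout).
    Returns (remaining training IDs, IDs moved to the test set).
    """
    excluded = set(excluded_words)
    kept, moved = [], []
    for ref_id in train_ids:
        if excluded.intersection(refs[ref_id]):
            moved.append(ref_id)  # would end up in refs_moved_to_test.json
        else:
            kept.append(ref_id)
    return kept, moved


def build_vocab(refs, train_ids, min_freq):
    """Count tokens in the training data and apply a minimum-frequency cutoff."""
    freqs = Counter(tok for ref_id in train_ids for tok in refs[ref_id])
    vocab = sorted(w for w, c in freqs.items() if c >= min_freq)
    too_rare = sorted(w for w, c in freqs.items() if c < min_freq)
    return freqs, vocab, too_rare


# Toy data, invented for this example.
refs = {
    1: ["the", "bus", "on", "the", "left"],
    2: ["the", "left", "dog"],
    3: ["the", "dog", "on", "the", "left"],
    4: ["red", "bus"],
}
excluded_words = ["bus"]

kept, moved = split_refs(refs, [1, 2, 3, 4], excluded_words)
freqs, vocab, too_rare = build_vocab(refs, kept, min_freq=2)

# Excluded words are missing from the LSTM vocabulary but are still needed
# in the embedding space (compare additional_vocab.txt).
additional_vocab = sorted(w for w in excluded_words if w not in vocab)
```

In this toy run, the two expressions containing "bus" are moved to the test set, "on" falls below the frequency cutoff, and "bus" is the only word that has to be added to the embedding space separately.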
- src/eval contains code for the qualitative analysis and the computation of metrics (BLEU, CIDEr)
- analyse_...py: what does the model predict for an unseen category? (with words/categories left out during training, includes visualization of regions)
- bleu.py: interface to COCO evaluation code
- cats.txt: overview of category indexes
- evaluate.py: prepare data for COCO evaluation code
- generatecaptionsfromstoredmodel.py: generates referring expressions with a model stored in a file
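Preparing generated expressions for the COCO evaluation code could look roughly like the sketch below. The COCO caption evaluation expects results as a list of entries with an "image_id" and a "caption"; the input layout (a dict from region ID to token list) and the use of the region ID as "image_id" are assumptions for this example and may differ from what evaluate.py actually does.

```python
def to_coco_results(generated):
    """Convert {region_id: [tokens]} into a COCO-style results list.

    The input layout is an assumption; the region ID is used in place of
    the image ID so that every region is scored as its own item.
    """
    results = []
    for region_id in sorted(generated, key=int):
        results.append({
            "image_id": int(region_id),
            "caption": " ".join(generated[region_id]),
        })
    return results


# Toy input, invented for this example.
generated = {"12": ["the", "dog"], "3": ["red", "bus"]}
results = to_coco_results(generated)
```

The resulting list can be written to disk with json.dump and passed on to the evaluation code.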
- src/helper contains code for the application of zero-shot learning on referring expressions.
- draw_region.py: visualizes the bounding box on a displayed image (given region ID and image ID)
- generate_baseline.py: generates a list of the 5 most frequent predictions for a single category, based on the baseline..json files in a model folder; the list can be used for a comparison with another model (like WAC)
- noun_list_long.txt: list of nouns used for noun parsing
- plot_embeddings.py: visualize neighbors of a word vector (also possible with two colors for two given spaces)
- word_embeddings.py: access to word embeddings, includes methods for generating a custom embedding space and for converting GloVe files to word2vec files
- zero_shot.py: apply zero-shot learning to the REs of a model, includes hit@k evaluation
- zero_shot_all.py: apply zero-shot learning to all categories at once
- scripts for running code on the university servers
- charts.py: visualize results
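The GloVe to word2vec conversion mentioned for word_embeddings.py comes down to a small format difference: a GloVe text file has one word per line followed by its vector values, while the word2vec text format additionally starts with a header line containing the number of words and the vector dimension. The following is a minimal sketch of that idea, not the code from word_embeddings.py; the function name and the file names are made up for this example.

```python
import io
import os
import tempfile


def glove_to_word2vec(glove_path, w2v_path):
    """Copy a GloVe text file and prepend the "<count> <dim>" header line."""
    with io.open(glove_path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    # The first field of each line is the word, the rest is the vector.
    dim = len(lines[0].split(" ")) - 1
    with io.open(w2v_path, "w", encoding="utf-8") as f:
        f.write(u"{} {}\n".format(len(lines), dim))
        for line in lines:
            f.write(line + u"\n")
    return len(lines), dim


# Toy embedding file with two 3-dimensional vectors, invented for this example.
tmp_dir = tempfile.mkdtemp()
glove_file = os.path.join(tmp_dir, "glove_toy.txt")
w2v_file = os.path.join(tmp_dir, "w2v_toy.txt")
with io.open(glove_file, "w", encoding="utf-8") as f:
    f.write(u"bus 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")

n_words, dim = glove_to_word2vec(glove_file, w2v_file)
with io.open(w2v_file, encoding="utf-8") as f:
    header = f.readline().strip()
```

Reading the whole file into memory keeps the sketch short; for the full GloVe files a two-pass approach (count first, then copy) would use less memory.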
- There is no extra parameter for applying zero-shot learning to all words (i.e. not only to all categories, but to all words). Instead, a few lines in the LSTM class were commented out (see the comments there)
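For the hit@k evaluation included in zero_shot.py, the underlying measure can be sketched as follows: a region counts as a hit if the correct word is among the k highest-ranked candidates. The function name and the data layout are assumptions for this example and are not taken from the script.

```python
def hit_at_k(ranked_candidates, gold, k):
    """Fraction of regions whose gold word is among the top-k candidates.

    ranked_candidates: dict region ID -> list of words, best first.
    gold: dict region ID -> correct word.
    Both layouts are assumptions made for this sketch.
    """
    if not gold:
        return 0.0
    hits = 0
    for region_id, target in gold.items():
        if target in ranked_candidates.get(region_id, [])[:k]:
            hits += 1
    return hits / float(len(gold))


# Toy rankings, invented for this example.
ranked = {1: ["car", "bus", "truck"], 2: ["cat", "dog"], 3: ["horse"]}
gold = {1: "bus", 2: "dog", 3: "zebra"}

hit_1 = hit_at_k(ranked, gold, 1)
hit_2 = hit_at_k(ranked, gold, 2)
```

With these toy rankings, no region is a hit at k = 1, and two of the three regions are hits at k = 2.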