armandvilalta/Full-network-multimodal-embeddings

Full-network-multimodal-embeddings

Code used in the paper: Studying the Impact of the Full-Network Embedding on Multimodal Pipelines (currently under review)

Like visual-semantic-embedding and order-embedding, from which this repository is forked, this code maps images and their captions into a common vector space.

This version adds the max loss option from F. Faghri, D.J. Fleet, J.R. Kiros and S. Fidler, VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, arXiv preprint arXiv:1707.05612 (2017).

The precomputed image embeddings for the Full-Network Embedding and the FC7 embedding can be downloaded from the High Performance Artificial Intelligence Group at the Barcelona Supercomputing Center. The trained models behind the results in the paper are available from the same website.

Dependencies

Replicating the paper

Modifying the parameters

A detailed description of all the parameters can be found in parameters.py.

Experiment info parameters

  • --experiment_name Name to identify the experiment.
  • --dataset_name Dataset: one of: "f8k", "f30k", "coco".
  • --model_name Name for the model saved file. The experiments in the paper use --model_name = --dataset_name + '_' + --experiment_name.
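
The naming convention used in the paper's experiments can be sketched as a small helper. The function name `default_model_name` and the experiment name `baseline` are illustrative assumptions, not part of the repository.

```python
def default_model_name(dataset_name: str, experiment_name: str) -> str:
    """Build a model file name as --dataset_name + '_' + --experiment_name."""
    return dataset_name + "_" + experiment_name


# "baseline" is a hypothetical experiment name used only for illustration.
print(default_model_name("f30k", "baseline"))
```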

Data parameters

  • --data Dataset: one of: "f8k", "f30k", "coco".
  • --data_path Path to data
  • --embedding Embedding: one of: "AVGtt_Gfc7", "AVGtt_FN_KSBsp0.15n0.25_Gall".
  • --dim_image Dimensionality of image embedding.
    • If --embedding = 'AVGtt_Gfc7' then --dim_image = 4096.
    • If --embedding = 'AVGtt_FN_KSBsp0.15n0.25_Gall' then --dim_image = 12416.
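
Because --dim_image has to match --embedding, a lookup table is one way to avoid setting the two inconsistently. This is a minimal sketch: the names `IMAGE_DIMS` and `dim_image_for` are hypothetical, and only the two embedding/dimension pairs come from the list above.

```python
# Image embedding names and their dimensionalities, as listed above.
IMAGE_DIMS = {
    "AVGtt_Gfc7": 4096,
    "AVGtt_FN_KSBsp0.15n0.25_Gall": 12416,
}


def dim_image_for(embedding: str) -> int:
    """Return the --dim_image value that matches a given --embedding."""
    try:
        return IMAGE_DIMS[embedding]
    except KeyError:
        known = ", ".join(sorted(IMAGE_DIMS))
        raise ValueError(
            f"unknown embedding {embedding!r}; expected one of: {known}"
        ) from None


print(dim_image_for("AVGtt_Gfc7"))
```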

Embedding parameters

  • --dim Dimensionality of resulting multimodal embedding.
  • --dim_word Dimensionality of trainable word embedding.
  • --loss Loss function to use: one of: "SH", "MH", "OE", "MOE".
  • --abs Take absolute value of the embeddings. Useful for order embedding.
  • --img_norm L2-normalize the image embedding. Useful for MH embeddings.
  • --method Method to use for the loss. Possible choices are: "order", "cosine".
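
To illustrate the difference between the "SH" and "MH" options, the sketch below assumes they stand for the sum-of-hinges and max-of-hinges contrastive losses described in the VSE++ paper, scored with cosine similarity. It is written in plain Python for readability and is not the repository's implementation; the function names and toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hinge_loss(images, captions, margin=0.2, use_max=False):
    """Contrastive loss over a batch where images[i] matches captions[i].

    use_max=False sums the hinge cost over all negatives (assumed "SH").
    use_max=True keeps only the hardest negative (assumed "MH").
    """
    n = len(images)
    scores = [[cosine(images[i], captions[j]) for j in range(n)]
              for i in range(n)]
    reduce = max if use_max else sum
    total = 0.0
    for i in range(n):
        positive = scores[i][i]
        # Wrong captions ranked against image i.
        caption_costs = [max(0.0, margin - positive + scores[i][j])
                         for j in range(n) if j != i]
        # Wrong images ranked against caption i.
        image_costs = [max(0.0, margin - positive + scores[j][i])
                       for j in range(n) if j != i]
        if caption_costs:  # a batch of one has no negatives
            total += reduce(caption_costs) + reduce(image_costs)
    return total


# Toy 2-D vectors, for illustration only.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
captions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(hinge_loss(images, captions, margin=0.5))                # sum of hinges
print(hinge_loss(images, captions, margin=0.5, use_max=True))  # max of hinges
```

Since every hinge term is non-negative, the max-of-hinges value never exceeds the sum-of-hinges value on the same batch; it concentrates the penalty on the hardest negative.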

Training parameters

  • --margin Margin for contrastive loss in [0,1].
  • --max_epochs Max number of training epochs.
  • --dispFreq Number of samples processed before printing stats.
  • --grad_clip Maximum norm of the backpropagated gradients in the GRU.
  • --batch_size Batch size.
  • --validFreq Compute validation every --validFreq batches.
  • --lrate Learning rate.
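
As an illustration of what --grad_clip controls, the sketch below assumes it bounds the global L2 norm of the gradients, rescaling them when the norm exceeds the threshold. The function name and the toy gradient are hypothetical, and the repository's actual clipping code may differ.

```python
import math


def clip_by_global_norm(grads, grad_clip):
    """Rescale a list of gradient vectors so their joint L2 norm is at most grad_clip.

    A non-positive grad_clip is treated as "no clipping" (an assumption).
    """
    norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if grad_clip <= 0 or norm <= grad_clip:
        return [list(vec) for vec in grads]
    scale = grad_clip / norm
    return [[g * scale for g in vec] for vec in grads]


# A toy gradient with L2 norm 5, clipped to norm 1.
print(clip_by_global_norm([[3.0, 4.0]], 1.0))
```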

Saving / loading parameters

  • --reload_ Reload an existing model for further training.
  • --load_from Path to the file containing the model to load.
  • --save_dir Folder where the trained model is saved.

Test parameters

  • --test_subset Which of the two test and validation subsets to use for coco. Possible choices are: "1k", "5k".

Reference

If you found this code useful, please cite the following paper: Vilalta, Armand, et al. "Studying the impact of the Full-Network embedding on multimodal pipelines." Semantic Web Preprint: 1-15.

@article{vilaltastudying,
  title={Studying the impact of the Full-Network embedding on multimodal pipelines},
  author={Vilalta, Armand and Garcia-Gasulla, Dario and Par{\'e}s, Ferran and Ayguad{\'e}, Eduard and Labarta, Jesus and Moya-S{\'a}nchez, E Ulises and Cort{\'e}s, Ulises},
  journal={Semantic Web},
  number={Preprint},
  pages={1--15},
  publisher={IOS Press}
}

License

Apache License 2.0
