
RVL-BERT

This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" and contains the validation experiment code and trained models for the SpatialSense and VRD datasets.

*Figure: overview of the RVL-BERT architecture.*

Installation

This project is built with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, and is largely based on VL-BERT.

Please follow VL-BERT's original instructions to set up a conda environment.
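For reference, a minimal environment setup might look like the sketch below. The authoritative dependency list lives in the VL-BERT repository; the environment name here is illustrative.

```bash
# Create and activate a Python 3.6 conda environment (name is illustrative)
conda create -n rvl-bert python=3.6 -y
conda activate rvl-bert

# PyTorch 1.1.0 built against CUDA 9.0, as stated above
conda install pytorch=1.1.0 cudatoolkit=9.0 -c pytorch -y

# Remaining dependencies: follow VL-BERT's installation instructions
# (assumed to provide a requirements file or equivalent)
pip install -r requirements.txt
```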

Dataset

SpatialSense

  1. Download the SpatialSense dataset here.
  2. Put the files under $RVL_BERT_ROOT/data/spasen and unzip images.tar.gz as images/ there. Ensure there are two folders (flickr/ and nyu/) under $RVL_BERT_ROOT/data/spasen/images/; the expected layout is sketched below.
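A minimal sketch of the extraction step and the expected result (only the folders named above are guaranteed; other contents are whatever the download provides):

```bash
cd $RVL_BERT_ROOT/data/spasen
tar -xzf images.tar.gz        # unpacks to images/

tree -L 2 .
# .
# ├── images
# │   ├── flickr
# │   └── nyu
# └── ...                     # annotation files from the download
```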

VRD

  1. Download the VRD dataset: images (backup: download sg_dataset.zip from Baidu) and annotations.
  2. Put the sg_train_images/ and sg_test_images/ folders under $RVL_BERT_ROOT/data/vrd/images/.
  3. Put all .json files under $RVL_BERT_ROOT/data/vrd/. The expected layout is sketched below.
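For orientation, the resulting tree should look roughly like this. The .json file names shown are the standard VRD annotation files and are listed here as an assumption:

```bash
tree -L 2 $RVL_BERT_ROOT/data/vrd
# data/vrd
# ├── images
# │   ├── sg_train_images
# │   └── sg_test_images
# ├── annotations_train.json    # assumed standard VRD file names
# ├── annotations_test.json
# ├── objects.json
# └── predicates.json
```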

Checkpoints & Pretrained Weights

Common

Download the pretrained weights here and put the pretrained_model/ folder under $RVL_BERT_ROOT/model/.

SpatialSense

Download the trained checkpoint here and put the .model file under $RVL_BERT_ROOT/checkpoints/spasen/.

VRD

Download the trained checkpoints and put the .model files under $RVL_BERT_ROOT/checkpoints/vrd/. One checkpoint is needed per model variant; the file names appear in the validation commands below.
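Assuming the locations above, the weights and checkpoints should end up laid out roughly as follows (a sketch; checkpoint file names are taken from the validation commands below):

```bash
tree model checkpoints
# model
# └── pretrained_model/          # downloaded pretrained weights
# checkpoints
# ├── spasen
# │   └── full-model-e44.model
# └── vrd
#     ├── basic-e59.model
#     ├── basic-vl-e59.model
#     ├── basic-vl-s-e59.model
#     └── basic-vl-s-m-e59.model
```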

Validation

Run the following commands to reproduce the experiment results. A single GPU (an NVIDIA Quadro RTX 6000 with 24 GB of memory) is used by default.

SpatialSense

  • Full model
python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs

VRD

  • Basic model:
python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/
  • Basic model + Visual-Linguistic Commonsense Knowledge
python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/
  • Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module
python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s --split test --log-dir logs/
  • Full model
python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s_m --split test --log-dir logs/
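To run all four VRD variants in one go, a small shell loop like the following works, assuming the config, checkpoint, and result names used above:

```bash
# Run all four VRD variants sequentially (names taken from the commands above)
for v in basic basic_vl basic_vl_s basic_vl_s_m; do
  ckpt=$(echo "$v" | tr '_' '-')   # configs use '_', checkpoint files use '-'
  python vrd/test.py --cfg cfgs/vrd/${v}.yaml \
    --ckpt checkpoints/vrd/${ckpt}-e59.model \
    --bs 1 --gpus 0 --model-dir ./ --result-path results/ \
    --result-name vrd_${v} --split test --log-dir logs/
done
```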

Credit

This repository is mainly based on VL-BERT.

Citation

Please cite our paper if you find the paper or our code helpful for your research!

@ARTICLE{9387302,
  author={M. -J. {Chiou} and R. {Zimmermann} and J. {Feng}},
  journal={IEEE Access}, 
  title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations}, 
  year={2021},
  volume={9},
  number={},
  pages={50441-50451},
  doi={10.1109/ACCESS.2021.3069041}}
