
RVL-BERT

This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" and contains the validation experiment code and trained models for the SpatialSense and VRD datasets.

*Figure: overview of the RVL-BERT architecture.*

Installation

This project is built with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, and is largely based on VL-BERT.

Please follow VL-BERT's original instructions to set up a conda environment.
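For reference, a minimal environment setup might look like the sketch below. The authoritative dependency list lives in the VL-BERT repository; the environment name here is illustrative.

```bash
# Create and activate a Python 3.6 conda environment (name is illustrative)
conda create -n rvl-bert python=3.6 -y
conda activate rvl-bert

# PyTorch 1.1.0 built against CUDA 9.0, as stated above
conda install pytorch=1.1.0 cudatoolkit=9.0 -c pytorch -y

# Remaining dependencies: follow VL-BERT's installation instructions
# (assumed to provide a requirements file or equivalent)
pip install -r requirements.txt
```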

Dataset

SpatialSense

  1. Download the SpatialSense dataset here.
  2. Put the files under $RVL_BERT_ROOT/data/spasen and unzip images.tar.gz as images/ there. Ensure there are two folders (flickr/ and nyu/) under $RVL_BERT_ROOT/data/spasen/images/; the expected layout is sketched below.
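A minimal sketch of the extraction step and the expected result (only the folders named above are guaranteed; other contents are whatever the download provides):

```bash
cd $RVL_BERT_ROOT/data/spasen
tar -xzf images.tar.gz        # unpacks to images/

tree -L 2 .
# .
# ├── images
# │   ├── flickr
# │   └── nyu
# └── ...                     # annotation files from the download
```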

VRD

  1. Download the VRD dataset: images (backup: download sg_dataset.zip from Baidu) and annotations.
  2. Put the sg_train_images/ and sg_test_images/ folders under $RVL_BERT_ROOT/data/vrd/images/.
  3. Put all .json files under $RVL_BERT_ROOT/data/vrd/. The expected layout is sketched below.
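For orientation, the resulting tree should look roughly like this. The .json file names shown are the standard VRD annotation files and are listed here as an assumption:

```bash
tree -L 2 $RVL_BERT_ROOT/data/vrd
# data/vrd
# ├── images
# │   ├── sg_train_images
# │   └── sg_test_images
# ├── annotations_train.json    # assumed standard VRD file names
# ├── annotations_test.json
# ├── objects.json
# └── predicates.json
```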

Checkpoints & Pretrained Weights

Common

Download the pretrained weights here and put the pretrained_model/ folder under $RVL_BERT_ROOT/model/.

SpatialSense

Download the trained checkpoint here and put the .model file under $RVL_BERT_ROOT/checkpoints/spasen/.

VRD

Download the trained checkpoints and put the .model files under $RVL_BERT_ROOT/checkpoints/vrd/. One checkpoint is needed per model variant; the file names appear in the validation commands below.
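Assuming the locations above, the weights and checkpoints should end up laid out roughly as follows (a sketch; checkpoint file names are taken from the validation commands below):

```bash
tree model checkpoints
# model
# └── pretrained_model/          # downloaded pretrained weights
# checkpoints
# ├── spasen
# │   └── full-model-e44.model
# └── vrd
#     ├── basic-e59.model
#     ├── basic-vl-e59.model
#     ├── basic-vl-s-e59.model
#     └── basic-vl-s-m-e59.model
```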

Validation

Run the following commands to reproduce the experiment results. A single GPU (an NVIDIA Quadro RTX 6000 with 24 GB of memory) is used by default.

SpatialSense

  • Full model
python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs

VRD

  • Basic model:
python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/
  • Basic model + Visual-Linguistic Commonsense Knowledge
python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/
  • Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module
python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s --split test --log-dir logs/
  • Full model
python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s_m --split test --log-dir logs/
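To run all four VRD variants in one go, a small shell loop like the following works, assuming the config, checkpoint, and result names used above:

```bash
# Run all four VRD variants sequentially (names taken from the commands above)
for v in basic basic_vl basic_vl_s basic_vl_s_m; do
  ckpt=$(echo "$v" | tr '_' '-')   # configs use '_', checkpoint files use '-'
  python vrd/test.py --cfg cfgs/vrd/${v}.yaml \
    --ckpt checkpoints/vrd/${ckpt}-e59.model \
    --bs 1 --gpus 0 --model-dir ./ --result-path results/ \
    --result-name vrd_${v} --split test --log-dir logs/
done
```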

Credit

This repository is mainly based on VL-BERT.

Citation

Please cite our paper if you find the paper or our code helpful for your research!

@ARTICLE{9387302,
  author={M. -J. {Chiou} and R. {Zimmermann} and J. {Feng}},
  journal={IEEE Access}, 
  title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations}, 
  year={2021},
  volume={9},
  number={},
  pages={50441-50451},
  doi={10.1109/ACCESS.2021.3069041}}
