Usage

This repo is derived from https://github.com/li-xirong/w2vvpp and https://github.com/danieljf24/dual_encoding

Requirements

Ubuntu 16.04
cuda 10
python 2.7.12
conda
PyTorch 1.2.0
tensorboard 1.14.0
numpy 1.16.4
keras
tensorflow

Set up environment

Point HOME at the folder you want to work in (the later commands use ~, which expands to $HOME), then create the VisualSearch folder that stores the dataset

export HOME=/path/to/desired/folder
cd $HOME
mkdir VisualSearch
git clone https://github.com/0902338471/W2VV.git
conda create -n W2VV python=2.7
conda activate W2VV
pip install -r ~/W2VV/w2vvpp/requirements.txt

Extract features with ResNet152

1. Run the following commands, replacing ${your_data_name} with the name of your dataset

mkdir ~/${your_data_name}
mkdir ~/VisualSearch/${your_data_name}/
mkdir ~/VisualSearch/${your_data_name}/FeatureData/
mkdir ~/VisualSearch/${your_data_name}/TextData/

2. Download your dataset into the folder ~/W2VV/DATASET/{train/val/test}/${your_data_name}, storing images and captions in separate subfolders
3. Copy the image caption file, in which each line has the format [image-name] [text_caption], to ~/VisualSearch/${your_data_name}/TextData/${your_data_name}.caption.txt
4. Run the following command, replacing ${image_folder} with your image dataset folder and ${output_features_name} with the name of the txt file that will store the extracted features

python resnet_152_extract.py --data_path ${image_folder} --feature_path ${output_features_name}.txt

Convert features txt file to bin file format

Run the following command, replacing ${output_features_name} and ${data_name} with your own values

python txt2bin.py 1000 ~/W2VV/${output_features_name}.txt 0 ~/VisualSearch/${data_name}/FeatureData/mean_resnext101_resnet152
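
For orientation, the conversion can be pictured with the sketch below. It is not the repo's txt2bin.py and makes several assumptions: each line of the text file holds an image id followed by whitespace-separated feature values, feature.bin stores the values as flat float32 numbers, shape.txt holds "count dims", and id.txt holds the ids separated by spaces. The function names txt_to_bin and read_feature are hypothetical.

```python
import os
import tempfile
from array import array


def txt_to_bin(txt_path, out_dir, ndims):
    """Write feature.bin, id.txt and shape.txt from 'id v1 ... vn' lines (assumed format)."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    ids = []
    with open(txt_path) as fin, open(os.path.join(out_dir, "feature.bin"), "wb") as fbin:
        for line in fin:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            values = [float(x) for x in parts[1:]]
            if len(values) != ndims:
                raise ValueError("%s: expected %d values, got %d"
                                 % (parts[0], ndims, len(values)))
            array("f", values).tofile(fbin)  # native-endian float32 (assumed layout)
            ids.append(parts[0])
    with open(os.path.join(out_dir, "id.txt"), "w") as fid:
        fid.write(" ".join(ids))
    with open(os.path.join(out_dir, "shape.txt"), "w") as fshape:
        fshape.write("%d %d" % (len(ids), ndims))
    return len(ids)


def read_feature(out_dir, image_id):
    """Read back the vector of one image id from the files written above."""
    with open(os.path.join(out_dir, "shape.txt")) as f:
        count, ndims = [int(x) for x in f.read().split()]
    with open(os.path.join(out_dir, "id.txt")) as f:
        ids = f.read().split()
    if len(ids) != count:
        raise ValueError("id.txt and shape.txt disagree on the number of items")
    index = ids.index(image_id)
    vec = array("f")
    with open(os.path.join(out_dir, "feature.bin"), "rb") as f:
        f.seek(index * ndims * vec.itemsize)  # jump to the start of this vector
        vec.fromfile(f, ndims)
    return list(vec)


if __name__ == "__main__":
    # Round trip on a tiny, made-up feature file.
    work = tempfile.mkdtemp()
    txt = os.path.join(work, "features.txt")
    with open(txt, "w") as f:
        f.write("img1 0.5 1.0\nimg2 -2.25 4.0\n")
    txt_to_bin(txt, os.path.join(work, "bin"), ndims=2)
    print(read_feature(os.path.join(work, "bin"), "img2"))
```

A round trip like this is also a quick way to check that the number of ids, the dimensions in shape.txt and the size of feature.bin agree before training.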

After the previous steps, your dataset folder will have the following structure

${your_data_name}
├── FeatureData
│   └── mean_resnext101_resnet152
│       ├── feature.bin
│       ├── shape.txt
│       └── id.txt
└── TextData
    └── ${your_data_name}.caption.txt

FeatureData: the extracted image features.
feature.bin: the extracted features in binary format.
${your_data_name}.caption.txt: the caption data. The file is structured as follows; the image and the sentence on the same line are relevant to each other.

image_id_1#1 sentence_1
image_id_1#2 sentence_2
...
image_id_n#1 sentence_k
...
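
To make the caption format concrete, here is a small sketch that splits caption lines into image id, caption number and sentence, and groups the sentences per image. It assumes that the first whitespace separates the caption id from the sentence and that '#' separates the image id from the caption number, as in the sample above. parse_caption_line and group_captions are hypothetical helpers, not part of the repo.

```python
from collections import defaultdict


def parse_caption_line(line):
    """Split 'image_id#k sentence' into (image_id, k, sentence)."""
    caption_id, sentence = line.strip().split(None, 1)
    # Use the last '#' so that image ids containing '#' still parse.
    image_id, _, number = caption_id.rpartition("#")
    if not image_id or not number.isdigit():
        raise ValueError("caption id must look like image_id#k: %r" % caption_id)
    return image_id, int(number), sentence


def group_captions(lines):
    """Map each image id to the list of its sentences, in file order."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            image_id, _, sentence = parse_caption_line(line)
            grouped[image_id].append(sentence)
    return dict(grouped)
```

Running group_captions over the lines of the caption file before training is a cheap way to catch lines that lack a sentence or a '#k' suffix.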

Training

Building vocabulary for caption file

Run the following commands

cd ~/W2VV/w2vvpp
./do_build_vocab.sh ${data_name}
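
What do_build_vocab.sh does internally is not described here. As a rough picture of vocabulary building in general, the sketch below counts the words in the sentences of a caption file and keeps those that occur at least min_freq times. The tokenization (lowercasing, splitting on non-alphanumeric characters), the threshold value and the function name build_vocab are assumptions made for illustration only.

```python
import re
from collections import Counter


def build_vocab(caption_lines, min_freq=5):
    """Return the sorted words that occur at least min_freq times in the sentences."""
    counter = Counter()
    for line in caption_lines:
        parts = line.strip().split(None, 1)  # caption id, then the sentence
        if len(parts) < 2:
            continue  # no sentence on this line
        counter.update(re.findall(r"[a-z0-9]+", parts[1].lower()))
    return sorted(word for word, count in counter.items() if count >= min_freq)
```

A frequency threshold of this kind keeps rare words and typos out of the vocabulary; which threshold the repo's script uses, if any, should be checked in the script itself.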
