SparseScanRefer: Visual Grounding in RGB-D Scans with SparseConv and Dual-Set Clustering


NOTE: There is currently a bug in the implementation: we are only able to overfit single scenes, and the loss spikes during training on the full dataset.

Introduction

We fuse a new detection module into ScanRefer by substituting its PointNet++ and VoteNet based architecture with the instance segmentation approach of PointGroup, which demonstrated new state-of-the-art results on ScanNet v2 and S3DIS (3D instance segmentation).
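PointGroup's dual-set idea is to cluster points twice: once on their original coordinates and once on coordinates shifted by predicted per-point offsets toward instance centroids, then keep proposals from both sets. A minimal, schematic sketch of that grouping step (our own simplification, not the code in this repository; the radius, thresholds, and function names are assumptions):

```python
import numpy as np

def cluster(coords, labels, radius=0.03, min_pts=2):
    """Greedy BFS clustering: group points of the same semantic label
    that lie within `radius` of each other (O(n^2), illustration only)."""
    n = len(coords)
    visited = np.zeros(n, dtype=bool)
    clusters = []
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, members = [seed], []
        while queue:
            i = queue.pop()
            members.append(i)
            # neighbours: same semantic label, within the ball radius, unvisited
            dist = np.linalg.norm(coords - coords[i], axis=1)
            for j in np.where((dist < radius) & (labels == labels[i]) & ~visited)[0]:
                visited[j] = True
                queue.append(int(j))
        if len(members) >= min_pts:
            clusters.append(sorted(members))
    return clusters

def dual_set_clusters(xyz, offsets, labels):
    """Propose instances from both the original points and the
    offset-shifted points (the 'dual sets' of PointGroup)."""
    return cluster(xyz, labels) + cluster(xyz + offsets, labels)
```

In the actual PointGroup pipeline the semantic labels and offsets come from a SparseConv U-Net, and the duplicate proposals produced by the two sets are scored by ScoreNet and filtered with NMS.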

Setup

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Install the necessary packages for ScanRefer listed in requirements.txt:

pip install -r requirements.txt

Afterwards, follow the PointGroup setup instructions.

Before moving on to the next step, please don't forget to set CONF.PATH.BASE in lib/config.py to the project root path.
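For illustration, a hedged sketch of what the relevant fragment of lib/config.py might look like; only CONF.PATH.BASE is confirmed by this README, so the example path and the other attributes are assumptions (check the actual file):

```python
import os
from types import SimpleNamespace

# Illustrative fragment only -- the real lib/config.py defines CONF with
# its own structure; only CONF.PATH.BASE is confirmed by the README.
CONF = SimpleNamespace(PATH=SimpleNamespace())
CONF.PATH.BASE = "/home/user/SparseScanRefer"  # hypothetical: point this at your clone
CONF.PATH.DATA = os.path.join(CONF.PATH.BASE, "data")        # assumed layout
CONF.PATH.SCANNET = os.path.join(CONF.PATH.DATA, "scannet")  # assumed layout
```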

Data preparation

  1. Download the ScanRefer dataset and unzip it under data/.
  2. Download the preprocessed GLoVE embeddings (~990MB) and put them under data/.
  3. Download the ScanNetV2 dataset and put (or link) scans/ under data/scannet/scans/ (please follow the ScanNet instructions for downloading the dataset).

After this step, there should be folders containing the ScanNet scene data under data/scannet/scans/ with names like scene0000_00.
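A quick way to verify the layout is to list the scene folders; the helper below is our own sketch, not part of the repository:

```python
import os
import re

def list_scenes(scans_dir):
    """Return sorted scene ids (e.g. 'scene0000_00') found under scans_dir."""
    pattern = re.compile(r"scene\d{4}_\d{2}$")
    return sorted(name for name in os.listdir(scans_dir) if pattern.match(name))

# Example: list_scenes("data/scannet/scans") should include "scene0000_00"
```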

  4. Pre-process the ScanNet data. A folder named scannet_data/ will be generated under data/scannet/ after running the following commands. Roughly 3.8GB of free space is needed for this step:
cd data/scannet/
python batch_load_scannet_data.py

After this step, you can check if the processed scene data is valid by running:

python visualize.py --scene_id scene0000_00

Usage

Training

To train the SparseScanRefer model with RGB values:

python scripts/script1.py

For more training options (e.g. batch_size, fix_pg), please run python scripts/train.py -h.

For additional detail, please see the ScanRefer and PointGroup papers:

  • "ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language"
  • "PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation"

Copyright (c) 2020 Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner
