SparseScanRefer: Visual Grounding in RGB-D Scans with SparseConv and Dual-Set Clustering


NOTE: There is currently a bug in the implementation: we are only able to overfit single scenes, and the loss spikes during training on the full dataset.

Introduction

We fuse a new detection module into ScanRefer by substituting its PointNet++ and VoteNet based architecture with the instance segmentation approach of PointGroup, which demonstrated new state-of-the-art results on ScanNet v2 and S3DIS (3D instance segmentation).
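PointGroup's dual-set idea is to cluster points twice: once on their original coordinates and once on coordinates shifted by predicted per-point offsets toward instance centroids, then keep proposals from both sets. A minimal, schematic sketch of that grouping step (our own simplification, not the code in this repository; the radius, thresholds, and function names are assumptions):

```python
import numpy as np

def cluster(coords, labels, radius=0.03, min_pts=2):
    """Greedy BFS clustering: group points of the same semantic label
    that lie within `radius` of each other (O(n^2), illustration only)."""
    n = len(coords)
    visited = np.zeros(n, dtype=bool)
    clusters = []
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, members = [seed], []
        while queue:
            i = queue.pop()
            members.append(i)
            # neighbours: same semantic label, within the ball radius, unvisited
            dist = np.linalg.norm(coords - coords[i], axis=1)
            for j in np.where((dist < radius) & (labels == labels[i]) & ~visited)[0]:
                visited[j] = True
                queue.append(int(j))
        if len(members) >= min_pts:
            clusters.append(sorted(members))
    return clusters

def dual_set_clusters(xyz, offsets, labels):
    """Propose instances from both the original points and the
    offset-shifted points (the 'dual sets' of PointGroup)."""
    return cluster(xyz, labels) + cluster(xyz + offsets, labels)
```

In the actual PointGroup pipeline the semantic labels and offsets come from a SparseConv U-Net, and the duplicate proposals produced by the two sets are scored by ScoreNet and filtered with NMS.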

Setup

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Install the necessary packages for ScanRefer listed in requirements.txt:

pip install -r requirements.txt

Afterwards, follow the PointGroup setup instructions.

Before moving on to the next step, please don't forget to set CONF.PATH.BASE in lib/config.py to the project root path.
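For illustration, a hedged sketch of what the relevant fragment of lib/config.py might look like; only CONF.PATH.BASE is confirmed by this README, so the example path and the other attributes are assumptions (check the actual file):

```python
import os
from types import SimpleNamespace

# Illustrative fragment only -- the real lib/config.py defines CONF with
# its own structure; only CONF.PATH.BASE is confirmed by the README.
CONF = SimpleNamespace(PATH=SimpleNamespace())
CONF.PATH.BASE = "/home/user/SparseScanRefer"  # hypothetical: point this at your clone
CONF.PATH.DATA = os.path.join(CONF.PATH.BASE, "data")        # assumed layout
CONF.PATH.SCANNET = os.path.join(CONF.PATH.DATA, "scannet")  # assumed layout
```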

Data preparation

  1. Download the ScanRefer dataset and unzip it under data/.
  2. Download the preprocessed GLoVE embeddings (~990MB) and put them under data/.
  3. Download the ScanNetV2 dataset and put (or link) scans/ under data/scannet/scans/ (please follow the ScanNet instructions for downloading the dataset).

After this step, there should be folders containing the ScanNet scene data under data/scannet/scans/ with names like scene0000_00.
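A quick way to verify the layout is to list the scene folders; the helper below is our own sketch, not part of the repository:

```python
import os
import re

def list_scenes(scans_dir):
    """Return sorted scene ids (e.g. 'scene0000_00') found under scans_dir."""
    pattern = re.compile(r"scene\d{4}_\d{2}$")
    return sorted(name for name in os.listdir(scans_dir) if pattern.match(name))

# Example: list_scenes("data/scannet/scans") should include "scene0000_00"
```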

  4. Pre-process the ScanNet data. A folder named scannet_data/ will be generated under data/scannet/ after running the following commands. Roughly 3.8GB of free space is needed for this step:
cd data/scannet/
python batch_load_scannet_data.py

After this step, you can check if the processed scene data is valid by running:

python visualize.py --scene_id scene0000_00

Usage

Training

To train the SparseScanRefer model with RGB values:

python scripts/script1.py

For more training options (e.g. batch_size, fix_pg), please run python scripts/train.py -h.

For additional detail, please see the ScanRefer and PointGroup papers:

  • "ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language"
  • "PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation"

Copyright (c) 2020 Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner
