NOTE: There is currently a bug in the implementation: we can only overfit single scenes, and the loss spikes during training on the full dataset.
We fuse a new detection module into ScanRefer by replacing the current PointNet++- and VoteNet-based architecture with the instance segmentation approach of PointGroup, which demonstrated new state-of-the-art results on ScanNet v2 and S3DIS (3D instance segmentation).
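At a high level, the substitution keeps ScanRefer's language and matching stages and swaps only the detection backbone. A minimal sketch of that composition is below; all class and method names here are illustrative assumptions, not the repo's actual API:

```python
# Illustrative sketch only: ScanRefer's pipeline with the VoteNet-based
# detector swapped for a PointGroup-style instance segmentation backbone.
class SparseScanRefer:
    def __init__(self, detector, language_module, matching_module):
        self.detector = detector          # PointGroup-style instance segmentation (assumption)
        self.language = language_module   # encodes the referring utterance
        self.matching = matching_module   # fuses proposals with the utterance

    def forward(self, point_cloud, utterance):
        proposals = self.detector(point_cloud)      # instance proposals instead of votes
        lang_feat = self.language(utterance)
        return self.matching(proposals, lang_feat)  # per-proposal reference scores
```

The key design point is that PointGroup yields instance proposals directly from the sparse point cloud, so the vote-and-aggregate stage of VoteNet is no longer needed.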
Install PyTorch via conda:

```shell
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
```
Install the necessary packages for ScanRefer listed in `requirements.txt`:

```shell
pip install -r requirements.txt
```
Afterwards, follow the PointGroup (PG) installation instructions. Before moving on to the next step, please don't forget to set the project root path in `CONF.PATH.BASE` in `lib/config.py`.
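The relevant part of the config might look like the sketch below. The `CONF.PATH.*` attribute names follow the text above; the concrete paths and the use of `SimpleNamespace` are assumptions for illustration:

```python
# Sketch of setting the project root in lib/config.py (paths are placeholders).
import os
from types import SimpleNamespace

CONF = SimpleNamespace(PATH=SimpleNamespace())
CONF.PATH.BASE = "/home/user/SparseScanRefer"  # <- set this to your project root
CONF.PATH.DATA = os.path.join(CONF.PATH.BASE, "data")
CONF.PATH.SCANNET = os.path.join(CONF.PATH.DATA, "scannet")
```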
- Download the ScanRefer dataset and unzip it under `data/`.
- Download the preprocessed GloVe embeddings (~990MB) and put them under `data/`.
- Download the ScanNet v2 dataset and put (or link) `scans/` under (or to) `data/scannet/scans/` (please follow the ScanNet instructions for downloading the ScanNet dataset). After this step, there should be folders containing the ScanNet scene data under `data/scannet/scans/` with names like `scene0000_00`.
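A quick way to sanity-check the download is to list the scene folders under `data/scannet/scans/`. The helper below assumes the standard `sceneXXXX_XX` naming scheme mentioned above:

```python
# Sanity check for the scans/ layout (helper names are illustrative).
import os
import re

def is_scene_id(name):
    """True for ScanNet-style scene folder names, e.g. scene0000_00."""
    return re.fullmatch(r"scene\d{4}_\d{2}", name) is not None

def find_scenes(scans_dir):
    """Return sorted scene ids found under scans_dir (empty if missing)."""
    if not os.path.isdir(scans_dir):
        return []
    return sorted(d for d in os.listdir(scans_dir) if is_scene_id(d))
```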
- Pre-process the ScanNet data. A folder named `scannet_data/` will be generated under `data/scannet/` after running the following command. Roughly 3.8GB of free space is needed for this step:

  ```shell
  cd data/scannet/
  python batch_load_scannet_data.py
  ```
After this step, you can check whether the processed scene data is valid by running:

```shell
python visualize.py --scene_id scene0000_00
```
To train the SparseScanRefer model with RGB values:

```shell
python scripts/script1.py
```
For more training options (`batch_size`, `fix_pg`, ...), please run `scripts/train.py -h`.
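The training flags named above could be wired up roughly as follows. Only `batch_size` and `fix_pg` come from the text; the defaults, the `use_color` flag, and the function name are assumptions for illustration:

```python
# Hypothetical sketch of the training CLI in scripts/train.py.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train SparseScanRefer")
    parser.add_argument("--batch_size", type=int, default=8,
                        help="number of scenes per training batch (default is an assumption)")
    parser.add_argument("--fix_pg", action="store_true",
                        help="freeze the PointGroup detection backbone")
    parser.add_argument("--use_color", action="store_true",
                        help="use RGB values as input features (hypothetical flag)")
    return parser
```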
For additional details, please see the ScanRefer and PointGroup papers:

- "ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language"
- "PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation"
Copyright (c) 2020 Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner