FastPoseCNN: Real-time Monocular Category-Level 6D Pose and Size Estimation Framework

Created by Eduardo Davalos Anaya and Mehran Aminian from St. Mary's University.

[Figure: intermediate data representations used by the model]

Our method uses multiple intermediate representations to reconstruct an object's physical pose and size parameters. By decoupling these parameters, the framework achieves better performance and excellent inference speed.

Introduction

[Figure: overall model architecture]

This PyTorch project is the implementation of my thesis, FastPoseCNN: Real-time Monocular Category-Level 6D Pose and Size Estimation Framework. Note that this thesis is a proof of concept and requires more development to become a stable and commercially viable solution. That being said, FastPoseCNN provides an excellent tradeoff between speed, accuracy, and universality.

The project directory and file structure is organized as follows:

FastPoseCNN
|   README.md
|
|---datasets                                # location of all datasets
|   |
|   |---NOCS                                # dataset used for most experiments
|        
|---source_code
    |   environment_linux.yaml              # dependency files (strict for linux)
    |   environment.yaml                    # relaxed dependency files
    |
    |---FastPoseCNN
        |   .env                            # environmental variables file
        |   config.py                       # Contains hyperparameter container 
        |   setup_env.py                    # Script to setup environment vars.
        |   train.py                        # Script for all training routines 
        |   evaluate.py                     # Script for all evaluation routines
        |   inference.py                    # Script for inference tests
        |   ...
        |   
        |---lib                             # Directory with all PyTorch GPU code
        |   aggregation_layer.py
        |   gpu_tensor_funcs.py
        |   ...
        |   |
        |   |---ransac_voting_gpu_layer     # PVNet's hough voting implementation
        |
        |---tools                           # Numpy+PyTorch generic tools
            create_meta+.py
            visualization.py
            ...

Requirements

The specific libraries and their versions can be found in environment.yaml (less strict) and environment_linux.yaml (stricter, Linux-only requirements). Overall, the most important dependency requirements are the following (an environment setup sketch follows this list):

  • python==3.8.5
  • pytorch==1.8.0
  • torchvision==0.8.2
  • cudatoolkit==10.2
  • numpy==1.19.2
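If you use conda, the environment can be created directly from one of these dependency files. A minimal sketch, assuming you start at the repository root (the environment name is defined inside the YAML file and may differ):

conda env create -f source_code/environment.yaml
conda activate FastPoseCNN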

Also, this project uses the Hough voting scheme and implementation from PVNet. The authors performed fantastic research, and without their released code, this project would not have been possible. Below, we provide a brief citation to their GitHub repository and project page.

PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation
Sida Peng, Yuan Liu, Qixing Huang, Xiaowei Zhou, Hujun Bao
CVPR 2019 oral
Project Page

We placed their Hough voting scheme within the lib directory. Instructions to compile the CUDA source code are provided in the installation section of the PVNet GitHub repository.
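For reference, CUDA extensions like this one are typically compiled in place with setuptools. A minimal sketch, assuming the layer ships with a setup.py as in PVNet (defer to PVNet's installation instructions for the authoritative steps):

cd source_code/FastPoseCNN/lib/ransac_voting_gpu_layer
python setup.py build_ext --inplace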

Datasets

For this research, we used the NOCS CAMERA and TEST datasets. Beware: the CAMERA dataset is very large (~140 GB). These datasets can be downloaded from the NOCS project page linked below.

We would like to personally thank the NOCS authors for providing these datasets. Below is another brief citation:

Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
Created by He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, Leonidas J. Guibas from Stanford University, Google Inc., Princeton University, and Facebook AI Research.
CVPR 2019 oral
Project Page

Training

Before running the train.py script, we recommend that you modify the HPARAM variable, which defines the overall hyperparameters used during training. More information about these hyperparameters can be found in the config.py file.
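A minimal sketch of what this looks like inside train.py; the preset names come from this README, while any other attribute shown is an illustrative assumption:

# Select the hyperparameter container for this run (illustrative sketch).
import config

HPARAM = config.MASK_TRAINING    # stage 1: segmentation (mask) training
# HPARAM = config.HEAD_TRAINING  # stage 2: pose/size head training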

For training, we used the config.MASK_TRAINING and config.HEAD_TRAINING preset HPARAMs to train the model in a two-stage scheme. Once you have modified your hyperparameters, you can run the training script with the following command:

python train.py 

Any hyperparameter can be overridden by adding --<HPARAM NAME>=<HPARAM VALUE> to the command, as shown below.
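For example (these hyperparameter names are illustrative assumptions; see config.py for the actual names):

python train.py --NUM_EPOCHS=50 --BATCH_SIZE=4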

Evaluation

Before evaluating, download the NOCS dataset and the weights provided in the releases page. Additionally, modify the NOCS dataset by renaming four folders to match the structure shown below; this simplifies the loading of the datasets' samples.

NOCS
|
|---camera
|   |
|   |---train
|   |   ...
|   |   
|   |---val
|       ...
|
|---real
    |
    |---train
    |   ...
    |
    |---test
        ...
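As a sketch, assuming the extracted NOCS archives produce folders named train, val, real_train, and real_test (an assumption; adjust the source names to match your download), the reorganization could look like:

# Hypothetical folder names; verify against your extracted archives.
mkdir camera real
mv train camera/train
mv val camera/val
mv real_train real/train
mv real_test real/test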

After making these modifications, please execute the following commands:

python create_meta+.py --DATASET_NAME=camera --SUBSET_DATASET_NAME=train
python create_meta+.py --DATASET_NAME=camera --SUBSET_DATASET_NAME=val
python create_meta+.py --DATASET_NAME=real   --SUBSET_DATASET_NAME=train
python create_meta+.py --DATASET_NAME=real   --SUBSET_DATASET_NAME=test

After all these steps, you should be able to execute the evaluate.py routine. Just remember to modify the CHECKPOINT hyperparameter to reflect the location of the downloaded weights.

python evaluate.py --CHECKPOINT=<weights path>

Here is an example output of the FastPoseCNN framework using the weights provided in this repository.

[Figure: example output]