Deep Learning for scene understanding

Master in Computer Vision - M5 Visual recognition.

Team: Living dead students 🎓

Abstract

We apply and analyse state-of-the-art deep learning techniques for object recognition, object detection and semantic segmentation, and evaluate them on images from urban driving datasets.

Tasks progress 📈

Object recognition

For object recognition, we implement and test several architectures, training them from scratch and fine-tuning them from pretrained weights. We also boost the performance of the networks with different pre-processing techniques, data augmentation and hyperparameter optimization.

Implement the ResNet architecture and train it both from scratch and by fine-tuning from ImageNet weights.
Implement the InceptionV3 architecture and train it both from scratch and by fine-tuning from ImageNet weights.
Implement DenseNet and train it from scratch. Test the use of Dropout layers.
Train VGG from scratch on the TT100K, BelgiumTSC and KITTI datasets.
Transfer learning between TT100K and BelgiumTSC datasets using the VGG model.
Try several pre-processing methods for the TT100K dataset with the VGG model.
Evaluate cropping vs. resizing the input images for the VGG model on the TT100K dataset.
Boost the performance of the VGG model using data augmentation, bagging and hyperparameter optimization.
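
As an illustration of the bagging step mentioned in the last item, here is a minimal sketch in plain Python. It assumes each model of the ensemble is trained on its own bootstrap resample of the training set and that predictions are combined by averaging class probabilities. The function names `bootstrap_indices` and `bagged_labels` are hypothetical and are not taken from the `code_bagging` folder.

```python
import random


def bootstrap_indices(n_samples, seed):
    """Draw n_samples indices with replacement (one bootstrap resample)."""
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def bagged_labels(per_model_probs):
    """Average class probabilities over models and return one label per sample.

    per_model_probs[m][i][c] is the probability that model m assigns
    to class c for sample i (an assumed layout for this sketch).
    """
    n_models = len(per_model_probs)
    labels = []
    for sample_probs in zip(*per_model_probs):
        # sample_probs holds one probability vector per model for this sample.
        mean = [sum(column) / n_models for column in zip(*sample_probs)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels
```

Each model m would be trained on the samples selected by `bootstrap_indices(n, seed=m)`; at test time the per-model softmax outputs are stacked and passed to `bagged_labels`.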

Object detection

For object detection, we train and test the YOLOv2 model using the ImageNet pretrained weights. We also implement the SSD model and boost the performance of the networks with pre-processing, hyperparameter optimization and data augmentation.

Implement the SSD architecture and train it from scratch on the Udacity and TT100K datasets.
Train YOLOv2 and Tiny-YOLO models for the TT100K and Udacity datasets using the ImageNet pretrained weights.
Boost the performance of the YOLOv2 model with preprocessing techniques and data augmentation.
Analyze the Udacity dataset and propose two approaches for dealing with the differences between the validation and test datasets. Test them with the YOLOv2 model.
Integrate F-score and FPS evaluation in our framework and evaluate YOLOv2 and Tiny-YOLO models.
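
The F-score and FPS evaluation in the last item can be sketched as follows. This is not the framework's implementation: the box format `(x_min, y_min, x_max, y_max)`, the greedy matching, the IoU threshold of 0.5 and all function names are assumptions made for the example, and class labels are ignored for brevity.

```python
import time


def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f_score(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predictions to ground-truth boxes; return (precision, recall, F1)."""
    matched = set()
    true_pos = 0
    for pred in predictions:
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(pred, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


def frames_per_second(detect, frames):
    """Time a detection callable over a list of frames and return frames per second."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

Here `detect` stands for any callable that runs the model on one frame, for example a wrapper around the YOLOv2 or Tiny-YOLO forward pass.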

Semantic segmentation

We implement several state-of-the-art semantic segmentation architectures and train them on the CamVid dataset. We also train the FCN model on the Synthia dataset and boost it with hyperparameter optimization and data augmentation.

Implement the SegNet model (the VGG-based SegNet and the 'SegNet Basic' version) and train both from scratch on the CamVid dataset.
Train the FCN model from scratch on the CamVid and Synthia datasets.
Boost the performance of the FCN and SegNet models with preprocessing techniques and data augmentation.
Implement a semantic segmentation architecture using DenseNet as the classification architecture.
Implement a semantic segmentation architecture using InceptionV3 as the classification architecture.
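
One detail of data augmentation for segmentation is that any geometric transform has to be applied identically to the image and to its label mask, otherwise the pixel labels no longer line up. A minimal sketch, assuming images and masks are stored as nested lists of rows and using a random horizontal flip as the example transform (the list above does not say which augmentations were actually used, and the function name is hypothetical):

```python
import random


def random_horizontal_flip(image, mask, rng, p=0.5):
    """Flip image and mask together with probability p.

    image and mask are lists of rows with the same height and width.
    A single random draw decides the flip for both, so they stay aligned.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask


# Example call with a seeded generator for reproducibility.
rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 1], [1, 0, 0]]
aug_image, aug_mask = random_horizontal_flip(image, mask, rng)
```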

Usage 💻

Set the dataset paths in train.py so that they match your machine.
Run the code:

python train.py -c config/dataset.py -e expName

where dataset.py is the configuration file for this test, and expName is the name of the directory where the results are saved.
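
For readers unfamiliar with Python files used as configuration, the following sketch shows one way an entry point could accept the `-c` and `-e` options and load the configuration module. It is not the repository's train.py: the long option names `--config` and `--exp-name`, the helper names and the module name `experiment_config` are assumptions made for illustration.

```python
import argparse
import importlib.util


def parse_args(argv=None):
    """Parse the two options shown in the command above."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py-style entry point")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the Python configuration file")
    parser.add_argument("-e", "--exp-name", required=True,
                        help="name of the directory where results are saved")
    return parser.parse_args(argv)


def load_config(path):
    """Import a .py file from an arbitrary path and return it as a module."""
    spec = importlib.util.spec_from_file_location("experiment_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Parsing the example command line from above.
args = parse_args(["-c", "config/dataset.py", "-e", "expName"])
```

After `load_config(args.config)`, any variable defined at the top level of the configuration file is available as an attribute of the returned module.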

Documents 📋

Overleaf document for the report
Google slides
Summaries, results and trained weights for the used models
