GitHub - riven314/DeepLearning-Navigation: Document my capstone project in master programme

About

This is a deep learning-based local navigation system for visually impaired users. The system is prototyped in Python and offers three main features:

A segmentation module with low latency (around 20 FPS) and reliable segmentation performance
A scene understanding module that summarises the spatial scene into a grid of objects
An obstacle avoidance module that detects the closest obstacle

Hardware Specification

To produce the statistics reported here, we use a notebook with the following specification:

NVIDIA RTX 2070 Max Q (8GB)
Core i7-9750H CPU
230W AC Charger

We use an Intel RealSense D435i as our depth camera.

Reproducing This Repo

For hardware, you need a RealSense camera and a notebook with a discrete GPU. We recommend a camera from the RealSense D400 series.

Our repo mainly uses the following libraries:

Intel RealSense SDK 2.0
pyrealsense2
PyTorch (1.2.0)
opencv (3.4.2)
PyQt5
numpy
matplotlib

Pipeline Diagram

The pipeline of our system is summarized in the diagram below. It breaks down into the following steps:

Our system receives an RGB image and its corresponding depth image. We interface with the D435i camera through pyrealsense2.
After image preprocessing, we feed the RGB image into the segmentation module, which is developed in PyTorch.
Our interface, developed in PyQt5, consolidates the segmentation output and feeds it to the scene understanding module and the obstacle avoidance module. The obstacle avoidance module additionally requires the depth image as input.
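The first step (receiving an RGB image together with its depth image) can be sketched as follows. This is a minimal illustration, not the code in this repo: the function names, the 640x480 at 30 FPS stream settings, and the choice to mark missing depth readings as NaN are all assumptions. pyrealsense2 is imported inside the capture function so that the conversion helper can be used without the SDK or a connected camera.

```python
import numpy as np


def depth_to_metres(raw_depth, depth_scale):
    """Convert raw 16-bit depth units to metres; a raw value of 0 (no reading) becomes NaN."""
    metres = raw_depth.astype(np.float32) * depth_scale
    metres[raw_depth == 0] = np.nan
    return metres


def capture_aligned_frame(width=640, height=480, fps=30):
    """Grab one colour frame and the depth frame aligned to it (needs a connected camera).

    Hypothetical helper: stream resolution and frame rate are illustrative defaults.
    """
    import pyrealsense2 as rs  # lazy import: only required when a camera is used

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, width, height, rs.format.z16, fps)
    config.enable_stream(rs.stream.color, width, height, rs.format.bgr8, fps)
    profile = pipeline.start(config)
    try:
        depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
        align = rs.align(rs.stream.color)  # map depth pixels onto the colour image
        frames = align.process(pipeline.wait_for_frames())
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        if not depth_frame or not color_frame:
            raise RuntimeError("camera returned an incomplete frame set")
        # Copy the colour data so the array stays valid after the pipeline stops.
        color = np.asanyarray(color_frame.get_data()).copy()
        depth = np.asanyarray(depth_frame.get_data())
        return color, depth_to_metres(depth, depth_scale)
    finally:
        pipeline.stop()
```

Aligning depth to the colour stream means a pixel at a given position in the segmentation result and in the depth image refers to the same point in the scene, which is what the obstacle avoidance step relies on.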

Segmentation Module

Our segmentation module is based on semantic-segmentation-pytorch. While the base model is already fast, we modified the image preprocessing and postprocessing steps to speed it up further.

The original model predicts 150 classes. To smooth the segmentation result, we group them into 8 general classes. The figure below illustrates the effect of class grouping.

(From top to bottom: original images, segmentation before class grouping, segmentation after class grouping)
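Class grouping of this kind can be applied to a whole segmentation mask with a single lookup table. The sketch below assumes the mask holds integer class ids in the range 0 to 149; the mapping in EXAMPLE_GROUPS is purely illustrative and is not the grouping used in this repo.

```python
import numpy as np

NUM_FINE_CLASSES = 150  # number of classes predicted by the original model

# Hypothetical mapping: fine class id -> general class id.
# Any fine class not listed falls back to general class 0 ("other").
EXAMPLE_GROUPS = {0: 1, 1: 2, 3: 3, 5: 4}


def build_lookup(groups, num_fine=NUM_FINE_CLASSES, default=0):
    """Build an array where lookup[fine_id] gives the general class id."""
    lookup = np.full(num_fine, default, dtype=np.uint8)
    for fine_id, general_id in groups.items():
        lookup[fine_id] = general_id
    return lookup


def group_classes(fine_mask, lookup):
    """Relabel every pixel of an integer class mask through the lookup table."""
    return lookup[fine_mask]


fine_mask = np.array([[0, 3], [149, 5]])
general_mask = group_classes(fine_mask, build_lookup(EXAMPLE_GROUPS))
```

Indexing the lookup array with the mask relabels all pixels in one vectorised operation, so the grouping adds very little to the per-frame cost.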

Scene Understanding Module

To help a visually impaired user make sense of a scene, this module translates and condenses the visual information encoded in the segmentation result.

It divides the segmentation result into 6 equal grid cells and summarises the objects present in each cell. To remove noise, objects that occupy only a small part of a cell are omitted.

(Left: how the segmentation result is divided. Right: summary for each grid cell)
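The grid summary described above can be sketched as follows. Several details are assumptions made for illustration: the 6 cells are laid out as 2 rows by 3 columns, an object is dropped when it covers less than 10% of a cell, and the class names are placeholders.

```python
import numpy as np


def summarise_grid(mask, class_names, rows=2, cols=3, min_fraction=0.1):
    """List the classes present in each grid cell of a segmentation mask.

    rows, cols and min_fraction are illustrative defaults, not the repo's values.
    Classes covering less than min_fraction of a cell are treated as noise.
    """
    height, width = mask.shape
    row_edges = np.linspace(0, height, rows + 1).astype(int)
    col_edges = np.linspace(0, width, cols + 1).astype(int)
    summary = []
    for r in range(rows):
        row_summary = []
        for c in range(cols):
            cell = mask[row_edges[r]:row_edges[r + 1], col_edges[c]:col_edges[c + 1]]
            ids, counts = np.unique(cell, return_counts=True)
            present = [
                class_names[int(class_id)]
                for class_id, count in zip(ids, counts)
                if count / cell.size >= min_fraction
            ]
            row_summary.append(present)
        summary.append(row_summary)
    return summary


names = ["floor", "wall", "person"]  # placeholder class names
mask = np.zeros((20, 30), dtype=np.uint8)
mask[0:5, 0:10] = 1   # half of the top-left cell is "wall"
mask[10, 10] = 2      # a single stray "person" pixel, filtered out as noise
grid = summarise_grid(mask, names)
```

Here `grid[0][0]` lists both "floor" and "wall", while the cell containing the stray pixel reports only "floor".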

Obstacle Avoidance Module

This module informs users when they are facing a close obstacle. A red light turns on if any close obstacle is detected; otherwise, a green light is on.

(First Figure: No close obstacle detected by the module. Second Figure: Close obstacle detected with its class and distance)
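One way to combine the depth image with the segmentation result for this check is sketched below. It is not the logic in obj_avoidance.py: the 1.0 m threshold, the use of the single nearest valid pixel, the set of ignored class ids, and the class names are all assumptions. A real system would likely use something more robust than one pixel, such as a low percentile over a region.

```python
import numpy as np


def closest_obstacle(depth_m, mask, class_names, ignore_ids=(0,), max_distance_m=1.0):
    """Return (class_name, distance_m) for the nearest obstacle pixel, or None.

    depth_m: depth image in metres, aligned with mask; NaN or 0 means no reading.
    ignore_ids: class ids that are never obstacles (assumed here: 0 = floor).
    max_distance_m: illustrative threshold for what counts as "close".
    """
    valid = np.isfinite(depth_m) & (depth_m > 0)
    valid &= ~np.isin(mask, ignore_ids)
    if not valid.any():
        return None
    candidates = np.where(valid, depth_m, np.inf)
    index = np.unravel_index(np.argmin(candidates), candidates.shape)
    distance = float(candidates[index])
    if distance > max_distance_m:
        return None
    return class_names[int(mask[index])], distance


def light_colour(obstacle):
    """Red when a close obstacle was found, green otherwise."""
    return "red" if obstacle is not None else "green"


names = ["floor", "wall", "person"]  # placeholder class names
depth = np.array([[3.0, 0.6], [0.0, 0.4]])
mask = np.array([[1, 2], [1, 0]])
obstacle = closest_obstacle(depth, mask, names)
```

In this toy input the nearest reading (0.4 m) belongs to the floor and is ignored, so the reported obstacle is the "person" pixel at 0.6 m and the light is red.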

Demonstration

The demo shows the high frame rate and segmentation stability of our system. It also shows how our obstacle avoidance module works: when an obstacle is close to the user, the red light turns on and the corresponding obstacle is visualized.

(Short demo video for our system)

Corridor Experiment

To showcase the real-time capability of our system, we also conducted an experiment in a narrow corridor (located in Run Run Shaw Building, The University of Hong Kong).

In the experiment, one of our teammates played the role of a blind user and navigated the corridor relying solely on our system. The video can be found here.

(A snapshot of our corridor experiment; the full demo can be found on Google Drive)

Acknowledgement

This is my capstone project for the Master of Data Science degree at The University of Hong Kong. The project was jointly developed by Alex Lau, Guo Huimin and Xie Jun.

We would like to take this chance to thank our two supervisors: Professor Yin Guoshen for his generous support, and Dr. Luo Ping for his guidance on the segmentation module.

Our segmentation module is mainly built on semantic-segmentation-pytorch, which gave us a very strong baseline for building a solid system.

Remarks

Contributions are welcome. For enquiries, contact Alex Lau (alexlauwh@gmail.com).

