This is a monodepth2 model implemented in TF2.x, from the original paper "Digging Into Self-Supervised Monocular Depth Estimation".
- Here is the result after training for 13 epochs (without fine-tuning).
- tensorflow==2.3.1
- (for GPU) cudatoolkit=10.1, cudnn=7.6.5
I'm using a normal GTX 1060 Max-Q GPU. FPS for single-image depth estimation:
- using `tf.saved_model.load` (I think this is serving mode):
  - encoder: ~2 ms (500 FPS)
  - decoder: ~2 ms
  - overall: >200 FPS (but when used together with YOLOv4, it drops to ~120 FPS)
- using `tf.keras.models.load_model` with `model.predict()`:
  - overall: ~100 FPS (I forget the details...)
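For reference, here is a minimal timing sketch for the serving-mode path; the model path and the input shape (1x192x640x3) are assumptions, so adjust them to your own export:

```python
import time
import numpy as np
import tensorflow as tf

# Load the exported encoder and grab its default serving signature
# (the path is hypothetical -- point it at your own SavedModel folder).
encoder = tf.saved_model.load("saved_models/encoder")
infer = encoder.signatures["serving_default"]

image = tf.constant(np.random.rand(1, 192, 640, 3).astype(np.float32))

infer(image)  # warm-up call so graph tracing isn't counted
start = time.perf_counter()
for _ in range(100):
    infer(image)
print("encoder: %.2f ms / image" % ((time.perf_counter() - start) * 1000 / 100))
```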
Using the `@tf.function` decorator on `Train.grad()` and `DataPprocessor.prepare_batch()` allows a much larger batch_size.
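A minimal sketch of what that looks like, assuming a free-standing train step (the real `Train.grad()` is a method, and `compute_losses` here is a hypothetical stand-in for the repo's loss computation):

```python
import tensorflow as tf

# Compiling the train step into a graph cuts per-step Python overhead,
# which is what makes the larger batch_size feasible.
@tf.function
def grad(model, optimizer, batch):
    with tf.GradientTape() as tape:
        loss = compute_losses(model(batch))  # stand-in for the repo's losses
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```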
```bash
# Start training (from scratch):
python Start.py --run_mode train --from_scratch True --data_path <path_to_kitti>

# Continue training:
python Start.py --run_mode train --weights_dir <weights_folder_path> --data_path <path_to_kitti>
```
Check intermediate results:
- method 1: run the training code in debug mode
  ```bash
  # remember to comment out the "tf.function" decorator for Train.grad() and DataPprocessor.prepare_batch()
  python train.py --weights_dir <folder_path> --debug_mode True --data_path <path_to_kitti>
  ```
- method 2: run simple_run.py (recommended; see Notes below for details)
  ```bash
  python simple_run.py --weights_dir --data_path --save_result_to --save_concat_image
  ```
The models (converted from the official PyTorch model, trained on the KITTI Odometry dataset, input size 640x192):
Note: since the model is trained on the Odometry split, results on the Raw data won't be perfect.
- Depth and Pose encoder-decoder
- TF2 data loader for KITTI dataset (KITTI_Raw and KITTI_Odom)
- training code
- evaluation code for depth and pose
- See where to improve: the pose net could use `pose = mean(inv(pose), pose)` to constrain the poses predicted for the two frame orderings (see the sketch after this list)
- Add new ideas from "Unsupervised Monocular Depth Learning in Dynamic Scenes" and "Depth from Videos in the Wild":
  - implement the Motion-Field model, replacing (rot, trans) with (rot, trans, trans_residual, intrinsics) for more information
  - construct the corresponding losses: RGBD and motion consistency losses
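A sketch of the pose-constraint idea above, assuming the pose net outputs 4x4 homogeneous transforms for both frame orders (this is a naive matrix mean; a proper SE(3) average would treat the rotations separately):

```python
import tensorflow as tf

def symmetrized_pose(T_fwd, T_bwd):
    """T_fwd: [B, 4, 4] transform for (t -> t+1); T_bwd: same for (t+1 -> t)."""
    T_bwd_inv = tf.linalg.inv(T_bwd)   # inverted, this also maps t -> t+1
    return 0.5 * (T_fwd + T_bwd_inv)   # i.e. pose = mean(inv(pose), pose)
```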
First, the evaluation code, i.e. `eval_depth.py` and `eval_pose.py`, is finished.
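For reference, these are the standard depth metrics the monodepth2 paper reports, which `eval_depth.py` presumably mirrors (a sketch over numpy arrays of matched ground-truth and predicted depths):

```python
import numpy as np

def depth_errors(gt, pred):
    """Standard Eigen-style depth metrics on flattened, masked depth arrays."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1, a2, a3 = [(thresh < 1.25 ** k).mean() for k in (1, 2, 3)]
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```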
Then `simple_run.py` is also finished; try it with:
```bash
python simple_run.py --weights_dir --data_path --save_result_to --save_concat_image
```
- data_path: path to a video or image file
- weights_dir: path to a folder containing the necessary weights (.h5)
- save_result_to: optional, a folder path where the results will be saved
- save_concat_image: optional, show the result concatenated with the original image for comparison
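A concrete call might look like this (the paths are hypothetical; check `simple_run.py` for the exact flag semantics):

```bash
python simple_run.py --weights_dir ./weights --data_path ./demo.mp4 \
    --save_result_to ./results --save_concat_image
```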
Next move:
- try new ideas from similar papers, e.g. struct2depth
Now you can train your own model using `train.py` and `new_trainer.py`. For now I have only trained for 1 epoch, and the results seem to be heading in the right direction. See the reconstructed image and the disparity image under assets/first_epoch_res.jpg.
Next steps will be:
- completing the evaluation code
- trying new ideas from similar papers, e.g. struct2depth, to improve the model
Just for personal use, but please feel free to contact me if you need anything.
I haven't used argument parsing, so you can't run it with one command: you need to change some path settings to run the demo. No worries though, it's just simple code, merely for single-image depth estimation (for now). The pose networks and training pipeline are still in progress and need a double check. Though it runs without errors, I can't guarantee total correctness. Anyway, take one minute and you'll know what's going on in there.
`simple_run.py`: as the name suggests, it's a simple run; the important functions are all there, with no encapsulation.
`depth_esitmator_demo.py`: a short demo where the model is encapsulated in classes. But I recommend reading `simple_run.py` first.
`depth_decoder_creater.py` and `encoder_creator.py` are used to:
- Useful part: build the model in TF2.x the same way as the official monodepth2 implementation in PyTorch.
- Skippable part: weight loading. Weights were extracted from the official torch model as `numpy.ndarray`s, then loaded directly into the TF model layer- and parameter-wise. It's trivial, and I will upload the converted `SavedModel` directly, so no worries.
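A sketch of that layer-wise loading idea, under assumptions about the export format (`encoder_weights.npz` and `build_encoder` are hypothetical names):

```python
import numpy as np

weights = np.load("encoder_weights.npz")  # arrays exported from the PyTorch model
model = build_encoder()                   # stand-in for the repo's encoder_creator

for layer in model.layers:
    if layer.weights:  # skip parameter-free layers (e.g. activations, pooling)
        # NOTE: PyTorch conv kernels are (O, I, H, W) and must be transposed
        # to TF's (H, W, I, O) before this assignment will line up.
        layer.set_weights([weights[w.name] for w in layer.weights])
```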
- The official repo: https://github.com/nianticlabs/monodepth2
- a TF1.x version implementation (complete with training and evaluation): https://github.com/FangGet/tf-monodepth2