The inspiration for this project comes from ultralytics/yolov3. Thanks!
This project is a YOLOv3 object detection system implemented in PyTorch.
The goal of this implementation is to be simple, highly extensible, and easy to integrate into your own projects. This implementation is a work in progress -- new features are still being added.
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/.
$ git clone https://github.com/Lornatang/YOLOv3-PyTorch.git
$ cd YOLOv3-PyTorch/
$ pip3 install -r requirements.txt
$ cd weights/
$ bash download_weights.sh
$ cd data/
$ bash get_coco_dataset.sh
usage: train.py [-h] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--accumulate ACCUMULATE]
[--cfg CFG] [--data DATA] [--multi-scale] [--img-size IMG_SIZE [IMG_SIZE ...]]
[--rect] [--resume] [--nosave] [--notest] [--evolve] [--cache-images]
[--weights WEIGHTS] [--arc ARC] [--name NAME] [--device DEVICE] [--adam]
[--single-cls] [--var VAR]
- Example (COCO2014)
To train on COCO2014 using a Darknet-53 backbone pretrained on ImageNet, run:
$ python3 train.py --cfg cfgs/yolov3.cfg --data cfgs/coco2014.data --weights weights/darknet53.conv.74 --multi-scale
- Example (VOC2007+2012)
To train on VOC07+12:
$ python3 train.py --cfg cfgs/yolov3-voc.cfg --data cfgs/voc2007.data --weights weights/darknet53.conv.74 --multi-scale
- Other training methods
Normal Training: run python3 train.py to begin training after downloading the COCO data with data/get_coco_dataset.sh. Each epoch trains on 117,263 images from the COCO train and validation sets, and tests on 5,000 images from the COCO validation set.
Resume Training: run python3 train.py --resume to resume training from weights/checkpoint.pth.
- mAP@0.5 is evaluated at --iou-threshold 0.5, mAP@0.5...0.95 at --iou-threshold 0.7.
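For reference, a minimal sketch of the box IoU computation these thresholds are applied to (standard intersection-over-union on [x1, y1, x2, y2] boxes, not the repository's internal implementation):

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    # Intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-16)

# A prediction counts as a true positive for mAP@0.5 when its IoU with a ground-truth box is >= 0.5
print(box_iou([0, 0, 10, 10], [5, 5, 15, 15]))  # ~0.143
```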
- Darknet results: https://arxiv.org/abs/1804.02767
Method | Size | COCO mAP @0.5...0.95 | COCO mAP @0.5 |
---|---|---|---|
YOLOv3-tiny | 320 | 14.0 | 29.1 |
YOLOv3 | 320 | 28.7 | 51.8 |
YOLOv3-SPP | 320 | 30.5 | 52.3 |
YOLOv3-tiny | 416 | 16.0 | 33.0 |
YOLOv3 | 416 | 31.2 | 55.4 |
YOLOv3-SPP | 416 | 33.9 | 56.9 |
YOLOv3-tiny | 512 | 16.6 | 34.9 |
YOLOv3 | 512 | 32.7 | 57.7 |
YOLOv3-SPP | 512 | 35.6 | 59.5 |
YOLOv3-tiny | 608 | 16.6 | 35.4 |
YOLOv3 | 608 | 33.1 | 58.2 |
YOLOv3-SPP | 608 | 37.0 | 60.7 |
$ python3 test.py --cfg cfgs/yolov3-spp.cfg --weights weights/yolov3-spp.pth --augment --save-json --image-size 608
Namespace(augment=True, batch_size=16, cfg='cfgs/yolov3-spp.cfg', confidence_threshold=0.001, data='data/coco2014.data', device='', image_size=608, iou_threshold=0.6, save_json=True, single_cls=False, task='eval', weights='weights/yolov3-spp.pth', workers=4)
Using CUDA
+ device:0 (name='TITAN RTX', total_memory=24190MB)
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.454
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.644
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.497
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.270
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.504
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.577
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.363
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.599
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.668
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.502
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.724
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.805
detect.py runs inference on a variety of sources:
$ python3 detect.py --source ...
- Image:
--source file.jpg
- Video:
--source file.mp4
- Directory:
--source dir/
- Webcam:
--source 0
- HTTP stream:
--source https://v.qq.com/x/page/x30366izba3.html
To run a specific model:
YOLOv3: python3 detect.py --cfg cfgs/yolov3.cfg --weights weights/yolov3.weights
YOLOv3-tiny: python3 detect.py --cfg cfgs/yolov3-tiny.cfg --weights weights/yolov3-tiny.weights
YOLOv3-SPP: python3 detect.py --cfg cfgs/yolov3-spp.cfg --weights weights/yolov3-spp.weights
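If you need to script inference over several inputs, one option is to drive detect.py from Python using only the flags shown above (a hedged sketch; it assumes you run it from the repository root, and the source values are placeholders):

```python
import subprocess

# Any of the source types listed above: image, video, directory, webcam index, or stream URL.
sources = ["file.jpg", "file.mp4", "dir/"]

for src in sources:
    # Mirrors the CLI usage: python3 detect.py --cfg ... --weights ... --source ...
    subprocess.run(
        [
            "python3", "detect.py",
            "--cfg", "cfgs/yolov3-spp.cfg",
            "--weights", "weights/yolov3-spp.weights",
            "--source", src,
        ],
        check=True,
    )
```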
In addition to the architectures provided by the original author, we also add several commonly used backbone architectures, which usually achieve better mAP with less computation than the original architecture.
- All models were trained and tested at an image size of 416 x 416 on a GeForce RTX 2080 Ti.
Note: All commands use the following parameters.
python3 train.py --cfg <cfg-path> --data cfgs/voc2007.data --multi-scale --cache-images --batch-size 8
Backbone | Train | Test | train time (s/iter) | inference time (ms/im) | train mem (GB) | mAP | Cfg | Weights |
---|---|---|---|---|---|---|---|---|
YOLOv3-tiny | VOC07+12 | VOC07 | 0.047 | 1.9 | 2.7 | 57.7 | Link | weights |
MobileNet-v1 | VOC07+12 | VOC07 | 0.056 | 2.4 | 2.9 | 65.2 | Link | weights |
MobileNet-v2 | VOC07+12 | VOC07 | 0.116 | 2.5 | 3.1 | 65.6 | Link | weights |
MobileNet-v3-small | VOC07+12 | VOC07 | 0.050 | 1.8 | 1.0 | 57.7 | Link | weights |
MobileNet-v3-large | VOC07+12 | VOC07 | 0.080 | 2.6 | 3.1 | 60.4 | Link | weights |
ShuffleNet-v1 | VOC07+12 | VOC07 | - | - | - | - | Link | - |
ShuffleNet-v2 | VOC07+12 | VOC07 | - | - | - | - | Link | - |
AlexNet | VOC07+12 | VOC07 | 0.065 | 2.5 | 1.5 | 55.2 | Link | weights |
VGG16 | VOC07+12 | VOC07 | 0.194 | 7.9 | 7.7 | 73.7 | Link | weights |
Run the commands below to create a custom model definition, replacing your-dataset-num-classes
with the number of classes in your dataset.
# move to configs dir
$ cd cfgs/
# create custom model 'yolov3-custom.cfg'. (In fact, it is OK to modify two lines of parameters, see `create_model.sh`)
$ bash create_model.sh your-dataset-num-classes
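If you would rather edit the cfg by hand, the two parameters the script touches are (for a standard YOLOv3 head with 3 anchors per detection scale) classes= in each [yolo] block and filters= in the convolutional layer immediately before it, which must equal (classes + 5) * 3. A small sketch of that arithmetic:

```python
# Per anchor, each YOLO output cell predicts 4 box coordinates + 1 objectness score + num_classes class scores.
# With 3 anchors per detection scale, the conv layer feeding each [yolo] block needs this many filters:
def yolo_head_filters(num_classes, anchors_per_scale=3):
    return anchors_per_scale * (num_classes + 5)

print(yolo_head_filters(80))  # 255 -> the value used in the stock COCO cfgs
print(yolo_head_filters(20))  # 75  -> e.g. a 20-class VOC-style dataset
```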
Add class names to data/custom/classes.names. This file should have one row per class name.
Move the images of your dataset to data/custom/images/.
Move your annotations to data/custom/labels/. The dataloader expects that the annotation file corresponding to the image data/custom/images/train.jpg has the path data/custom/labels/train.txt. Each row in the annotation file should define one bounding box, using the syntax label_idx x_center y_center width height. The coordinates should be scaled to [0, 1], and label_idx should be zero-indexed and correspond to the row number of the class name in data/custom/classes.names.
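As an illustration of that label format, here is a minimal sketch that converts a pixel-space box (x_min, y_min, x_max, y_max) into one annotation row (a hypothetical helper, not part of this repository):

```python
def to_yolo_row(label_idx, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format a pixel-space box as 'label_idx x_center y_center width height', scaled to [0, 1]."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{label_idx} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 100x200 box with its top-left corner at (50, 80) in a 640x480 image, class index 0:
print(to_yolo_row(0, 50, 80, 150, 280, 640, 480))
# -> "0 0.156250 0.375000 0.156250 0.416667"
```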
In data/custom/train.txt and data/custom/valid.txt, add the paths to the images that will be used as training and validation data respectively.
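A hedged sketch for generating those two list files from data/custom/images/ (the 90/10 split and the .jpg extension are assumptions; adjust them to your dataset):

```python
import random
from pathlib import Path

# Collect all images and shuffle them reproducibly.
images = sorted(Path("data/custom/images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)

split = int(0.9 * len(images))  # assumed 90/10 train/valid split
train, valid = images[:split], images[split:]

# One image path per line, as the dataloader expects.
Path("data/custom/train.txt").write_text("\n".join(str(p) for p in train) + "\n")
Path("data/custom/valid.txt").write_text("\n".join(str(p) for p in valid) + "\n")
```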
To train on the custom dataset run:
$ python3 train.py --cfg cfgs/yolov3-custom.cfg --data cfgs/custom.data --epochs 100 --multi-scale
Add --weights weights/darknet53.conv.74 to train using a backbone pretrained on ImageNet.
$ git clone https://github.com/Lornatang/YOLOv3-PyTorch && cd YOLOv3-PyTorch
# convert darknet cfgs/weights to pytorch model
$ python3 -c "from easydet.utils import convert; convert('cfgs/yolov3-spp.cfg', 'weights/yolov3-spp.weights')"
Success: converted 'weights/yolov3-spp.weights' to 'converted.pth'
# convert cfgs/pytorch model to darknet weights
$ python3 -c "from easydet.utils import convert; convert('cfgs/yolov3-spp.cfg', 'weights/yolov3-spp.pth')"
Success: converted 'weights/yolov3-spp.pth' to 'converted.weights'
Joseph Redmon, Ali Farhadi
Abstract
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/.
[Paper] [Project Webpage] [Authors' Implementation]
@article{yolov3,
title={YOLOv3: An Incremental Improvement},
author={Redmon, Joseph and Farhadi, Ali},
journal = {arXiv},
year={2018}
}