This repository provides code to train and use neural networks on compressed JPEG images. No pre-trained weights are/will be made available.
This implementation relies on the jpeg2dct module from the Uber research team. The SSD used in this repository was taken from this repository and then modified.
All the networks proposed in this repository are modified versions of the three following architectures
- Installation
- Training
- Prediction
- Classification (ImageNet)
- Detection (PascalVOC)
- Detection (MS-COCO)
- Method limitations
The provided code can be used directly or installed as a package. The following steps install the dependencies in a virtual environment:
# Making virtualenv
mkdir .venv
cd .venv
python3 -m venv jpeg_deep
source jpeg_deep/bin/activate
cd ..
# Installing all the dependencies (the code was tested with the specified version numbers on python 3.+)
pip install keras
pip install tensorflow-gpu==1.14.0
pip install pillow
pip install opencv-python
pip install jpeg2dct
pip install albumentations
pip install tqdm
pip install bs4
pip install cython
pip install pycocotools
pip install matplotlib
The training uses a system of configuration files and experiments. This system aims to help save the parameters of a given run. At the start of the training, an experiment folder is created with copies of the configuration files, weights and logs. Example config files are available in the config folder. The config files define all the training and testing parameters.
To simplify deployment on different machines, the following variables need to be defined (see the Classification/Detection sections for details on the dataset paths):
# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>
export DATASET_PATH_VAL=<path_to_validation_directory>
export DATASET_PATH_TEST=<path_to_test_directory>
# Setting the directory where the experiment folder will be created
export EXPERIMENTS_OUTPUT_DIRECTORY=<path_to_output_directory>
Once you have defined all the variables and modified the config files to your needs, simply run the following command (when not using horovod, you will need to update some of the parameters):
python scripts/training.py -c <config_dir_path> --no-horovod
The config file in the <config_dir_path> needs to be named "config.py" for the script to run correctly.
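Before launching a long run, it can help to verify the setup described above. The following is a minimal sketch, not part of the repository: the `check_setup` helper and the `REQUIRED_VARS` tuple are hypothetical names, and the sketch only checks the four environment variables and the `config.py` file name mentioned in this section.

```python
import os
from pathlib import Path

# Environment variables listed in this section of the README.
REQUIRED_VARS = (
    "DATASET_PATH_TRAIN",
    "DATASET_PATH_VAL",
    "DATASET_PATH_TEST",
    "EXPERIMENTS_OUTPUT_DIRECTORY",
)


def check_setup(config_dir, environ=None):
    """Return a list of human-readable problems (empty if everything looks fine).

    Hypothetical helper: it only checks that the variables are set and that
    the config directory contains a file named config.py.
    """
    environ = os.environ if environ is None else environ
    problems = [
        f"environment variable {name} is not set"
        for name in REQUIRED_VARS
        if not environ.get(name)
    ]
    if not (Path(config_dir) / "config.py").is_file():
        problems.append(f"no config.py found in {config_dir}")
    return problems


# Example usage (the path is a placeholder):
# for problem in check_setup("path/to/config_dir"):
#     print("setup problem:", problem)
```

Returning a list of problems rather than raising on the first one lets you fix everything in a single pass before starting the training.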
For more details on classification training on the ImageNet dataset, refer to this section; for training on the Pascal VOC dataset, refer to this section; and for training on the MS-COCO dataset, refer to this section.
The training script supports the usage of horovod. I highly recommend training on multiple GPUs for classification, given the size of the dataset. An example file for training with horovod using slurm is provided: jpeg_deep.sl.
cd slurm
sbatch jpeg_deep.sl
If you do not run on a multi-cluster computation facility that uses slurm, please refer to the original horovod repository.
No pre-trained weights are/will be made available. To get this section running, you'll have to retrain the networks from scratch.
Displaying the results can be done using the prediction.py script. To use the script, you first have to carry out a training for at least one epoch (the prediction presupposes that you have an experiment folder).
The prediction will be done on the test set. To use a different dataset, modify the config_temp.py file in the generated experiment folder.
For the vgg16 based classifiers: The prediction script uses the test generator specified in the config file to get the data. Hence, with the provided examples, you may first need to convert the weights to a fully convolutional version of the network. This can be done using the classification2ssd.py script.
Once this is done, simply run the following command:
python scripts/prediction.py <experiment_path> <weights_path>
We also provide a way to test the speed of the trained networks, using the prediction_time.py script.
To test the speed of a network, a batch of data is preloaded into memory, prediction is run over this batch P times, and the whole procedure is repeated N times. The result is the averaged time. Loading weights is optional.
python scripts/prediction_time.py <experiment_path> -nr 10 -w <weights_path>
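The measurement procedure described above can be sketched as follows. This is an illustration of the idea only, not the code of prediction_time.py: the function names, the `predict_fn` callable and the untimed warm-up call are assumptions made for the example.

```python
import time
from statistics import mean


def time_predictions(predict_fn, batch, runs_per_repeat=10, repeats=5):
    """Average time (in seconds) of one predict_fn(batch) call.

    predict_fn is any callable taking the preloaded batch, for instance a
    Keras model's predict method. The batch is reused for every call, so
    data loading is excluded from the measurement.
    """
    predict_fn(batch)  # warm-up call, not timed (an assumption of this sketch)
    per_call_times = []
    for _ in range(repeats):  # the "N times" of the description
        start = time.perf_counter()
        for _ in range(runs_per_repeat):  # the "P times" of the description
            predict_fn(batch)
        elapsed = time.perf_counter() - start
        per_call_times.append(elapsed / runs_per_repeat)
    return mean(per_call_times)


def frames_per_second(seconds_per_batch, batch_size):
    """Convert a per-batch prediction time into images per second."""
    return batch_size / seconds_per_batch
```

With a batch size of 8, a measured time of 0.5 s per batch would correspond to 16 images per second.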
The tables below show the results obtained (accuracy) compared with the state of the art. All the presented results are on the validation dataset. All the FPS values were measured on an NVIDIA GTX 1080 using the prediction_time.py script. Batch size was set to 8.
Official Networks | top-1 | top-5 | FPS |
---|---|---|---|
VGG16 | 73.0 | 91.2 | N/A |
VGG-DCT | 42.0 | 66.9 | N/A |
ResNet50 | 75.78 | 92.65 | N/A |
LC-RFA | 75.92 | 92.81 | N/A |
LC-RFA-Thinner | 75.39 | 92.57 | N/A |
Deconvolution-RFA | 76.06 | 92.02 | N/A |
VGG based Networks (our trainings) | top-1 | top-5 | FPS |
---|---|---|---|
VGG16 | 71.9 | 90.8 | 267 |
VGG-DCT | 65.5 | 86.4 | 553 |
VGG-DCT Y | 62.6 | 84.6 | 583 |
VGG-DCT Deconvolution | 65.9 | 86.7 | 571 |
ResNet50 based Networks (our trainings) | top-1 | top-5 | FPS |
---|---|---|---|
ResNet50 | 74.73 | 92.33 | 324 |
LC-RFA | 74.82 | 92.58 | 318 |
LC-RFA Y | 73.25 | 91.40 | 329 |
LC-RFA-Thinner | 74.62 | 92.33 | 389 |
LC-RFA-Thinner Y | 72.48 | 91.04 | 395 |
Deconvolution-RFA | 74.55 | 92.39 | 313 |
The dataset can be downloaded here. Choose the version that suits your needs; I used the 2012 (Object Detection) data.
Once the data is downloaded, to use the provided generators, it should be stored following this tree (as long as you have separated train and validation folders you should be okay):
imagenet
|
|_ train
|  |_ n01440764
|  |_ n01443537
|  |_ ...
|
|_ validation
   |_ n01440764
   |_ n01443537
   |_ ...
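To check that a downloaded copy follows this tree, a small script can count the class folders in each split. This is a minimal sketch under the assumption that the splits are named train and validation as shown above; `class_folders` and `check_imagenet_layout` are hypothetical helpers, not functions of this repository.

```python
from pathlib import Path


def class_folders(split_dir):
    """Return the sorted names of the class sub-folders (e.g. n01440764)."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def check_imagenet_layout(root, splits=("train", "validation")):
    """Count class folders per split and list classes missing from a split.

    Raises FileNotFoundError if one of the split folders does not exist.
    """
    root = Path(root)
    found = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        found[split] = set(class_folders(split_dir))

    # Number of class folders in each split.
    report = {split: len(names) for split, names in found.items()}
    # Classes that are present in some split but not in all of them.
    all_classes = set().union(*found.values())
    common_classes = set.intersection(*found.values())
    report["mismatched"] = sorted(all_classes - common_classes)
    return report


# Example usage (the path is a placeholder):
# print(check_imagenet_layout("/data/imagenet"))
```

An empty "mismatched" list means that every class folder appears in both splits.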
Then you'll just need to set the configuration files to fit your needs and follow the procedure described in the training section. Keep in mind that the provided configuration files were used in a distributed training, hence the hyper-parameters fit this particular setting. If you don't train that way, you'll need to change them.
The system variables should also be set to the ImageNet folder (if you use the provided config files):
# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>/imagenet
export DATASET_PATH_VAL=<path_to_validation_directory>/imagenet
export DATASET_PATH_TEST=<path_to_test_directory>/imagenet
Results for training on the Pascal VOC dataset are presented below. Networks were either trained on the 2007 train/val set (07) or the 2007+2012 train/val sets (07+12) and evaluated on the 2007 test set.
Official Networks | mAP (07) | mAP (07+12) | FPS |
---|---|---|---|
SSD300 | 68.0 | 74.3 | N/A |
SSD300 DCT | 39.2 | 47.8 | N/A |
Networks, VGG based (our trainings) | mAP (07) | mAP (07+12) | FPS |
---|---|---|---|
SSD300 | 65.0 | 74.0 | 102 |
SSD300 DCT | 48.9 | 60.0 | 262 |
SSD300 DCT Y | 50.7 | 59.8 | 278 |
SSD300 DCT Deconvolution | 38.4 | 53.5 | 282 |
Network, ResNet50 based (our trainings) | mAP (07) | mAP (07+12) | FPS |
---|---|---|---|
SSD300-Resnet50 (retrained) | 61.3 | 73.1 | 108 |
SSD300 DCT LC-RFA | 61.7 | 70.7 | 110 |
SSD300 DCT LC-RFA Y | 62.1 | 71.0 | 109 |
SSD300 DCT LC-RFA-Thinner | 58.5 | 67.5 | 176 |
SSD300 DCT LC-RFA-Thinner Y | 60.6 | 70.2 | 174 |
SSD300 DCT Deconvolution-RFA | 54.7 | 68.8 | 104 |
The data can be downloaded from the official website.
After downloading, you should have directories following this structure:
VOCdevkit
|
|_ VOC2007
|  |_ Annotations
|  |_ ImageSets
|  |_ JPEGImages
|  |_ ...
|
|_ VOC2012
   |_ Annotations
   |_ ImageSets
   |_ JPEGImages
   |_ ...
Then you'll just need to set the configuration files to fit your needs and follow the procedure described in the training section. The hyper-parameters provided for the training were not used in a parallel setting.
The system variables should also be set to the Pascal VOC folder (if you use the provided config files):
# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>/VOCdevkit
export DATASET_PATH_VAL=<path_to_validation_directory>/VOCdevkit
export DATASET_PATH_TEST=<path_to_test_directory>/VOCdevkit
I know from experience that diving into someone else's code to adapt it to your own project is often hard and confusing at first. To help you if you ever want to toy with the code, built-in documentation is provided. It uses a modified version of the keras documentation generator (here).
To generate the documentation:
pip install mkdocs
cd docs
python autogen.py
To display the documentation:
# From root of the repository
mkdocs serve
The presented method has some limitations, especially for general purpose deployments. The two main issues I see are described hereafter.
Resizing images in the RGB domain is straightforward, whereas resizing in the DCT domain is more complicated. Although theoretically doable, methods for such usage are not implemented. The following articles explore the possibility of resizing images directly in the frequency domain:
- On Resizing Images In The DCT Domain
- Image Resizing In The Discrete Cosine Transform Domain
- Fast Image Resizing in Discrete Cosine Transform Domain with Spatial Relationship between DCT Block and its Sub-Blocks
- Design and Analysis of an Image Resizing Filter in the Block-DCT Domain
For classification, the impact is limited as long as the images are about the same size as the original training images. This is due to the fact that the network can be made fully convolutional. For detection, this is a bit more complicated, as the SSD in the presented implementation does not scale well (although it should theoretically be able to do so). This is due to the original design of the network and the need for padding layers. I intend to test a modified version of the network if I find some time to do so.
The second limitation is for training. Data-augmentation has to be carried out in the RGB domain, thus the data-augmentation pipeline is the following one: JPEG => RGB => data-augmentation => JPEG => Compressed Input. This slows down the training.
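The pipeline above can be sketched as a chain of four stages. This is an illustration only and not the generators of this repository: the function names are hypothetical, the horizontal flip stands in for a real augmentation (the repository installs albumentations for that purpose), and the JPEG quality of 90 is an arbitrary choice. The sketch assumes Pillow and jpeg2dct are installed.

```python
import io


def decode_rgb(jpeg_bytes):
    """JPEG => RGB: decode the compressed bytes into a Pillow image."""
    from PIL import Image  # imported lazily: optional dependency of this sketch
    return Image.open(io.BytesIO(jpeg_bytes)).convert("RGB")


def horizontal_flip(image):
    """Stand-in augmentation: mirror the image left to right."""
    from PIL import ImageOps
    return ImageOps.mirror(image)


def encode_jpeg(image, quality=90):
    """RGB => JPEG: re-compress the augmented image in memory."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    return buffer.getvalue()


def jpeg_to_dct(jpeg_bytes):
    """JPEG => compressed input: read the DCT coefficients with jpeg2dct."""
    from jpeg2dct.numpy import loads
    return loads(jpeg_bytes)  # (dct_y, dct_cb, dct_cr)


def augmented_dct(jpeg_bytes, augment=horizontal_flip, decode=decode_rgb,
                  encode=encode_jpeg, to_dct=jpeg_to_dct):
    """Run JPEG => RGB => data-augmentation => JPEG => compressed input."""
    return to_dct(encode(augment(decode(jpeg_bytes))))


# Example usage (the file name is a placeholder):
# with open("image.jpg", "rb") as f:
#     dct_y, dct_cb, dct_cr = augmented_dct(f.read())
```

The extra decode and re-encode steps around the augmentation are what make this pipeline slower than feeding the DCT coefficients of the original file directly.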