Neural Networks using compressed JPEG images.

This repository provides code to train and use neural networks on compressed JPEG images. No pre-trained weights are or will be made available.

This implementation relies on the jpeg2dct module from the Uber research team. The SSD used in this repository was adapted from an existing implementation and then modified.

All the networks proposed in this repository are modified versions of the following three architectures: VGG16, ResNet50 and SSD300.

Summary

  1. Installation
  2. Training
  3. Prediction
  4. Classification (ImageNet)
  5. Detection (PascalVOC)
  6. Detection (MS-COCO)
  7. Method limitations

Installation

The provided code can be used directly or installed as a package. The following steps install the dependencies in a virtual environment:

# Making virtualenv
mkdir .venv
cd .venv
python3 -m venv jpeg_deep
source jpeg_deep/bin/activate

cd ..

# Installing all the dependencies (the code was tested with the specified version numbers on python 3.+)
pip install keras
pip install tensorflow-gpu==1.14.0
pip install pillow
pip install opencv-python
pip install jpeg2dct
pip install albumentations
pip install tqdm
pip install bs4
pip install cython
pip install pycocotools
pip install matplotlib

Training

The training uses a system of configuration files and experiments. This system aims to help save the parameters of a given run: at the start of training, an experiment folder is created with copies of the configuration files, the weights and the logs. Example config files are available in the config folder; they define all the training and testing parameters.
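The exact contents of a config file are defined by the examples in the config folder. Purely as an illustration of the idea (the names below are hypothetical, not the repository's actual API), such a module groups everything describing a run in one importable place:

# Hypothetical illustration only; see the files in the config folder for the real structure.
class TrainingConfiguration:
    def __init__(self):
        self.batch_size = 32          # per-GPU batch size
        self.epochs = 120             # number of training epochs
        self.workers = 8              # data-loading workers
        self.network = "vgg16_dct"    # which architecture variant to build
        self.horovod = True           # whether the run is distributed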

System variables

To simplify deployment on different machines, the following variables need to be defined (see the Classification and Detection sections for details on the dataset paths):

# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>
export DATASET_PATH_VAL=<path_to_validation_directory>
export DATASET_PATH_TEST=<path_to_test_directory>

# Setting the directory were the experiment folder will be created
export EXPERIMENTS_OUTPUT_DIRECTORY=<path_to_output_directory>
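
The scripts are expected to read these variables from the environment. As a minimal sanity check before launching a run (standard library only, not part of the repository's scripts), you could verify them like this:

import os

# Fail early if one of the required variables is missing or invalid.
for name in ("DATASET_PATH_TRAIN", "DATASET_PATH_VAL",
             "DATASET_PATH_TEST", "EXPERIMENTS_OUTPUT_DIRECTORY"):
    value = os.environ.get(name)
    if value is None or not os.path.isdir(value):
        raise RuntimeError(f"{name} is not set or does not point to a directory")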

Starting the training

Once you have defined all the variables and modified the config files to your needs, simply run the following command (you will need to update some of the parameters when not using horovod):

python scripts/training.py -c <config_dir_path> --no-horovod

The config file in the <config_dir_path> needs to be named "config.py" for the script to run correctly.

For more details on classification training on the ImageNet dataset, refer to the Classification (ImageNet) section; for training on the Pascal VOC dataset, refer to the Detection (Pascal VOC) section; and for training on the MS-COCO dataset, refer to the Detection (MS-COCO) section.

Training using horovod

The training script supports the usage of horovod. I highly recommend training on multiple GPUs for classification, given the size of the dataset. An example file for training with horovod using Slurm is provided: jpeg_deep.sl.

cd slurm
sbatch jpeg_deep.sl

If you do not run on a computation facility that uses Slurm, please refer to the original horovod repository.
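
Assuming the standard horovodrun launcher shipped with Horovod, a single-node run on 4 GPUs would look roughly like this (drop the --no-horovod flag so the script runs in distributed mode):

horovodrun -np 4 python scripts/training.py -c <config_dir_path>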

Prediction

No pre-trained weights are or will be made available. To use this section, you will have to retrain the networks from scratch.

Display the results

Displaying the results can be done using the prediction.py script. In order to use the script, you first have to run training for at least one epoch (the prediction step presupposes that you have an experiment folder).

The prediction will be done on the test set. To use a different dataset, modify the config_temp.py file in the generated experiment folder.

For the VGG16-based classifiers: the prediction script uses the test generator specified in the config file to get the data. Hence, with the provided examples, you may first need to convert the weights to a fully convolutional version of the network. This can be done using the classification2ssd.py script.

Once this is done, simply run the following command:

python scripts/prediction.py <experiment_path> <weights_path>

Prediction time

We also provide a way to test the speed of the trained networks, using the prediction_time.py script.

In order to test the speed of the networks, a batch of data is preloaded into memory, then prediction is run over this batch P times, and this measurement is repeated N times. The result is the averaged time. You may or may not load weights.

python scripts/prediction_time.py <experiment_path> -nr 10 -w <weights_path>
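
To make the averaging scheme concrete, the sketch below (an illustration of the described P-by-N measurement, not the repository's actual script) times a Keras model on one preloaded batch:

import time
import numpy as np

def average_prediction_time(model, batch, p=100, n=10):
    """Run p predictions over the preloaded batch, repeat n times,
    and return the mean time per prediction in seconds."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        for _ in range(p):
            model.predict(batch)
        timings.append((time.perf_counter() - start) / p)
    return float(np.mean(timings))

# Example usage with a batch of 8 RGB images of size 300x300:
# batch = np.random.rand(8, 300, 300, 3).astype("float32")
# print(average_prediction_time(model, batch))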

Classification (ImageNet)

Results on ImageNet

The table below shows the accuracy obtained, compared with the state of the art. All the presented results are on the validation dataset. All the FPS values were measured on an NVIDIA GTX 1080 using the prediction_time.py script, with a batch size of 8.

Official networks                           top-1   top-5   FPS
VGG16                                       73.0    91.2    N/A
VGG-DCT                                     42.0    66.9    N/A
ResNet50                                    75.78   92.65   N/A
LC-RFA                                      75.92   92.81   N/A
LC-RFA-Thinner                              75.39   92.57   N/A
Deconvolution-RFA                           76.06   92.02   N/A

VGG-based networks (our trainings)          top-1   top-5   FPS
VGG16                                       71.9    90.8    267
VGG-DCT                                     65.5    86.4    553
VGG-DCT Y                                   62.6    84.6    583
VGG-DCT Deconvolution                       65.9    86.7    571

ResNet50-based networks (our trainings)     top-1   top-5   FPS
ResNet50                                    74.73   92.33   324
LC-RFA                                      74.82   92.58   318
LC-RFA Y                                    73.25   91.40   329
LC-RFA-Thinner                              74.62   92.33   389
LC-RFA-Thinner Y                            72.48   91.04   395
Deconvolution-RFA                           74.55   92.39   313

Training on ImageNet

The dataset can be downloaded from the official ImageNet website. Choose the version that suits your needs; I used the 2012 (Object Detection) data.

Once the data is downloaded, to use the provided generators it should be stored following this tree (as long as you have separated train and validation folders you should be okay):

imagenet
|
|_ train
|  |_ n01440764
|  |_ n01443537
|  |_ ...
|
|_ validation
   |_ n01440764
   |_ n01443537
   |_ ...
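
As a quick sanity check of the layout above (standard library only, with the ImageNet path below as a placeholder to adjust):

from pathlib import Path

imagenet_root = Path("/path/to/imagenet")  # adjust to your own location
for split in ("train", "validation"):
    # Each class folder is a WordNet synset (e.g. n01440764) containing JPEG files.
    synsets = [d for d in (imagenet_root / split).iterdir() if d.is_dir()]
    print(f"{split}: {len(synsets)} class folders (expected 1000)")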

Then you'll just need to set the configuration files to fit your needs and follow the procedure described in the training section. Keep in mind that the provided configuration files were used in a distributed training, hence the hyper-parameters fit this particular setting. If you do not train that way, you will need to change them.

Also, the system variables should be set to the ImageNet folder (if you use the provided config files):

# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>/imagenet
export DATASET_PATH_VAL=<path_to_validation_directory>/imagenet
export DATASET_PATH_TEST=<path_to_test_directory>/imagenet

Detection (Pascal VOC)

Results on the PASCAL VOC dataset

Results for training on the Pascal VOC dataset are presented below. Networks were trained either on the 2007 train/val set (07) or on the 2007+2012 train/val sets (07+12), and evaluated on the 2007 test set.

Official networks                           mAP (07)   mAP (07+12)   FPS
SSD300                                      68.0       74.3          N/A
SSD300 DCT                                  39.2       47.8          N/A

VGG-based networks (our trainings)          mAP (07)   mAP (07+12)   FPS
SSD300                                      65.0       74.0          102
SSD300 DCT                                  48.9       60.0          262
SSD300 DCT Y                                50.7       59.8          278
SSD300 DCT Deconvolution                    38.4       53.5          282

ResNet50-based networks (our trainings)     mAP (07)   mAP (07+12)   FPS
SSD300-Resnet50 (retrained)                 61.3       73.1          108
SSD300 DCT LC-RFA                           61.7       70.7          110
SSD300 DCT LC-RFA Y                         62.1       71.0          109
SSD300 DCT LC-RFA-Thinner                   58.5       67.5          176
SSD300 DCT LC-RFA-Thinner Y                 60.6       70.2          174
SSD300 DCT Deconvolution-RFA                54.7       68.8          104

Training on the PASCAL VOC dataset

The data can be downloaded from the official Pascal VOC website.

After downloading, you should have directories following this structure:

VOCdevkit
|
|_ VOC2007
|  |_ Annotations
|  |_ ImageSets
|  |_ JPEGImages
|  |_ ...
|
|_ VOC2012
   |_ Annotations
   |_ ImageSets
   |_ JPEGImages
   |_ ...
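
For reference, each file in Annotations is an XML description of one image. A minimal sketch using the standard library (the repository's own generators may parse these differently) reads the objects of one annotation like this:

import xml.etree.ElementTree as ET

# Parse one Pascal VOC annotation file and list its labelled boxes.
tree = ET.parse("VOCdevkit/VOC2007/Annotations/000001.xml")
for obj in tree.getroot().findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
    xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
    print(name, (xmin, ymin, xmax, ymax))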

Then you'll just need to set the configuration files to fit your needs and follow the procedure described in the training section. The hyper-parameters provided for this training were not used in a parallel setting.

Also, the system variables should be set to the Pascal VOC folder (if you use the provided config files):

# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>/VOCdevkit
export DATASET_PATH_VAL=<path_to_validation_directory>/VOCdevkit
export DATASET_PATH_TEST=<path_to_test_directory>/VOCdevkit

Detection (MS-COCO)

Details in the dataset path

Generating the documentation for a deeper usage of the provided code

I know from experience that diving into someone else's code to adapt it to your own project is often hard and confusing at first. To help you if you ever want to toy with the code, built-in documentation is provided. It uses a modified version of the Keras documentation generator.

To generate the documentation:

pip install mkdocs

cd docs

python autogen.py

To display the documentation:

# From root of the repository
mkdocs serve

Method limitations

The presented method has some limitations, especially for general-purpose deployments. The two main issues I see are described hereafter.

Image Resizing

Resizing images in the RGB domain is straightforward, whereas resizing in the DCT domain is more complicated. Although theoretically doable, methods for such usage are not implemented here. Several articles explore the possibility of resizing images directly in the frequency domain.

For classification, the impact is limited as long as the images are about the same size as the original training images, because the networks can be made fully convolutional. For detection, this is a bit more complicated, as the SSD in the presented implementation does not scale well (although it should theoretically be able to do so). This is due to the original design of the network and the need for padding layers. I intend to test modified versions of the network if I find some time to do so.

Training Pipeline

The second limitation concerns training. Data augmentation has to be carried out in the RGB domain, so the data-augmentation pipeline is the following: JPEG => RGB => data-augmentation => JPEG => Compressed Input. This slows down the training.
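
As a rough illustration of that pipeline (not the repository's actual generator; it assumes OpenCV, albumentations and jpeg2dct as installed above):

import cv2
import albumentations as A
from jpeg2dct.numpy import loads

augment = A.Compose([A.HorizontalFlip(p=0.5)])  # any RGB-domain augmentation

# JPEG => RGB
bgr = cv2.imread("image.jpg")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

# RGB => data-augmentation
rgb = augment(image=rgb)["image"]

# data-augmentation => JPEG (re-encode in memory)
ok, encoded = cv2.imencode(".jpg", cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR))

# JPEG => compressed input (one block of DCT coefficients per channel)
dct_y, dct_cb, dct_cr = loads(encoded.tobytes())

The extra decode/encode round trip is what slows training down compared to a pure RGB pipeline.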
