PaddleVideo

Introduction

PaddleVideo is a toolset for video recognition, action localization, and spatio-temporal action detection, built for both industry and academia. This repository provides examples and best-practice guidelines for exploring deep learning algorithms in the video domain. We are committed to providing examples and utilities that significantly reduce the "time to deploy". It also serves as a proficiency verification and implementation of the newest PaddlePaddle 2.0 in the video field.

Features

- Advanced model zoo design: PaddleVideo unifies video understanding tasks, including recognition, localization, and spatio-temporal action detection. With a clear configuration system, we designed a decoupled modular framework in which a customized network can be easily constructed by combining different modules.
- Various datasets and architectures: PaddleVideo supports multiple datasets and architectures, including the Kinetics400, UCF101, and YouTube8M datasets; video recognition models such as TSN, TSM, SlowFast, and AttentionLSTM; and action localization models such as BMN.
- Higher performance: PaddleVideo has built-in solutions for improving the accuracy of recognition models. PPTSM, which is based on the standard TSM, achieves the best performance among 2D recognition networks: it has the same number of parameters but improves Top-1 accuracy to 73.5%. These solutions can easily be applied to your own dataset.
- Faster training strategy: PaddleVideo supports a faster training strategy that speeds up training by 100% compared with the standard version; training from scratch on the Kinetics400 dataset takes only 10 days.
- Deployable: PaddleVideo is powered by Paddle Inference. There is no need to convert the model to ONNX format for deployment; everything you need can be found in this repository.
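The modular design mentioned in the feature list can be pictured with a small registry pattern: each module type keeps a name-to-class table, and a configuration selects which classes to combine. The sketch below is only an illustration of that idea. `Registry`, `ToyResNet`, `ToyClsHead`, and `ToyRecognizer` are hypothetical names invented for this example and are not taken from the PaddleVideo code base.

```python
"""Sketch of a config-driven model builder (all names are hypothetical)."""


class Registry:
    """Maps a string name to a class so a config can refer to modules by name."""

    def __init__(self, kind):
        self.kind = kind
        self._classes = {}

    def register(self, cls):
        # Intended to be used as a decorator: @BACKBONES.register
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg):
        # cfg is a dict such as {"name": "ToyResNet", "depth": 50}
        cfg = dict(cfg)  # copy so the caller's config is not mutated
        name = cfg.pop("name")
        if name not in self._classes:
            raise KeyError(f"unknown {self.kind}: {name!r}")
        return self._classes[name](**cfg)


BACKBONES = Registry("backbone")
HEADS = Registry("head")


@BACKBONES.register
class ToyResNet:
    def __init__(self, depth=50):
        self.depth = depth


@HEADS.register
class ToyClsHead:
    def __init__(self, num_classes=400):
        self.num_classes = num_classes


class ToyRecognizer:
    """Combines one backbone and one head, both chosen by the config."""

    def __init__(self, backbone, head):
        self.backbone = BACKBONES.build(backbone)
        self.head = HEADS.build(head)


# Illustrative config; the keys and values are assumptions for this sketch.
config = {
    "backbone": {"name": "ToyResNet", "depth": 50},
    "head": {"name": "ToyClsHead", "num_classes": 400},
}
model = ToyRecognizer(**config)
print(type(model.backbone).__name__, model.backbone.depth)
print(type(model.head).__name__, model.head.num_classes)
```

Swapping a module then only requires changing the `name` field in the config, which is the kind of decoupling the feature description refers to.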

Overview of the kit structures

Architectures
- Recognition: TSN, TSM, SlowFast, PPTSM, VideoTag, AttentionLSTM
- Localization: BMN

Frameworks
- Recognizer1D
- Recognizer2D
- Recognizer3D
- Localizer

Components
- Backbone: resnet, resnet_tsm, resnet_tweaks_tsm, bmn
- Head: tsm_head, tsn_head, bmn_head
- Solver
  - Optimizer: Momentum, RMSProp
  - LearningRate: PiecewiseDecay
- Loss: CrossEntropy, BMNLoss
- Metrics: CenterCrop, MultiCrop

Data Augmentation
- Batch: Mixup, Cutmix
- Image: Resize, Flipping, MultiScaleCrop, Crop, Color Distort, Random Crop
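As an illustration of the PiecewiseDecay learning-rate entry listed above, the following is a minimal pure-Python sketch of a piecewise-constant schedule. It is not the PaddlePaddle implementation; the function name `piecewise_lr` and the boundary and value numbers are assumptions chosen for this example only.

```python
from bisect import bisect_right


def piecewise_lr(step, boundaries, values):
    """Return the learning rate for `step` under a piecewise-constant schedule.

    values[i] applies while step < boundaries[i]; the last value applies from
    the final boundary onwards, so len(values) must be len(boundaries) + 1.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    # bisect_right counts how many boundaries are <= step,
    # which is exactly the index of the active value.
    return values[bisect_right(boundaries, step)]


# Made-up schedule: drop the rate by 10x at epochs 40 and 60.
boundaries = [40, 60]
values = [0.01, 0.001, 0.0001]

for epoch in (0, 39, 40, 59, 60, 80):
    print(epoch, piecewise_lr(epoch, boundaries, values))
```

The rate stays at the first value until the first boundary is reached and switches to the next value at each boundary.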

Overview of the performance

The chart below illustrates the performance of the recognition models, including our implementations and the PyTorch versions. It shows the relationship between Top-1 accuracy and VPS on the Kinetics400 dataset (tested on a Tesla V100).

Note:

PPTSM improves Top-1 accuracy by 3.5% over the standard TSM.
The models shown in red are available in the Model Zoo; the others are PyTorch results.
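For readers unfamiliar with the metric, Top-1 accuracy is the fraction of videos whose highest-scoring predicted class equals the ground-truth label. The sketch below computes it, along with the more general top-k variant, in plain Python. The function name `topk_accuracy` and the toy scores are assumptions made for illustration, not the evaluation code used to produce the numbers above.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: 4 samples, 3 classes.
scores = [
    [0.10, 0.70, 0.20],  # best class: 1
    [0.50, 0.30, 0.20],  # best class: 0
    [0.20, 0.30, 0.50],  # best class: 2
    [0.40, 0.35, 0.25],  # best class: 0
]
labels = [1, 0, 1, 2]

print(topk_accuracy(scores, labels, k=1))  # 0.5
print(topk_accuracy(scores, labels, k=2))  # 0.75
```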

Tutorials

Basic

Advanced

Model zoo

Recognition Introduction
- Attention-LSTM
- TSN
- TSM
- PPTSM
- SlowFast
- VideoTag
Localization Introduction
- BMN
Spatio-temporal action detection:
- Coming Soon!

License

PaddleVideo is released under the Apache 2.0 license.

Contributing

This project welcomes contributions and suggestions. Please see our contribution guidelines.
