This repo holds the code for my master's thesis in Artificial Intelligence at the University of Amsterdam. The paper is publicly available here.
In this paper, we investigate three challenges in video processing: high computational cost, low data availability, and high intra-class variability, and propose ways of mitigating each of them. First, to mitigate computational cost, we propose an efficient sequence model, the Time-Aligned ResNet, based on the Time-Aligned DenseNet, that grows linearly with frame sampling frequency and achieves significant performance gains compared to its predecessor. Second, we seek to mitigate low data availability by designing multi-task models and training routines that extract richer information from existing data, although we do not arrive at a formulation that outperforms vanilla classification models. Finally, we try to mitigate intra-class variation by proposing a class of stochastic models. While we observe some improvement in generalisation power, it is not substantial enough to outweigh the increase in computational cost.
- Ubuntu 18.04 or similar
- CUDA 10.1
- Python 3.7
- Miniconda 4.8.1
Some other Linux libraries, such as libjpeg or libpng, might be needed for OpenCV, depending on your Linux distribution. Install them as needed.
Create a conda environment from the environment.yml file.
conda env create -f env.yml
conda activate mt
Set 2 environment variables:
- the path to the source code
- the path where experiment assets will be stored.
export ML_SOURCEDIR="path/to/sourcecode"
export ML_WORKDIR="path/to/experiments"
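Before running anything else, it can help to confirm that both variables are visible to Python. The snippet below is a minimal sketch, not part of the repository: the helper name `resolve_dirs` is hypothetical, and it only assumes the two variable names shown above.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Return (source_dir, work_dir) as Paths, or raise if a variable is unset.

    Hypothetical helper for illustration; the repository may read these
    variables differently.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("ML_SOURCEDIR", "ML_WORKDIR") if not env.get(name)]
    if missing:
        raise EnvironmentError("unset environment variables: " + ", ".join(missing))
    return Path(env["ML_SOURCEDIR"]), Path(env["ML_WORKDIR"])


if __name__ == "__main__":
    try:
        source_dir, work_dir = resolve_dirs()
        print(f"source: {source_dir}")
        print(f"work:   {work_dir}")
    except EnvironmentError as err:
        # Report the problem instead of failing later with a confusing path error.
        print(err)
```

If either variable is missing, the script prints which one instead of the two paths.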
Experiments are run on two datasets: Human Motion Database and Something-Something-v2. You need to download the .tar files from the links below and place them under the specified path relative to ML_WORKDIR.
- Human Motion Database - ./data/hmdb
- Something-Something-v2 - ./data/smth
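To check that the archives ended up in the right place, something like the following may be useful. It is a sketch under assumptions: the function name `find_archives` and the short keys `hmdb` and `smth` are illustrative, and only the two relative paths listed above are taken from this README.

```python
import os
from pathlib import Path

# Relative dataset locations, as listed above (relative to ML_WORKDIR).
DATASET_DIRS = {"hmdb": "data/hmdb", "smth": "data/smth"}


def find_archives(work_dir, dataset):
    """Return the sorted .tar files directly under the dataset's directory.

    Hypothetical helper for illustration; returns an empty list if the
    directory does not exist yet.
    """
    data_dir = Path(work_dir) / DATASET_DIRS[dataset]
    if not data_dir.is_dir():
        return []
    return sorted(p for p in data_dir.iterdir() if p.suffix == ".tar")


if __name__ == "__main__":
    # Falls back to the current directory if ML_WORKDIR is not set.
    work_dir = os.environ.get("ML_WORKDIR", ".")
    for name in DATASET_DIRS:
        archives = [p.name for p in find_archives(work_dir, name)]
        print(f"{name}: {archives if archives else 'no .tar files found'}")
```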
Several pre-processing steps need to be run before training. For the Human Motion Database dataset, run the following to set up all necessary folders, extract videos from the .tar files, split the data, extract .jpeg frames from the videos, etc.
python main.py setup --opts=set:hmdb
python main.py prepro_set --opts=set:hmdb,split:1,jpeg:yes
For Something-Something-v2, run the following to do all of the above but also select a subset of the data.
python main.py setup --opts=set:smth
python main.py select_subset --opts=set:smth,num_classes:51
python main.py prepro_set --opts=set:smth,split:1,jpeg:yes
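The `--opts` argument in the commands above is a comma-separated list of `key:value` pairs. As an illustration of that format only, here is a minimal sketch of how such a string could be split into a dictionary. The function `parse_opts` is hypothetical and is not the parser used by main.py, which may handle types and defaults differently.

```python
def parse_opts(opts):
    """Parse 'key:value,key:value' into a dict of strings.

    Illustrative only: values are kept as strings, and an item without
    a ':' separator or with an empty key is rejected.
    """
    parsed = {}
    for item in opts.split(","):
        if not item:
            continue  # tolerate empty items such as a trailing comma
        key, sep, value = item.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed option: {item!r}")
        parsed[key] = value
    return parsed


if __name__ == "__main__":
    print(parse_opts("set:smth,num_classes:51"))
```

Under this reading, `set:hmdb,split:1,jpeg:yes` carries three options: the dataset name, the split index, and whether to extract .jpeg frames.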
Everything should be set now. You can run each of the experiments via the scripts under ./experiments. Each script runs a training routine, followed by an evaluation routine at the end.
sh ./experiments/experiment_1.sh
sh ./experiments/experiment_2.sh
sh ./experiments/experiment_3.sh
TensorBoard logs will be saved to ${ML_WORKDIR}/runs. You can view them as follows.
tensorboard --logdir=${ML_WORKDIR}/runs --bind_all
A visualiser tool is provided for each class of models. Check out visualise.py for usage examples.