CoCLR: Self-supervised Co-Training for Video Representation Learning

This repository contains the implementation of:

InfoNCE (MoCo on videos)
UberNCE (supervised contrastive learning on videos)
CoCLR

Link:

[Project Page] [PDF] [Arxiv]

News

[2021.01.29] Upload both RGB and optical flow dataset for UCF101 (links).
[2021.01.11] Updated our paper to the NeurIPS 2020 final version: corrected the InfoNCE-RGB-linearProbe baseline result in Table 1 from 52.3% (pretrained for 800 epochs, unnecessary and unfair) to 46.8% (pretrained for 500 epochs, a fair comparison). Thanks to @liuhualin333 for pointing this out.
[2020.12.08] Update instructions.
[2020.11.17] Upload pretrained weights for UCF101 experiments.
[2020.10.30] Update "draft" dataloader files, CoCLR code, evaluation code as requested by some researchers. Will check and add detailed instructions later.

Pretrain Instruction

InfoNCE pretrain on UCF101-RGB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model infonce --moco-k 2048 \
--dataset ucf101-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16

InfoNCE pretrain on UCF101-Flow

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model infonce --moco-k 2048 \
--dataset ucf101-f-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
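
The `--epochs 300 --schedule 250 280` flags in the two commands above appear to describe a step learning-rate schedule: training runs for 300 epochs and the rate drops at epochs 250 and 280. A minimal sketch of such a schedule follows. The decay factor of 0.1 and the function name `lr_at_epoch` are assumptions for illustration, not read from `main_nce.py`, and the base rate of 1e-3 is borrowed from the K400 commands in this README.

```python
def lr_at_epoch(epoch, base_lr=1e-3, milestones=(250, 280), gamma=0.1):
    """Step schedule: multiply the rate by `gamma` once per milestone reached.

    `gamma=0.1` is an assumed decay factor, not taken from the repository.
    """
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    return lr


# Print the rate just before and after each milestone.
for epoch in (0, 249, 250, 280, 299):
    print(epoch, lr_at_epoch(epoch))
```

With these assumptions the rate stays at its base value until epoch 249 and is reduced twice over the last 50 epochs.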

CoCLR pretrain on UCF101 for one cycle

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 100 --schedule 80 --name_prefix Cycle1-FlowMining_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar}

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --reverse \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 100 --schedule 80 --name_prefix Cycle1-RGBMining_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar}
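
As we read the flags, the two commands above make up one co-training cycle: the first (`Cycle1-FlowMining_`) trains the RGB network while the flow network picks extra positives, and the second (`--reverse`, `Cycle1-RGBMining_`) swaps the roles. The sketch below illustrates only the mining step that `--topk 5` refers to, in plain Python on toy vectors. The function names, the use of cosine similarity, and the feature values are assumptions for illustration; the actual logic lives in `main_coclr.py` and the `model` folder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mine_topk_positives(query_feat, queue_feats, topk=5):
    """Return indices of the `topk` queue entries most similar to the query.

    In co-training, `query_feat` and `queue_feats` would come from the fixed
    "mining" network (e.g. flow), and the returned indices would mark extra
    positives for the network being trained (e.g. RGB).
    """
    ranked = sorted(
        range(len(queue_feats)),
        key=lambda i: cosine(query_feat, queue_feats[i]),
        reverse=True,
    )
    return ranked[:topk]


# Toy flow features for a queue of four clips (hypothetical values).
flow_queue = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
flow_query = [1.0, 0.05]
print(mine_topk_positives(flow_query, flow_queue, topk=2))
```

The indices returned for the flow view would then be treated as positives, alongside the clip's own augmentation, when computing the contrastive loss for the RGB view.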

InfoNCE pretrain on K400-RGB

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_nce.py --net s3d --model infonce --moco-k 16384 \
--dataset k400-2clip --lr 1e-3 --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16

InfoNCE pretrain on K400-Flow

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_nce.py --net s3d --model infonce --moco-k 16384 \
--dataset k400-f-2clip --lr 1e-3 --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16

CoCLR pretrain on K400 for one cycle

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 50 --schedule 40 --name_prefix Cycle1-FlowMining_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar}

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 --reverse \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 50 --schedule 40 --name_prefix Cycle1-RGBMining_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar}

Dataset

RGB for UCF101: [download] (tar file, 29GB, packed with lmdb)
TVL1 optical flow for UCF101: [download] (tar file, 20.5GB, packed with lmdb)
Note: I created these lmdb files with msgpack==0.6.2. To load them with msgpack>=1.0.0, use msgpack.loads(raw_data, raw=True) (see issue #32).
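
The note above can be sketched as a small loader. `msgpack.loads(raw_data, raw=True)` is the call named in the note; the wrapper `load_record` and the helper `decode_keys` are hypothetical names added for illustration, and the layout of the stored records is not documented here, so treat the dictionary handling as an assumption.

```python
def load_record(raw_data):
    """Unpack one lmdb value written with msgpack 0.6.x using msgpack>=1.0.0.

    `raw=True` keeps msgpack's raw type as `bytes` instead of decoding it as
    UTF-8, which can fail on binary payloads such as encoded frames.
    """
    import msgpack  # third-party; imported lazily so decode_keys works without it

    return msgpack.loads(raw_data, raw=True)


def decode_keys(obj):
    """Turn `bytes` dict keys back into `str`; leaf values are left untouched.

    Hypothetical convenience helper: with `raw=True`, text that was packed as
    `str` also comes back as `bytes`. Tuples are returned as lists.
    """
    if isinstance(obj, dict):
        return {
            (k.decode("utf-8") if isinstance(k, bytes) else k): decode_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [decode_keys(v) for v in obj]
    return obj
```

A typical use would be `decode_keys(load_record(raw_data))`, where `raw_data` is the value read from an lmdb transaction.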

Result

Finetune entire network for action classification on UCF101:

Pretrained Weights

Our models:

UCF101-RGB-CoCLR: [download] [NN@1=51.8 on UCF101-RGB]
UCF101-Flow-CoCLR: [download] [NN@1=48.4 on UCF101-Flow]

Baseline models:

UCF101-RGB-InfoNCE: [download] [NN@1=33.1 on UCF101-RGB]
UCF101-Flow-InfoNCE: [download] [NN@1=45.2 on UCF101-Flow]
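
As we read it, the NN@1 figures above are nearest-neighbour retrieval accuracy at rank 1: each test clip queries the training set, and the query counts as correct when its single nearest training clip has the same action label. A minimal sketch on toy data follows. The function name `nn_at_1`, the use of cosine similarity, and the feature values are assumptions for illustration, not the code in the `eval` folder.

```python
import math


def nn_at_1(test_feats, test_labels, train_feats, train_labels):
    """Percentage of test clips whose nearest training clip shares their label."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    hits = 0
    for feat, label in zip(test_feats, test_labels):
        # Index of the most similar training feature for this query.
        nearest = max(range(len(train_feats)), key=lambda i: cosine(feat, train_feats[i]))
        hits += int(train_labels[nearest] == label)
    return 100.0 * hits / len(test_feats)


# Toy features and labels (hypothetical values, two classes).
train_feats = [[1.0, 0.0], [0.0, 1.0]]
train_labels = ["archery", "bowling"]
test_feats = [[0.9, 0.2], [0.1, 1.0], [0.8, 0.7]]
test_labels = ["archery", "bowling", "bowling"]
print(nn_at_1(test_feats, test_labels, train_feats, train_labels))
```

No classifier is trained for this metric, so it measures the pretrained features directly.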

Kinetics400-pretrained models coming soon.
