Kaggle Cloud Segmentation

Competition: Understanding Clouds from Satellite Images
Rank: 72/1538 (Top 4.6%, silver)
Task: Multi-class image segmentation
Data: training set of 5546 cloud images (2100*1400), test set of 3698 cloud images (2100*1400)
Note: This is a code backup. It is not runnable as-is because the file paths differ from the original environment.

Solution

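Data augmentation
- Resize to 480*640, chosen as a trade-off between GPU memory limits and experimental results
- Random flips in 4 orientations (HorizontalFlip, VerticalFlip)
- Random shift and rotation (ShiftScaleRotate)
- Blur (Blur)
- Distortion (GridDistortion)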
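For segmentation, any geometric augmentation has to be applied identically to the image and its mask. The snippet below is a minimal NumPy-only sketch of the flip step alone, not the repository's pipeline. The function name `random_flip_pair` and the array layout are assumptions made for illustration.

```python
import numpy as np


def random_flip_pair(image, mask, rng):
    """Apply the same random horizontal/vertical flip to an image and its mask.

    image: (H, W, C) array; mask: (H, W, K) array with one channel per class.
    Two independent coin flips give 4 possible orientations.
    """
    if rng.random() < 0.5:  # horizontal flip: reverse the width axis
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip: reverse the height axis
        image, mask = image[::-1], mask[::-1]
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
image = np.arange(6).reshape(2, 3, 1)   # toy 2x3 single-channel "image"
mask = (image > 2).astype(np.uint8)     # toy mask derived pixel-wise from the image
aug_image, aug_mask = random_flip_pair(image, mask, rng)
```

Because the mask is flipped together with the image, the pixel-wise relation between the two is preserved after augmentation.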
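Cross-validation
- 5-fold cross-validation to reduce the risk of overfitting
- Every step, even across different stages, must strictly use the same CV split to avoid leakage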
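One way to guarantee that every stage sees the same split is to generate the fold assignment once, persist it, and only ever load it afterwards. This is a standard-library sketch of that idea. The helper names, the seed, and the placeholder image IDs are assumptions, and the actual contents of kfold.pkl in this repository may be organized differently.

```python
import pickle
import random


def make_folds(image_ids, n_folds=5, seed=42):
    """Assign every image ID to exactly one fold, deterministically."""
    ids = sorted(image_ids)            # sort so the result does not depend on input order
    random.Random(seed).shuffle(ids)   # seeded shuffle keeps the split reproducible
    return {fold: ids[fold::n_folds] for fold in range(n_folds)}


def train_val_split(folds, val_fold):
    """Use one fold for validation and the remaining folds for training."""
    val_ids = list(folds[val_fold])
    train_ids = [i for f, ids in folds.items() if f != val_fold for i in ids]
    return train_ids, val_ids


image_ids = [f"img_{i:04d}.jpg" for i in range(20)]  # placeholder IDs
folds = make_folds(image_ids)
blob = pickle.dumps(folds)        # in practice written once to disk, e.g. as kfold.pkl
restored = pickle.loads(blob)     # classifier and segmenter training both load this
train_ids, val_ids = train_val_split(restored, val_fold=0)
```

Loading the stored split in both the classification and segmentation scripts keeps validation images for a fold out of every model trained on that fold.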
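Model
- segmentation_models.pytorch
- Architectures: Unet and FPN (FPN performed slightly better)
- Use as many different pretrained encoder backbones as possible (resnet, densenet, efficientnet etc.)
- Add a classification model that predicts whether an image contains each cloud type; if it does not, the segmentation output for that type is ignored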
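The classifier gating step can be sketched as follows, assuming per-class probabilities from the classifier and per-class probability maps from the segmenter. The function name `gate_masks`, the array shapes, and the 0.5 thresholds are illustrative assumptions, not values taken from the repository.

```python
import numpy as np


def gate_masks(seg_probs, cls_probs, cls_threshold=0.5, seg_threshold=0.5):
    """Zero out the mask of every class the classifier considers absent.

    seg_probs: (K, H, W) per-class probability maps from the segmentation model.
    cls_probs: (K,) per-class presence probabilities from the classifier.
    """
    masks = (seg_probs > seg_threshold).astype(np.uint8)
    present = cls_probs > cls_threshold   # (K,) boolean vector
    masks[~present] = 0                   # ignore segmentation output for absent classes
    return masks


seg_probs = np.full((4, 2, 2), 0.9)           # segmenter is confident everywhere
cls_probs = np.array([0.9, 0.1, 0.7, 0.2])    # classifier says classes 1 and 3 are absent
masks = gate_masks(seg_probs, cls_probs)
```

In this toy input the segmenter predicts every class everywhere, but only the two classes the classifier accepts keep their masks.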
Loss
- FocalLoss2d + DiceLoss
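The repository's loss.py is written for PyTorch and its exact formulation is not shown here. The NumPy sketch below only illustrates the textbook binary focal loss and soft Dice loss and how they can be summed. The `gamma`, `smooth`, and `dice_weight` values are assumptions.

```python
import numpy as np


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma."""
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(targets == 1, probs, 1.0 - probs)  # probability given to the true label
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.T| + s) / (|P| + |T| + s)."""
    intersection = np.sum(probs * targets)
    denominator = np.sum(probs) + np.sum(targets) + smooth
    return float(1.0 - (2.0 * intersection + smooth) / denominator)


def combined_loss(probs, targets, dice_weight=1.0):
    """Sum of the two terms; the relative weight is an illustrative choice."""
    return focal_loss(probs, targets) + dice_weight * dice_loss(probs, targets)


targets = np.array([[1.0, 1.0], [0.0, 0.0]])
good = np.array([[0.9, 0.8], [0.1, 0.2]])   # mostly correct prediction
bad = 1.0 - good                            # mostly wrong prediction
```

The focal term down-weights pixels that are already classified well, while the Dice term optimizes region overlap directly, which is closer to the competition metric.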
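Post-processing
- Remove masks with a small area
- Convert masks to regular polygons (because the masks in the training data are fairly regular)
- Threshold search (3 thresholds: classify, segment, min_mask_area)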
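A minimal sketch of the area filter and the threshold search, under stated assumptions: the area check is applied to the whole mask rather than per connected component, only the segment and min_mask_area thresholds are searched (the classify threshold would add a third loop), and the polygon conversion is left out. All function names and grid values are hypothetical.

```python
import itertools

import numpy as np


def dice(pred, truth):
    """Dice score of two boolean masks; two empty masks count as a perfect match."""
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total


def post_process(prob, seg_threshold, min_area):
    """Binarize a probability map and drop the mask if its area is too small."""
    mask = prob > seg_threshold
    if mask.sum() < min_area:
        mask = np.zeros_like(mask)
    return mask


def search_thresholds(probs, truths, seg_grid, area_grid):
    """Grid search for the (seg_threshold, min_area) pair with the best mean Dice."""
    best = (-1.0, None, None)
    for seg_t, area in itertools.product(seg_grid, area_grid):
        score = np.mean([dice(post_process(p, seg_t, area), t)
                         for p, t in zip(probs, truths)])
        if score > best[0]:
            best = (float(score), seg_t, area)
    return best


# Toy validation set: one image with a real 2x2 cloud, one empty image with a noisy pixel.
truth_a = np.zeros((4, 4), dtype=bool)
truth_a[:2, :2] = True
prob_a = np.where(truth_a, 0.8, 0.3)
truth_b = np.zeros((4, 4), dtype=bool)
prob_b = np.full((4, 4), 0.1)
prob_b[3, 3] = 0.9

best = search_thresholds([prob_a, prob_b], [truth_a, truth_b],
                         seg_grid=[0.2, 0.5], area_grid=[0, 2])
```

The search should run on out-of-fold validation predictions only, so the chosen thresholds are not tuned on data the models were trained on.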
Model ensembling:
- Stacking with TTA
- Plain average
- Plain max
- Average of the mean and the max (worked best)
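The three simple combination rules, plus a horizontal-flip TTA helper, can be sketched as below. This is an illustration only: stacking with a meta-model is not shown, and `hflip_tta`, `ensemble`, and `toy_predict` are hypothetical names rather than functions from the repository.

```python
import numpy as np


def hflip_tta(predict, image):
    """Average the prediction on an image and on its horizontal flip (flipped back)."""
    direct = predict(image)
    flipped_back = predict(image[..., ::-1])[..., ::-1]
    return (direct + flipped_back) / 2.0


def ensemble(prob_maps, mode="mean_max"):
    """Combine per-model probability maps of identical shape."""
    stacked = np.stack(prob_maps, axis=0)
    mean, maximum = stacked.mean(axis=0), stacked.max(axis=0)
    if mode == "mean":
        return mean
    if mode == "max":
        return maximum
    if mode == "mean_max":   # halfway between the conservative mean and the aggressive max
        return (mean + maximum) / 2.0
    raise ValueError(f"unknown mode: {mode}")


def toy_predict(image):
    """Stand-in for a model: returns the input as a float probability map."""
    return image.astype(float)


model_a = np.array([[0.2, 0.8]])
model_b = np.array([[0.6, 0.4]])
combined = ensemble([model_a, model_b], mode="mean_max")
tta_out = hflip_tta(toy_predict, model_a)
```

For the two toy maps, the mean is [[0.4, 0.6]], the max is [[0.6, 0.8]], and their average is [[0.5, 0.7]].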

TODO

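Try more model architectures: DeepLab, SegNet etc.
FP and FN do not affect the Dice metric equally: a false negative costs more than a false positive, so when a pixel is ambiguous it is better to predict the positive class than the negative class
- Weighted loss: assign different misclassification weights in BCELoss or FocalLoss
- A more principled ensembling method: the mean is too conservative, the max is too aggressive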
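The FP/FN asymmetry in Dice can be checked with a small arithmetic sketch using Dice = 2TP / (2TP + FP + FN). In the toy counts below, turning one true positive into a false negative lowers the score more than adding one false positive, which is consistent with preferring the positive class for ambiguous pixels. The `weighted_bce` helper and its `pos_weight` value are illustrative assumptions of what a weighted loss could look like.

```python
import math


def dice(tp, fp, fn):
    """Dice score from pixel counts."""
    return 2 * tp / (2 * tp + fp + fn)


base = dice(tp=10, fp=0, fn=0)     # perfect prediction: 1.0
one_fp = dice(tp=10, fp=1, fn=0)   # one extra positive pixel: 20/21, about 0.952
one_fn = dice(tp=9, fp=0, fn=1)    # one missed positive pixel: 18/19, about 0.947


def weighted_bce(prob, target, pos_weight=2.0):
    """Binary cross-entropy for one pixel, with errors on positive pixels up-weighted."""
    return -(pos_weight * target * math.log(prob)
             + (1 - target) * math.log(1 - prob))


miss_positive = weighted_bce(0.3, 1)   # positive pixel predicted with p = 0.3
miss_negative = weighted_bce(0.7, 0)   # negative pixel predicted with p = 0.7
```

With `pos_weight=2.0`, an equally confident mistake on a positive pixel is penalized twice as much as the same mistake on a negative pixel.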

File Description

-------- data_preprocess.py: preprocess with resize to shorten training time
  |
  |----- dataset.py: custom PyTorch Dataset subclass
  |
  |----- global_parameter.py: global parameters
  |
  |----- generate_kfold.py: generate kfold.pkl
  |
  |----- loss.py: FocalLoss2d & DiceLoss
  |
  |----- models.py: classification models
  |
  |----- train_cls_multiGPU.py: train classification models
  |
  |----- train_seg_multiGPU.py: train segmentation models
  |
  |----- vi_cls_tta.py: validate and run inference for classification models with TTA
  |
  |----- vi_seg_tta.py: validate and run inference for segmentation models with TTA
  |
  |----- train_*_onefold.py: train models on a single fold (used for fine-tuning)
  |
   ----- utils.py: utility functions

Repository: Kyle1993/Kaggle-Cloud-Segment
