This repository has been archived by the owner on Apr 17, 2021. It is now read-only.

mhiro2/kaggle-microsoft-malware-prediction


kaggle-microsoft-malware-prediction

855th place solution to the Microsoft Malware Prediction Challenge.

  • Public LB: 9th (0.707)
  • Private LB: 855th (0.637) 😇😇😇

Prerequisite

  • Compress the original CSV files. (See also my dataset information)

    xz data/input/{train,test}.csv 
    
  • Create a new Python project

    pipenv install
    
  • Pull the TensorFlow image from NVIDIA GPU Cloud (NGC)

    docker login nvcr.io
    docker image pull nvcr.io/nvidia/tensorflow:19.02-py3
    

Usage

Create dump files of the dataset

pipenv shell
python scripts/create_dump.py
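
The contents of `scripts/create_dump.py` are not shown in this README, so the following is only a minimal sketch of the general idea: read an xz-compressed CSV once and store it in a binary dump that loads faster on later runs. The function names `create_dump` and `load_dump` are hypothetical, and the sketch uses only the standard library; the actual script may well use a different library and file format.

```python
import csv
import lzma
import pickle


def create_dump(xz_csv_path, dump_path):
    """Read an xz-compressed CSV and pickle its rows as a list of dicts.

    Hypothetical helper, not the repository's implementation.
    """
    # lzma.open in text mode decompresses on the fly;
    # newline="" is what the csv module expects from its input file.
    with lzma.open(xz_csv_path, mode="rt", encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))

    # Store the parsed rows so later runs can skip decompression and parsing.
    with open(dump_path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
    return len(rows)


def load_dump(dump_path):
    """Load the rows written by create_dump."""
    with open(dump_path, "rb") as f:
        return pickle.load(f)
```

Holding every row as a Python dict is only reasonable for small files; for the full competition data a columnar format would be more practical.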

LightGBM part

pipenv shell

# run lightgbm (the `--create-features` option is only needed on the first run)
python run_lightgbm.py --config=configs/lgb_gbdt_seed1.yaml --create-features
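
As an illustration of how a command line like the one above can be handled, here is a minimal sketch with `argparse`. Only the `--config` and `--create-features` flags come from this README; the `parse_args` and `main` functions and the step names are assumptions made for the example, not the repository's code.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the README (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Sketch of a run_lightgbm.py-style CLI")
    parser.add_argument("--config", required=True, help="path to a YAML config file")
    parser.add_argument(
        "--create-features",
        action="store_true",
        help="build features before training (only needed on the first run)",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    steps = []
    # Feature creation is optional so that later runs can reuse the cached features.
    if args.create_features:
        steps.append("create_features")
    steps.append("train_and_predict")
    return args.config, steps
```

Making feature creation a `store_true` flag means that omitting it simply skips that step, which matches the "first run only" usage described above.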

Neural Network part

docker container run -it --name=tf --runtime=nvidia --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v $PWD:/workspace/malware nvcr.io/nvidia/tensorflow:19.02-py3
cd /workspace/malware

# default pandas version on this image is too old for me...
pip install -U pandas

# install some requirements
pip install pyarrow pyyaml

# train and predict NN model (the `--create-dataset` option is only needed on the first run)
python run_nn.py --config=configs/nn_seed12345.yaml --create-dataset

Ensemble

python ensemble.py
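
This README does not describe what `ensemble.py` does internally. As a rough sketch of one common way to combine the LightGBM and neural network predictions, the example below takes a weighted average of per-row scores. The `blend` function and its weights are hypothetical and are not claimed to match the repository's method.

```python
def blend(predictions, weights=None):
    """Weighted average of several models' predictions, row by row.

    `predictions` is a list of equal-length lists of scores, one list per model.
    Hypothetical helper for illustration only.
    """
    if not predictions:
        raise ValueError("need at least one prediction vector")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all prediction vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(predictions)  # default: plain mean
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction vector")

    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]
```

For example, `blend([lgb_preds, nn_preds], weights=[3, 1])` would weight the first model three times as heavily as the second, where `lgb_preds` and `nn_preds` stand for each model's test-set scores.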

Author

Masaaki Hirotsu / Kaggle: @mhiro2
