855th place solution to Microsoft Malware Prediction Challenge.
- Public LB: 9th (0.707)
- Private LB: 855th (0.637) 😇😇😇
Compress the original CSV files (see also my dataset information).
xz data/input/{train,test}.csv
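By default `xz` replaces each file with a `.xz` version (for example `data/input/train.csv.xz`). As a minimal sketch for checking a compressed file without unpacking it to disk, the helper below reads the first few rows using only the standard library. The function name `read_xz_csv_head` is made up for illustration and is not part of this repository.

```python
import csv
import lzma


def read_xz_csv_head(path, n_rows=5):
    """Return the first n_rows of an xz-compressed CSV as a list of dicts.

    Hypothetical helper for illustration; not taken from this repository.
    """
    rows = []
    # Text mode ("rt") decompresses on the fly, so the file is never
    # fully unpacked to disk or loaded into memory.
    with lzma.open(path, mode="rt", newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if i >= n_rows:
                break
            rows.append(row)
    return rows
```

For example, `read_xz_csv_head("data/input/train.csv.xz", n_rows=3)` would return the first three records. pandas can also read such files directly, since `pd.read_csv` infers xz compression from the `.xz` extension.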
Create a new Python project with pipenv.
pipenv install
Pull the TensorFlow image from NVIDIA GPU Cloud (NGC).
docker login nvcr.io
docker image pull nvcr.io/nvidia/tensorflow:19.02-py3
pipenv shell
python scripts/create_dump.py
pipenv shell
# run LightGBM (the `--create-features` option is only needed on the first run)
python run_lightgbm.py --config=configs/lgb_gbdt_seed1.yaml --create-features
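The command above passes a config path and a one-time flag. As a rough sketch of how a script could accept these two options, the snippet below uses `argparse`. This is an assumption about the interface only: the actual contents of `run_lightgbm.py` are not shown here, and `parse_args` is a hypothetical name.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options used above (illustrative, not the repo's code)."""
    parser = argparse.ArgumentParser(
        description="Train a model from a YAML config (sketch)."
    )
    parser.add_argument(
        "--config", required=True, help="path to a YAML config file"
    )
    # store_true makes the flag False unless it is given, which matches
    # an option that is only passed on the first run.
    parser.add_argument(
        "--create-features",
        action="store_true",
        help="build features from scratch instead of reusing saved ones",
    )
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["--config=configs/lgb_gbdt_seed1.yaml"])` yields `create_features=False`, so later runs can skip feature creation by simply omitting the flag.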
docker container run -it --name=tf --runtime=nvidia --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v $PWD:/workspace/malware nvcr.io/nvidia/tensorflow:19.02-py3
cd /workspace/malware
# default pandas version on this image is too old for me...
pip install -U pandas
# install some requirements
pip install pyarrow pyyaml
# train and predict with the NN model (the `--create-dataset` option is only needed on the first run)
python run_nn.py --config=configs/nn_seed12345.yaml --create-dataset
python ensemble.py
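The ensemble step combines the LightGBM and NN predictions. The exact method used by `ensemble.py` is not described here, so the following is only a sketch of one common approach, a weighted average of per-row probabilities. The function name `blend` and the weights are assumptions for illustration.

```python
def blend(predictions, weights=None):
    """Weighted average of several prediction lists of equal length.

    Illustrative only; this may differ from what ensemble.py does.
    """
    if not predictions:
        raise ValueError("need at least one prediction list")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all prediction lists must have the same length")
    if weights is None:
        # Default to a plain mean.
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need one weight per prediction list")
    total = sum(weights)
    # Normalize by the weight sum so the outputs stay in the same range
    # as the inputs.
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]
```

For instance, `blend([lgb_preds, nn_preds], weights=[2, 1])` would weight the first model twice as heavily as the second, where `lgb_preds` and `nn_preds` are placeholder names for the two models' outputs.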
Masaaki Hirotsu / Kaggle: @mhiro2