Skip to content

ywu94/Tencent-Ads-Algo-Comp-2020

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tencent-Ads-Algo-Comp-2020

MIT license

Git repo for Tencent Advertisement Algorithm Competition


Quick Start

cd ./Script
. prerequisite.sh
python3 input_generate.py
python3 input_split.py fine
python3 train_w2v.py creative 128
python3 train_w2v.py ad 128
python3 train_w2v.py advertiser 128
python3 train_w2v.py product 128
python3 train_w2v.py industry 64
python3 train_w2v.py product_category 64
python3 train_v2_age_final_resgru_multiInp.py 40 1024 100 1e-3

Script Documentation

Model Training V2

  • How to run training script

    Syntax: python3 train_v2_{some script name}.py 40 2048 100 1e-3

    Argument:

    1. (Required,INT) target epoch to train
    2. (Required,INT) batch size for training
    3. (Required,INT) maximal length of input sequence, smaller length can help train withb larger batch size
    4. (Required,FLOAT) learning rate for adam optimizer
    5. (Optional, INT) If nothing specified then the model will be trained from scratch, otherwise it indicates the epoch to resume training
    6. (Optional, INT) If nothing specified then the model will be trained from scratch, otherwise it indicates the training file to resume training
      • Example: 9, 2 indicates resume training from epoch 9 file 2.
  • Training script inventory

    |--Script
      |--data_loader_v2.py
      |
      |--clf_lstm.py             # Model based on stacked LSTM
      |--clf_gnmt.py             # Model based on GNMT (Google Neural Translation Machine)
      |--clf_tf_enc.py           # Model based on Encoder part of Transformer
      |--clf_esim.py             # Model based on ESIM (Enhanced Sequential Inference Model)
      |--clf_pre_ln_tf.py        # Model based on pre Layer Normalization Transformer
      |--clf_rcnn.py             # Model based on RCNN
      |--clf_final.py            # Model for final submission
      |
      |--train_v2_age_final_resgru_multiInp.py
      |--train_v2_age_final_resgru_cnn_multiInp.py
      |--train_v2_age_final_preln_tf_multiInp.py
    

Legacy - Model Training V1

  • How to run training script

    Syntax: python3 train_{some script name}.py 0 10 256 100 1e-3 split

    Argument:

    1. (Required,INT) 0 means training from scratch and a positive number means loading the corresponding epoch and start training from there
    2. (Required,INT) number of epoches to train
    3. (Required,INT) batch size for training
    4. (Required,INT) maximal length of input sequence, smaller length can help train withb larger batch size
    5. (Required,FLOAT) learning rate for adam optimizer
    6. (Optional) If nothing specified then the model will be trained using unsplitted files. If python3 input_split.py fine has been executed and a value is specified the model will be trained using a list of splitted files.
  • Training script inventory

    |--Script
      |--data_loader.py
      |
      |--multi_seq_lstm_classifier.py
      |--train_age_multi_seq_lstm_classifier.py
      |--train_gender_multi_seq_lstm_classifier.py
      |
      |--transformer_encoder_classifier.py
      |--train_age_transformer_encoder_classifier_with_creative.py
      |
      |--GNMT_classifier.py
      |--train_age_GNMT_classifier_with_creative.py
      |
      |--multi_seq_GNMT_classifier.py
      |--train_age_multi_seq_GNMT_classifier.py
    

Data Preparation

  • Step 1: run
cd ./Script
. prerequisite.sh

Note that if the instance has no public internet connection, download train file and test file and put them under /Script. You should have the following files and directories after execution.

|--Script
  |--train_artifact
    |--user.csv
    |--click_log.csv
    |--ad.csv
  |--test_artifact
    |--click_log.csv
    |--ad.csv
  |--input_artifact
  |--embed_artifact
  |--model_artifact
  |--output_artifact
  • Step 2: run
python3 input_generate.py
python3 input_split.py

For machine with small memory please replace the second line with python3 input_split.py fine.You should have the following files after execution.

|--Script
  |--input_artifact
    |--train_idx_shuffle.npy
    |--train_age.npy
    |--train_gender.npy
    |--train_creative_id_seq.pkl
    |--train_ad_id_seq.pkl
    |--train_advertiser_id_seq.pkl
    |--train_product_id_seq.pkl
    |--test_idx_shuffle.npy
    |--test_creative_id_seq.pkl
    |--test_ad_id_seq.pkl
    |--test_advertiser_id_seq.pkl
    |--test_product_id_seq.pkl
  |--embed_artifact
    |--embed_train_creative_id_seq.pkl
    |--embed_train_ad_id_seq.pkl
    |--embed_train_advertiser_id_seq.pkl
    |--embed_train_product_id_seq.pkl
  |--model_artifact
  |--output_artifact
  |--train_artifact
  |--test_artifact
  • Step 3: run
python3 train_w2v.py creative 128
python3 train_w2v.py ad 128
python3 train_w2v.py advertiser 128
python3 train_w2v.py product 128
python3 train_w2v.py industry 64
python3 train_w2v.py product_category 64

You should have the following files after exection.

|--Script
  |--embed_artifact
    |--w2v_registry.json
    |--wv_registry.json
    |--creative_sg_embed_s256_{random token}
    |--...
  |--model_artifact
  |--input_artifact
  |--output_artifact
  |--train_artifact
  |--test_artifact

Note that w2v_registry.json stores all the w2v model artifact paths and wv_registry.json stores all the KeyedVector artifact paths.

About

Tencent Advertisement Algorithm Competition 2020 / 2020腾讯广告算法大赛

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published