GitHub - xue-smile/feature-importance

To save training time, the trained models for all three datasets are provided under /data/<dataset_name>/models, e.g., /data/deception/models. The fine-tuned BERT parameters should be stored under data/<dataset_name>/bert_fine_tune. Download the folders from this link: https://tinyurl.com/bert-fine-tune-folder. Note that the folders can be large and may take a while to download.
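Before running any of the steps below, it can help to confirm that the downloaded folders ended up where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_folders` is hypothetical, and it assumes the paths above are relative to the repository root and that the three datasets are `deception`, `yelp`, and `sst`.

```python
from pathlib import Path

# Dataset names and per-dataset folders taken from the instructions above.
DATASETS = ("deception", "yelp", "sst")
SUBFOLDERS = ("models", "bert_fine_tune")


def missing_folders(repo_root="."):
    """Return the expected data folders that do not exist under repo_root.

    Assumption: the layout is <repo_root>/data/<dataset_name>/<subfolder>.
    """
    root = Path(repo_root)
    missing = []
    for name in DATASETS:
        for sub in SUBFOLDERS:
            relative = Path("data") / name / sub
            if not (root / relative).is_dir():
                missing.append(relative.as_posix())
    return missing
```

Calling `missing_folders()` from the repository root returns an empty list once every folder is in place; any entries it returns are folders that still need to be downloaded or moved.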

1. Generate top 10 features and their feature importance for `svm`, `svm_l1`, `xgb`, and `lstm`.

To save the svm, svm_l1, xgb, and lstm features and their feature importance, run `save_combinations.py`.
- Note that `save_combinations.py` is the only script that uses the downloaded shap package instead of the local one in this repository. Before running it, either set the package path so that the downloaded shap package is imported, or rename the local shap folder so that `save_combinations.py` does not read from the local package. If you rename the local shap folder, restore its original name after running `save_combinations.py` so that the other files are not affected.
To save the lstm attention weights, run `get_lstm_att_weights.py`.
To save the lstm SHAP values, run `python get_lstm_shap.py <dataset_name>`.
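The rename-and-restore workaround for the local shap folder is easy to get wrong if the script fails midway and the folder is left renamed. One way to make it safer is a small context manager that always restores the original name. This is only a sketch under stated assumptions: `hidden_folder` and `run_with_installed_shap` are hypothetical helpers that are not in the repository, the `_local` suffix is arbitrary, and it assumes `save_combinations.py` is launched from the repository root without extra arguments.

```python
import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def hidden_folder(path="shap", suffix="_local"):
    """Temporarily rename a folder, restoring its name on exit (even on error)."""
    hidden = path + suffix
    renamed = os.path.isdir(path)
    if renamed:
        os.rename(path, hidden)
    try:
        yield
    finally:
        if renamed:
            os.rename(hidden, path)


def run_with_installed_shap(script="save_combinations.py"):
    """Run a script while the local shap folder is hidden.

    Assumption: with the local folder renamed, `import shap` inside the
    script resolves to the package installed in the current environment.
    """
    with hidden_folder("shap"):
        subprocess.run([sys.executable, script], check=True)
```

With this in place, `run_with_installed_shap()` would run `save_combinations.py` and put the local shap folder back afterwards, so the remaining scripts still see the local package.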

2. Generate top 10 features and their feature importance for `bert`.

Generate tsv files for bert:
1. deception: run `python data_retrieval.py deception`
2. yelp: run `python data_retrieval.py yelp`
3. sst: run `python data_retrieval.py sst`
To save bert attention weights:
1. deception: run `python bert_att_weight_retrieval.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1`
2. yelp: run the above command, but replace `deception` with `yelp`, and change `max_seq_length` to 512
3. sst: run the above command, but replace `deception` with `sst`, and change `max_seq_length` to 128
To save bert LIME:
1. deception: run `python bert_lime.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1`
2. yelp: run the above command, but replace `deception` with `yelp`, and change `max_seq_length` to 512
3. sst: run the above command, but replace `deception` with `sst`, and change `max_seq_length` to 128
To save bert SHAP:
1. deception: run `python bert_shap.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1`
2. yelp: run the above command, but replace `deception` with `yelp`, and change `max_seq_length` to 512
3. sst: run the above command, but replace `deception` with `sst`, and change `max_seq_length` to 128
Generate bert spans and white spans:
1. deception: run `python tokenizer_alignment.py --data_dir data/deception --bert_model data/deception/bert_fine_tune --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300`
2. yelp: run the above command, but replace `deception` with `yelp`, and change `max_seq_length` to 512
3. sst: run the above command, but replace `deception` with `sst`, and change `max_seq_length` to 128
To align all bert features/tokens with the correct weights, run `python get_bert.py`. Note: the bert features and their feature importance are only generated correctly if the above steps are followed in order.
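Because the bert steps differ only in the dataset name and `max_seq_length`, the per-dataset commands can be generated rather than edited by hand. The sketch below is illustrative and not part of the repository: `bert_commands` is a hypothetical helper, and it assumes that "replace deception" applies to every occurrence of the dataset name in the command, including `--output_dir`. It also passes `--bert_model` with a trailing slash for every script, whereas the `tokenizer_alignment.py` command above omits it.

```python
# Sequence lengths per dataset, as listed in the steps above.
MAX_SEQ_LENGTH = {"deception": 300, "yelp": 512, "sst": 128}

# (script name, whether the command above passes --eval_batch_size 1)
BERT_STEPS = [
    ("bert_att_weight_retrieval.py", True),
    ("bert_lime.py", True),
    ("bert_shap.py", True),
    ("tokenizer_alignment.py", False),
]


def bert_commands(dataset):
    """Build the ordered list of commands for one dataset.

    Each command is a list of arguments suitable for subprocess.run.
    The final `python get_bert.py` alignment step is not included.
    """
    seq_len = MAX_SEQ_LENGTH[dataset]
    commands = [["python", "data_retrieval.py", dataset]]
    for script, with_batch_size in BERT_STEPS:
        cmd = [
            "python", script,
            "--data_dir", f"data/{dataset}",
            "--bert_model", f"data/{dataset}/bert_fine_tune/",
            "--task_name", "sst-2",
            "--output_dir", f"/data/temp_output_dir/{dataset}/",
            "--do_eval",
            "--max_seq_length", str(seq_len),
        ]
        if with_batch_size:
            cmd += ["--eval_batch_size", "1"]
        commands.append(cmd)
    return commands
```

Printing `" ".join(cmd)` for each entry of `bert_commands("yelp")` gives the yelp variants of the commands in the order listed above; once they have finished for the datasets of interest, `python get_bert.py` performs the final alignment.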

3. Recreate analysis plots.

To generate the plots in the paper, refer to the interactive notebook `main.ipynb`.

If you encounter any problems, please send an email to vivian.lai@colorado.edu and jon.z.cai@colorado.edu.
