Automatic Pull Quote Selection

Learning to automatically select pull quotes (wikipedia).

This code accompanies the accepted COLING-2020 paper Catching Attention with Automatic Pull Quote Selection.

Requirements

This project is written in Python 3.6.9.

The following non-default libraries are used:

numpy 1.18.2
sklearn 0.22.2.post1
seaborn 0.9.0
matplotlib 3.1.2
scipy 1.4.1
keras 2.3.0
tensorflow 1.14.0
sumy 0.8.1
nltk 3.4.5
textstat 0.6.0
textblob 0.15.3
sentence_transformers 0.2.5
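
As a convenience, the pinned versions above can be compared against the current environment. The following is a minimal sketch, not part of this repository: the names `PINNED`, `installed_version`, and `find_mismatches` are hypothetical, and it assumes that `sklearn` is installed under the distribution name `scikit-learn` and that the other names match their distribution names.

```python
# Pinned versions copied from the list above.
# Assumption: "sklearn" is installed as the distribution "scikit-learn".
PINNED = {
    "numpy": "1.18.2",
    "scikit-learn": "0.22.2.post1",
    "seaborn": "0.9.0",
    "matplotlib": "3.1.2",
    "scipy": "1.4.1",
    "keras": "2.3.0",
    "tensorflow": "1.14.0",
    "sumy": "0.8.1",
    "nltk": "3.4.5",
    "textstat": "0.6.0",
    "textblob": "0.15.3",
    "sentence_transformers": "0.2.5",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Fallback for Python 3.6/3.7, where importlib.metadata is unavailable.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

A package reported with `found None` is not installed under that name; any other mismatch may or may not affect the results.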

Preparing the dataset

To reproduce our dataset:

1. Navigate to the datasets/url_lists/ directory and unzip url_lists.zip so that the 4 files are in datasets/url_lists/.
2. Navigate to datasets/ and run python3.6 construct_dataset.py source my_save_dir/.
   - source can be one of intercept, ottawa-citizen, cosmo, national-post, or all
   - the samples for a given source will be stored in my_save_dir/source/
   - ⚠️ Update settings.py so that base_pq_directory points to my_save_dir/.
   - ⚠️ This will take a long time.
3. Navigate to the root repo folder and run python3.6 calculate_data_stats.py to calculate dataset statistics to compare with our paper.
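
The first two steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not code from this repository: the helper names `unzip_url_lists` and `construct_dataset_command` are hypothetical, and it assumes url_lists.zip stores its 4 files at the top level of the archive (if they sit inside a subfolder, they would still need to be moved into datasets/url_lists/).

```python
import zipfile
from pathlib import Path

# Source names accepted by construct_dataset.py, as listed above.
SOURCES = ("intercept", "ottawa-citizen", "cosmo", "national-post", "all")


def unzip_url_lists(repo_root):
    """Extract url_lists.zip into datasets/url_lists/, next to the archive."""
    url_dir = Path(repo_root) / "datasets" / "url_lists"
    with zipfile.ZipFile(str(url_dir / "url_lists.zip")) as archive:
        archive.extractall(str(url_dir))
    return url_dir


def construct_dataset_command(source, save_dir):
    """Build the argument list for construct_dataset.py (to be run from datasets/)."""
    if source not in SOURCES:
        raise ValueError("unknown source: {!r}".format(source))
    return ["python3.6", "construct_dataset.py", source, save_dir]
```

For example, `subprocess.run(construct_dataset_command("cosmo", "my_save_dir/"), cwd="datasets")` would launch the download for a single source. Checking the source name up front avoids starting a long-running job with a mistyped argument.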

Reproducing experiments

To reproduce our experimental results, run bash run_experiments.sh (output will be stored in /results).

ℹ️ To first make sure that things work, run bash run_experiments.sh --quick. It should take just a few minutes.
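
The quick check and the full run can be chained so that the long run only starts once the quick one succeeds. A minimal sketch, assuming it is launched from the root repo folder and that run_experiments.sh exits with a non-zero status on failure (the function name `run_with_smoke_test` is hypothetical):

```python
import subprocess

QUICK_CMD = ["bash", "run_experiments.sh", "--quick"]
FULL_CMD = ["bash", "run_experiments.sh"]


def run_with_smoke_test(run=subprocess.call):
    """Run the quick check, then the full experiments only if it succeeded.

    `run` takes an argument list and returns an exit status;
    it defaults to subprocess.call and can be swapped out for testing.
    """
    status = run(QUICK_CMD)
    if status != 0:
        # Stop early: the quick run failed, so the full run would likely fail too.
        return status
    return run(FULL_CMD)
```

Calling `run_with_smoke_test()` returns the exit status of the last command that ran.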

Miscellaneous

To reproduce the handcrafted feature value distribution figures, run python3.6 view_feature_dists.py

To analyze test articles with all models, run bash generate_model_samples.sh. As with run_experiments.sh, the --quick argument can be used to make sure things are working.

