tannerbohn/AutomaticPullQuoteSelection

Automatic Pull Quote Selection

Learning to automatically select pull quotes (see Wikipedia for a definition).

This code accompanies the accepted COLING-2020 paper Catching Attention with Automatic Pull Quote Selection.

Requirements

This project is written in Python 3.6.9.

The following non-default libraries are used:

  • numpy 1.18.2
  • sklearn 0.22.2.post1
  • seaborn 0.9.0
  • matplotlib 3.1.2
  • scipy 1.4.1
  • keras 2.3.0
  • tensorflow 1.14.0
  • sumy 0.8.1
  • nltk 3.4.5
  • textstat 0.6.0
  • textblob 0.15.3
  • sentence_transformers 0.2.5
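
Before running anything, it can help to confirm that the installed versions match the list above. The following is a minimal sketch, not part of this repository: `PINNED`, `PIP_NAMES`, `installed_version`, and `find_mismatches` are hypothetical names, and the mapping from `sklearn` and `sentence_transformers` to the distribution names `scikit-learn` and `sentence-transformers` is an assumption about how the packages were installed.

```python
# Pinned versions copied from the list above, keyed by the names used there.
PINNED = {
    "numpy": "1.18.2",
    "sklearn": "0.22.2.post1",
    "seaborn": "0.9.0",
    "matplotlib": "3.1.2",
    "scipy": "1.4.1",
    "keras": "2.3.0",
    "tensorflow": "1.14.0",
    "sumy": "0.8.1",
    "nltk": "3.4.5",
    "textstat": "0.6.0",
    "textblob": "0.15.3",
    "sentence_transformers": "0.2.5",
}

# Assumption: these two are installed under a different distribution name.
PIP_NAMES = {
    "sklearn": "scikit-learn",
    "sentence_transformers": "sentence-transformers",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6 / 3.7

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Return {name: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for module, expected in pinned.items():
        found = lookup(PIP_NAMES.get(module, module))
        if found != expected:
            mismatches[module] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for module, (expected, found) in sorted(find_mismatches().items()):
        print(f"{module}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `find_mismatches()` means every listed package is present at the listed version. Whether other versions also work is not something this check can tell you.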

Preparing the dataset

To reproduce our dataset:

  1. navigate to the datasets/url_lists/ directory and unzip url_lists.zip so that the 4 files are in datasets/url_lists/
  2. navigate to datasets/ and run python3.6 construct_dataset.py source my_save_dir/.
    • source can be one of intercept, ottawa-citizen, cosmo, national-post, or all
    • the samples for a given source will be stored in my_save_dir/source/
    • ⚠️ Update settings.py so that base_pq_directory points to my_save_dir/.
    • ⚠️ This will take a long time.
  3. navigate to the root repo folder and run python3.6 calculate_data_stats.py to calculate dataset statistics to compare with our paper.
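
Since step 2 takes a long time, a quick check that each source actually produced output can save a failed run later. The sketch below is not part of this repository: `missing_sources` is a hypothetical helper, and it only assumes what step 2 states, namely that samples for each source end up in a subdirectory named after that source. It makes no assumption about the format of the sample files themselves.

```python
import os

# Source names as listed in step 2 (the "all" shortcut is not a directory).
SOURCES = ["intercept", "ottawa-citizen", "cosmo", "national-post"]


def missing_sources(base_pq_directory, sources=SOURCES):
    """Return the sources that have no non-empty subdirectory under base_pq_directory."""
    missing = []
    for source in sources:
        path = os.path.join(base_pq_directory, source)
        # A source counts as missing if its folder is absent or contains nothing.
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(source)
    return missing


if __name__ == "__main__":
    # "my_save_dir/" is the placeholder directory name used in the steps above;
    # pass the same path that base_pq_directory points to in settings.py.
    print(missing_sources("my_save_dir/"))
```

An empty list means every source has at least one file in its folder. It does not confirm that the download finished or that the counts match the paper; calculate_data_stats.py is the tool for that comparison.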

Reproducing experiments

To reproduce our experimental results, run bash run_experiments.sh (output will be stored in /results).

ℹ️ To first make sure that things work, run bash run_experiments.sh --quick. It should take just a few minutes.

Miscellaneous

To reproduce the handcrafted feature value distribution figures, run python3.6 view_feature_dists.py

To analyze test articles with all models, run bash generate_model_samples.sh. As with run_experiments.sh, the --quick argument can be used to make sure things are working.
