Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models

This repository contains the code and data for the Scientometrics paper:

Zeng, T., Acuna, D.E. (2020), Modeling citation worthiness by using attention-based Bidirectional Long Short-Term Memory networks and interpretable models, Scientometrics, 124(1), 399–428

Data

ACL-ARC dataset: please refer to Bonab et al., 2018 for details. We downloaded a copy of the dataset and adjusted some fields. You can download it from Figshare: 10.6084/m9.figshare.12573872.

PMOA-CITE dataset: please download 1M sentences from Figshare: 10.6084/m9.figshare.12547574

PMOA-CITE and ACL-ARC combined: please download it from Figshare: 10.6084/m9.figshare.12573974
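
The DOIs above can be opened through the standard https://doi.org/ resolver. As a small convenience, the sketch below turns them into URLs. The dictionary keys and the helper name doi_url are illustrative labels only and are not names used by this repository.

```python
# Figshare DOIs listed in the Data section; the keys are illustrative labels.
DATASET_DOIS = {
    "ACL-ARC": "10.6084/m9.figshare.12573872",
    "PMOA-CITE": "10.6084/m9.figshare.12547574",
    "PMOA-CITE+ACL-ARC": "10.6084/m9.figshare.12573974",
}


def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI (hypothetical helper)."""
    return "https://doi.org/" + doi.strip()


if __name__ == "__main__":
    for name, doi in DATASET_DOIS.items():
        print(f"{name}: {doi_url(doi)}")
```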

Dependencies

The code requires the following packages:

allennlp==0.9.0
scikit-learn==0.21.2
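
Because both packages are pinned to exact versions, it can help to confirm what is installed before training. The following is a minimal sketch, not part of the repository: find_mismatches is a hypothetical helper, and it assumes Python 3.8+ (or the importlib_metadata backport on older interpreters).

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older interpreters: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata

# Pinned versions from the dependency list above.
REQUIRED = {"allennlp": "0.9.0", "scikit-learn": "0.21.2"}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that are missing or differ."""
    mismatches = {}
    for package, expected in required.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result means both installed versions match the pins.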

Run the experiments

All experiment configuration files are located in the cite-worthiness/experiments folder. To run an experiment:

1. Find the fields train_data_path, validation_data_path and test_data_path in each jsonnet file, and change their values to the paths where you stored the datasets mentioned above.
2. Find the cuda_device field and set it to -1 if you are using a CPU, or to the CUDA device number otherwise.
3. Run the command:

allennlp train /path/to/experiment/configuration/jsonnet/file -s ../path/to/serialization/dir/ --include-package citation_worthiness

Please refer to the AllenNLP documentation for details on the train command.
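
As an alternative to editing each jsonnet file by hand, the train command also accepts an --overrides option that takes a JSON structure replacing configuration values. The sketch below builds such a command. It is an illustration under assumptions: build_train_command is a hypothetical helper, all paths are placeholders, and cuda_device is assumed to sit under the trainer section, as is usual in AllenNLP 0.9.0 configurations. Check your jsonnet file before relying on it.

```python
import json
import shlex


def build_train_command(config_path, serialization_dir, train_path,
                        validation_path, test_path, cuda_device=-1):
    """Build an `allennlp train` argument list that overrides data paths and device."""
    overrides = {
        "train_data_path": train_path,
        "validation_data_path": validation_path,
        "test_data_path": test_path,
        # Assumption: cuda_device is nested under "trainer" in the config.
        "trainer": {"cuda_device": cuda_device},
    }
    return [
        "allennlp", "train", config_path,
        "-s", serialization_dir,
        "--include-package", "citation_worthiness",
        "--overrides", json.dumps(overrides),
    ]


if __name__ == "__main__":
    # Every path here is a placeholder to be replaced with a real location.
    command = build_train_command(
        "/path/to/experiment/configuration/jsonnet/file",
        "/path/to/serialization/dir/",
        "/path/to/train", "/path/to/validation", "/path/to/test",
    )
    # Print a shell-safe command line instead of running it.
    print(" ".join(shlex.quote(part) for part in command))
```

The script only prints the command, so it can be inspected before being run in a shell.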

Live demo for citation worthiness

Please visit our live demo at https://cite-worthiness.scienceofscience.org/. Enter some sentences, and the tool will predict the probability that each sentence needs a citation.

How to cite

If you use the datasets or code in this repository, please cite our work: Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models.

@Article{Zeng2020,
    author={Zeng, Tong and Acuna, Daniel E.},
    title={Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models},
    journal={Scientometrics},
    year={2020},
    month={Jul},
    day={01},
    volume={124},
    number={1},
    pages={399-428},
    issn={1588-2861},
    publisher = {Springer International Publishing},
    doi={10.1007/s11192-020-03421-9},
    url={https://doi.org/10.1007/s11192-020-03421-9}
}

Science of Science and Computational Discovery Lab

The datasets and code were developed in the Science of Science and Computational Discovery Lab in the School of Information Studies at Syracuse University.

License

The code in this repo uses the MIT license.
