Skip to content

wigasper/FUSE

Repository files navigation

FUSE

Development repo for my work on predicting PubMed MeSH terms using citation networks.

There is a lot going on here and more or less represents my sandbox during the process. Some notable files:

baseline_expanded.ipynb - The final simple decision tree model. Includes evaluation metrics, prediction examples, and some exploratory analyses.

baseline_models.ipynb - The initial work done on a reduced dataset to prove concept.

compute_semantic_similarity.ipynb - A notebook walking through the process to determine semantic similarity values for each term, a more palatable version of this that can run from the command line is available in the pubmed-mesh-utils repository.

data_aggregation_pipeline - A notebook detailing how I aggregated data for the initial work (reduced dataset).

data-aggregation/data_generator.py - The script used to build the expanded dataset from NCBI's bulk data dumps.

measure_term_co-occurrence.ipynb - A notebok walking through the process for computing term co-occurrence log-likelihood ratios, a more palatable version of this that can run from the command line is currently in development in the pubmed-mesh-utils repository.

More user-friendly versions of some of the tools I used here are available in the pubmed-mesh-utils repository.

About

Using citation networks to predict term indexing

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published