automatic-summary

Final course project for the Machine Learning course in MLT in fall 2020.

Instructions for running: Use the "run" notebook (run.ipynb) to run everything. In addition to the usual packages, skipthoughts (https://pypi.org/project/skipthoughts/) and rouge (https://pypi.org/project/rouge/) need to be installed before running. I installed both with pip.
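
Before opening the notebook, it can help to confirm that the two extra packages are importable. The following is a minimal sketch, not part of the project code: the helper name `missing_modules` is hypothetical, and it assumes the import names are the same as the PyPI package names (`skipthoughts` and `rouge`).

```python
from importlib.util import find_spec

# Assumption: the import names match the PyPI package names.
REQUIRED_MODULES = ("skipthoughts", "rouge")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        # Suggest the pip command for whatever is not installed yet.
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All extra packages are available.")
```

The check only looks the modules up and does not import them, so it runs quickly even when the packages are large.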

Data: I use a small part of the CNN portion of the CNN-DailyMail dataset (https://github.com/abisee/cnn-dailymail). To keep the running time manageable, I used only 3 of the files (3000 texts) for testing and evaluating my program. These are in the data/texts folder of this repo. The full dataset, already tokenized, can be downloaded here: https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail.

The pretrained GloVe embeddings can be downloaded here: https://nlp.stanford.edu/projects/glove/. I used the 6B version with 100 dimensions, trained on Wikipedia and Gigaword 5.
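
The downloaded GloVe vectors are plain text, one word per line followed by its space-separated vector values. As a rough illustration of how such a file could be read, here is a minimal sketch using only the standard library. The function name `load_glove` and its `expected_dim` parameter are hypothetical and are not taken from the project code, which may load the embeddings differently.

```python
def load_glove(path, expected_dim=None):
    """Read GloVe text vectors into a dict mapping word -> list of floats.

    Hypothetical helper for illustration; not the project's own loader.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank lines
            word, values = parts[0], parts[1:]
            if expected_dim is not None and len(values) != expected_dim:
                continue  # skip lines with an unexpected number of values
            embeddings[word] = [float(value) for value in values]
    return embeddings
```

For the 100-dimensional 6B vectors, a call could look like `load_glove("glove.6B.100d.txt", expected_dim=100)`, assuming the extracted file sits in the working directory.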


juliaklezl/automatic-summary
