juliaklezl/automatic-summary

automatic-summary

Final course project for the Machine Learning course in MLT in fall 2020.

Instructions for running: Use the "run" notebook to run everything. In addition to the usual packages, skipthoughts and rouge need to be installed before running (https://pypi.org/project/skipthoughts/, https://pypi.org/project/rouge/). I used pip to install them.

Data: The data I use is a small subset of the CNN portion of the CNN-DailyMail dataset (https://github.com/abisee/cnn-dailymail). To keep running time manageable, I used only 3 of the files (3000 texts) for testing and evaluating my program. These are in the data/texts folder of this repo. The full dataset, already tokenized, can be downloaded here: https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail.

The pretrained GloVe embeddings can be downloaded here: https://nlp.stanford.edu/projects/glove/. I used the 6B version with 100 dimensions, trained on Wikipedia and Gigaword 5.
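
For orientation, here is a minimal sketch of how a GloVe text file could be read into a dictionary. It is not the loading code from the "run" notebook. The function name `load_glove`, the `expected_dim` parameter, and the file path in the usage comment are assumptions made for illustration. It relies only on the GloVe text format, where each line holds a word followed by its space-separated vector values.

```python
import io


def load_glove(lines, expected_dim=100):
    """Parse GloVe text-format lines into a {word: [float, ...]} dict.

    `lines` is any iterable of strings, e.g. an open file handle.
    Hypothetical helper, not part of this repository.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # A valid line has one token plus `expected_dim` numbers;
        # blank or malformed lines are skipped.
        if len(parts) != expected_dim + 1:
            continue
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


# Usage with the downloaded file (the path is an assumption):
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f, expected_dim=100)
```

Passing an iterable of lines rather than a path makes it possible to try the parser on a few sample lines without loading the full embedding file.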
