Nuce

This repository is an extension of https://github.com/miso-belica/sumy. We have created 3 new summarizers:

Embedding: Uses the pairwise cosine similarity between the word embeddings of two sentences as the similarity score between those sentences. The rest of the algorithm is similar to TextRank. Code at sumy/summarizers/text_rank_embedding.py.
Named entity recognition + TextRank: Combines the normalised TextRank score with the normalised NER score before applying PageRank in the TextRank algorithm. Code at sumy/summarizers/text_rank_ner.py.
Sum Basic + TextRank: Combines the normalised TextRank score with the normalised Sum Basic score before applying PageRank in the TextRank algorithm. Code at sumy/summarizers/text_rank_tf.py.
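
As a rough illustration of the embedding-based similarity described above, here is a minimal sketch. It is not the code in sumy/summarizers/text_rank_embedding.py. The function names, the toy 2-d vectors, and the choice to average the cosine over all word pairs are assumptions made for the example; the real summarizer may aggregate differently and uses the 300-d GoogleNews vectors.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(words_a, words_b, vectors):
    """Mean pairwise cosine over word pairs that both have embeddings.

    Averaging is an assumption; words missing from `vectors` are skipped.
    """
    known_a = [w for w in words_a if w in vectors]
    known_b = [w for w in words_b if w in vectors]
    if not known_a or not known_b:
        return 0.0
    total = sum(cosine(vectors[a], vectors[b]) for a, b in product(known_a, known_b))
    return total / (len(known_a) * len(known_b))


# Hypothetical toy embeddings standing in for the GoogleNews vectors.
TOY_VECTORS = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "car": [0.0, 1.0],
}

score = sentence_similarity(["the", "cat"], ["a", "dog"], TOY_VECTORS)
```

The resulting scores would fill the sentence-to-sentence weight matrix that TextRank then ranks with PageRank.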

Build instructions

Download GoogleNews-vectors-negative300.bin.gz into the folder one level above the repository (we cannot upload it because it is larger than 100 MB).
Install the pip dependencies listed in requirements.txt.
Run python setup.py build && python setup.py install, then follow the usage guide.
To try the alternate summarizers, either write a test in the tests folder or modify sumy/main.py.

Usage:

    sumy (luhn | edmundson | lsa | text-rank | text-rank-mod | lex-rank | sum-basic | kl) [--length=<length>] [--language=<lang>] [--stopwords=<file_path>] [--format=<format>] --url=<url>
    sumy (luhn | edmundson | lsa | text-rank | text-rank-mod | lex-rank | sum-basic | kl) [--length=<length>] [--language=<lang>] [--stopwords=<file_path>] [--format=<format>] --file=<file_path>
    sumy (luhn | edmundson | lsa | text-rank | text-rank-mod | lex-rank | sum-basic | kl) [--length=<length>] [--language=<lang>] [--stopwords=<file_path>] [--format=<format>] --text=<text>
    sumy --version
    sumy --help
Options:
    --length=<length>        Length of summarized text. It may be count of sentences
                             or percentage of input text. [default: 20%]
    --language=<lang>        Natural language of summarized text. [default: english]
    --stopwords=<file_path>  Path to a file containing a list of stopwords. One word per line in UTF-8 encoding.
                             If it's not provided default list of stop-words is used according to chosen language.
    --format=<format>        Format of input document. Possible values: html, plaintext
    --url=<url>              URL address of the web page to summarize.
    --file=<file_path>       Path to the text file to summarize.
    --text=<text>            Raw text to summarize
    --version                Displays current application version.
    --help                   Displays this text.

text-rank-mod runs the text-rank-ner summarizer; the summarizer it maps to can be changed in sumy/main.py.

Demo and more details at http://35.227.103.104.
