Skip to content

abensrhir/sumy

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Automatic text summarizer

image

Here are some other summarizers:

Installation

Currently only from git repo (make sure you have Python installed)

$ wget https://github.com/miso-belica/sumy/archive/master.zip # download the sources
$ unzip master.zip # extract the downloaded file
$ cd sumy-master/
$ [sudo] python setup.py install # install the package

Or simply run:

$ [sudo] pip install git+git://github.com/miso-belica/sumy.git

Usage

Sumy contains command line utility for quick summarization of documents.

$ sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization # what's summarization?
$ sumy luhn --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy edmundson --language=czech --length=3% --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy --help # for more info

Various evaluation methods for some summarization method can be executed by commands below:

$ sumy_eval lex-rank reference_summary.txt --url=http://en.wikipedia.org/wiki/Automatic_summarization
$ sumy_eval lsa reference_summary.txt --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy_eval edmundson reference_summary.txt --language=czech --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy_eval --help # for more info

Python API

Or you can use sumy like a library in your project.

# -*- coding: utf8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.nlp.stemmers.czech import stem_word
from sumy.utils import get_stop_words


if __name__ == "__main__":
    url = "http://www.zsstritezuct.estranky.cz/clanky/predmety/cteni/jak-naucit-dite-spravne-cist.html"
    parser = HtmlParser.from_url(url, Tokenizer("czech"))

    summarizer = LsaSummarizer(stem_word)
    summarizer.stop_words = get_stop_words("czech")

    for sentence in summarizer(parser.document, 20):
        print(sentence)

Tests

Run tests via

$ nosetests-2.6 && nosetests-3.2 && nosetests-2.7 && nosetests-3.3

About

Module for automatic summarization of text documents and HTML pages.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published