anca-roxanne/potara

 
 


Basics

Potara is a multi-document summarization system that relies on Integer Linear Programming (ILP) and sentence fusion.

Its goal is to summarize a set of related documents. It first fuses similar sentences to create sentences that are either shorter or more informative than those found in the documents. It then uses ILP to choose the best set of sentences, fused or not, to compose the final summary.
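To give a feel for the fusion step, here is a minimal word-graph sketch. It is not Potara's implementation: it assumes whitespace tokenization, merges identical lowercase words into one node, and uses inverse transition frequency as the edge cost, which is a deliberate simplification. The names `build_word_graph` and `fuse` are hypothetical, and the sketch skips the part-of-speech handling and path filtering a real system would need (repeated words inside one sentence, for instance, can create shortcuts).

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def build_word_graph(sentences):
    """Count word-to-word transitions; identical lowercase words share a node."""
    edge_counts = defaultdict(int)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for left, right in zip(tokens, tokens[1:]):
            edge_counts[(left, right)] += 1
    return edge_counts


def fuse(sentences):
    """Return the cheapest START-to-END path; frequent transitions are cheap."""
    successors = defaultdict(list)
    for (left, right), count in build_word_graph(sentences).items():
        successors[left].append((right, 1.0 / count))

    # Dijkstra's algorithm; each heap entry carries the path that reached it.
    heap = [(0.0, [START])]
    visited = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            return " ".join(path[1:-1])  # drop the boundary markers
        if node in visited:
            continue
        visited.add(node)
        for next_word, weight in successors[node]:
            if next_word not in visited:
                heapq.heappush(heap, (cost + weight, path + [next_word]))
    return ""


cluster = [
    "heavy rain flooded the city on monday",
    "heavy rain flooded the city center",
    "rain flooded the city on monday night",
]
print(fuse(cluster))
```

Every pair of adjacent words in the output was seen in at least one input sentence, so the fused sentence can mix the beginning of one sentence with the end of another.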

It relies on state-of-the-art approaches introduced by Gillick and Favre for the ILP strategy, and Filippova for the sentence fusion.
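The selection step can be sketched as a budgeted coverage problem in the spirit of the concept-based formulation: each sentence covers a set of concepts (word bigrams here), each concept counts once no matter how many selected sentences contain it, and the total length must fit a word budget. The sketch below is an illustration, not Potara's code: it swaps the ILP solver for an exhaustive search over subsets, which is only practical for a handful of sentences, and it weights a bigram by the number of sentences that contain it, which is an assumption. The names `bigrams` and `select_sentences` are hypothetical.

```python
from itertools import combinations


def bigrams(sentence):
    """Return the set of adjacent lowercase word pairs in a sentence."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))


def select_sentences(sentences, max_words):
    """Pick the subset that maximizes covered concept weight within the budget."""
    concepts = [bigrams(s) for s in sentences]
    lengths = [len(s.split()) for s in sentences]

    # Assumed weighting: a bigram is worth the number of sentences it appears in.
    weights = {}
    for concept_set in concepts:
        for concept in concept_set:
            weights[concept] = weights.get(concept, 0) + 1

    best_score, best_subset = 0, ()
    # Exhaustive search stands in for the ILP solver (exponential in len(sentences)).
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            if sum(lengths[i] for i in subset) > max_words:
                continue  # violates the length constraint
            covered = set().union(*(concepts[i] for i in subset))
            score = sum(weights[c] for c in covered)  # each concept counted once
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset], best_score


candidates = [
    "the storm hit the coast",
    "the storm hit the coast on monday",
    "thousands lost power",
]
summary, score = select_sentences(candidates, max_words=10)
print(summary, score)
```

Because shared concepts are only counted once, selecting two near-duplicate sentences gains little, which is how this objective discourages redundancy in the summary.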

How To

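Basically, you can use the following:

# Import paths are an assumption about the package layout
from potara import document
from potara.summarizer import Summarizer

s = Summarizer()
print("Adding docs")
# 'pathtofilenumber' is a placeholder prefix; the files are numbered 1 to 10
s.setDocuments([document.Document('pathtofilenumber' + str(i))
                for i in range(1, 11)])
print("Summarizing")
s.summarize()
print(s.summary())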

There's some preprocessing involved, as well as a sentence fusion step, but I made both easy to tune.
