Original Author: (Christopher.gondek@uni-konstanz.de)
You put in a parsed PDF as plain text, like the attached example txt file, and you get back noise-reduced sentences, each with its corresponding page number. I think any decent PDF parser would work; I used http://pdftotext.com/.
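The noise-reduction rules themselves are not described here, so the following is only a minimal sketch of the input/output idea, not the actual implementation. It assumes that the parsed text separates pages with form feed characters (as many PDF-to-text converters do; this may not hold for every parser), and the function name `extract_sentences`, the `min_words` threshold, and the filtering heuristics are hypothetical choices made for illustration.

```python
import re


def extract_sentences(text, min_words=3):
    """Return (page_number, sentence) pairs from parsed PDF text.

    Assumption: pages are separated by form feed characters.
    """
    results = []
    for page_number, page in enumerate(text.split("\f"), start=1):
        # Rejoin words that were hyphenated across a line break.
        page = re.sub(r"-\n(?=[a-z])", "", page)
        # Collapse line breaks and repeated spaces into single spaces.
        page = re.sub(r"\s+", " ", page).strip()
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", page):
            # Hypothetical noise filter: drop very short fragments
            # (page numbers, stray labels) ...
            if len(sentence.split()) < min_words:
                continue
            # ... and fragments that do not end like a sentence.
            if not sentence.endswith((".", "!", "?")):
                continue
            results.append((page_number, sentence))
    return results


if __name__ == "__main__":
    sample = (
        "This is the first sen-\ntence on page one. 12\n"
        "\f"
        "Page two starts here with text. Fig 3"
    )
    for page_number, sentence in extract_sentences(sample):
        print(page_number, sentence)
```

The regex-based splitting is deliberately simple and will mis-split on abbreviations such as "e.g."; a real sentence tokenizer would likely do better, at the cost of an extra dependency.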