aindilis/paperless-office

Paperless Office

Nuance Paperport (TM) equivalent

Paperless Office scans, OCRs, classifies, and files documents into filing cabinets. It uses Ocropus and Tesseract to OCR images obtained from the scanner. The text is then classified, other metadata is derived, and a location within your filing cabinets is prescribed. When you need a document, you have full-text search along with other search capabilities (similar in many respects to Digilib).
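
The classify-then-file step can be illustrated with a minimal Python sketch. It is not the project's actual classifier: the cabinet paths, the keyword sets, and the `prescribe_location` function are all hypothetical, and a plain keyword count stands in for whatever text classification the system really uses. It only shows how OCR text might be mapped to a filing location.

```python
import re
from collections import Counter

# Hypothetical filing locations and keywords (assumptions for illustration,
# not taken from Paperless Office).
CABINET_KEYWORDS = {
    "cabinet-1/taxes": {"tax", "irs", "deduction", "refund"},
    "cabinet-1/utilities": {"electric", "water", "bill", "meter"},
    "cabinet-2/medical": {"patient", "clinic", "prescription", "insurance"},
}


def tokenize(text):
    """Lowercase the OCR text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def prescribe_location(ocr_text, default="cabinet-0/unsorted"):
    """Return the location whose keywords occur most often in the text.

    Falls back to `default` when no keyword matches at all.
    """
    counts = Counter(tokenize(ocr_text))
    scores = {
        location: sum(counts[word] for word in keywords)
        for location, keywords in CABINET_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

For example, `prescribe_location("Your electric bill: meter reading 1234")` selects the hypothetical `cabinet-1/utilities` location, while text with no matching keyword falls through to the unsorted default.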

I have been writing a system that partially satisfies the notion of an "Open Source Paperport Equivalent", but it also does a lot of things that I don't think Paperport does. For instance, it has automatic document classification, syncs with your filing cabinet, extracts dates and fills a calendar with date mentions so that due dates are easy to check, and has semantic web integration. It can also do a lot of sophisticated natural language processing, such as extracting to-do lists from documents, spam detection, and urgency classification, along with planning, scheduling, and execution features. You can set due dates and declare document and task interdependencies (e.g. this document has to be sent to so-and-so and a reply received before we can fill out that document), so it has workflow support.
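
The interdependency example above (send a document, wait for a reply, then fill out another document) amounts to ordering tasks by their prerequisites. The sketch below shows one way to express that in Python with the standard library's `graphlib` (Python 3.9+). The task names and the `execution_order` helper are hypothetical, and this is not how Paperless Office itself implements planning or scheduling.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks that must be
# completed before it can start.
DEPENDENCIES = {
    "send form A to agency": set(),
    "receive reply from agency": {"send form A to agency"},
    "fill out form B": {"receive reply from agency"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every prerequisite.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `execution_order(DEPENDENCIES)` lists sending form A first, the reply second, and filling out form B last. A circular dependency is reported as an error rather than silently ordered.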

There are many more options and plans for this system than can easily be described at the moment. Eventually, much of the way it handles documents will be similar to KBFS. It can be used to maintain the reading lists for CLEAR and Study. It will integrate with SPSE2 and PICVis once they are complete enough to represent various domains, and it may even partially merge with them.

http://frdcsa.org/frdcsa/minor/paperless-office
