GitHub - aindilis/paperless-office: Nuance Paperport (TM) equivalent

Paperless Office

Nuance Paperport (TM) equivalent

Paperless Office scans, OCRs, classifies, and files documents into filing cabinets. It uses Ocropus and Tesseract to OCR images obtained from the scanner. The text is then classified and other metadata is derived, and a location within your filing cabinets is assigned. When you need a document, you have full-text search along with other search capabilities (similar in many respects to Digilib).
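The OCR, classify, and file steps described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the category names, the keyword sets, the `cabinet/` path layout, and the function names `ocr_image`, `classify`, and `filing_location` are all assumptions made for the example, and a simple keyword count stands in for whatever classifier the real system uses. The only external tool called is the Tesseract command line, which is assumed to be installed.

```python
import re
import subprocess
from collections import Counter

# Hypothetical categories and keywords, invented for this sketch.
CATEGORIES = {
    "bills": {"invoice", "amount", "due", "payment"},
    "medical": {"patient", "prescription", "clinic"},
    "taxes": {"tax", "irs", "deduction", "refund"},
}


def ocr_image(path):
    """Run the Tesseract CLI on a scanned image and return the recognized text.

    Passing "stdout" as the output base makes Tesseract print the text
    instead of writing it to a file.
    """
    result = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def classify(text):
    """Pick the category whose keywords occur most often in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        name: sum(words[w] for w in keywords)
        for name, keywords in CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a catch-all drawer when no keyword matched at all.
    return best if scores[best] > 0 else "unsorted"


def filing_location(text):
    """Map OCR text to a (hypothetical) filing cabinet location."""
    return f"cabinet/{classify(text)}"
```

With a scanned page on disk, `filing_location(ocr_image("page.png"))` would return a location such as `cabinet/bills`; the file name here is a placeholder.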

I have been writing a system that partially satisfies the notion of an "Open Source Paperport Equivalent", but it does many things that I don't think Paperport does. For instance, it has automatic document classification, syncs with your filing cabinet, and extracts dates, filling a calendar with date mentions so that due dates are easy to check. It has semantic web integration and can do a lot of sophisticated natural language processing, such as extracting to-do lists from documents, spam detection, and urgency classification, along with planning, scheduling, and execution features. You can set due dates and declare interdependencies between documents and tasks (e.g. this document has to be sent to so-and-so and a reply received before we can fill out that document). In short, it has workflow support.
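To make the date extraction and the document/task interdependencies more concrete, here is a minimal sketch in Python. It is not taken from the project: the two date formats, the task names, and the functions `extract_dates` and `execution_order` are assumptions chosen for illustration, and a plain topological sort stands in for the system's planning and scheduling features.

```python
import re
from datetime import date, datetime
from graphlib import TopologicalSorter  # Python 3.9+

# Only two date formats are recognized in this sketch (an assumption);
# real documents would need many more patterns.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
]


def extract_dates(text):
    """Return the distinct dates mentioned in the text, earliest first."""
    found = set()
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                found.add(datetime.strptime(match.group(), fmt).date())
            except ValueError:
                # Looks like a date but is not one (e.g. 13/45/2024).
                continue
    return sorted(found)


def execution_order(dependencies):
    """Order tasks so that every prerequisite comes before its dependents.

    `dependencies` maps each task to the set of tasks that must finish first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: send a document, wait for the reply, then fill out a form.
workflow = {
    "fill out form B": {"receive reply"},
    "receive reply": {"send document A"},
}
```

Calling `extract_dates` on the OCR text of each document gives the entries for a due-date calendar, and `execution_order(workflow)` lists the tasks in an order that respects the declared dependencies.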

There are many more options and plans for this system than are easy to describe at the moment. Much of the way it handles documents will eventually be similar to KBFS. It can be used to maintain the reading lists for CLEAR and Study. It will integrate with SPSE2 and PICVis when they are complete enough to represent various domains, and it may even partially merge with them.

http://frdcsa.org/frdcsa/minor/paperless-office

Folders and files (2 commits)

PaperlessOffice
frdcsa
misc
old/scan-py
scripts
systems
t
LICENSE
Makefile
PaperlessOffice.pm
README.md
cleaning-txt.do
data
errors.txt
gpl.txt
model.flr
new
paperless-office
paperless-office.conf
scanning-howto
scanning-howto.orig
thumbnail.gif
to.do
to.do.orig
