This is roughly split into three parts: Input, Process, Output.
Getting the photos. We wait until the photos have been imported onto the processing computer.
- We query the user to determine the relevant folder (UI still under construction).
- We run over all of the photos to determine which ones are receipts (can we use ML for this?).
Investigation into the Photos naming scheme is still needed.
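The folder-scan step above could be sketched roughly as follows. The extension-based heuristic and the function name are placeholders, not the project's actual receipt-detection logic (which may end up ML-based, as noted above):

```python
import os

# Hypothetical heuristic: treat common photo extensions as candidates.
# Deciding which candidates are actually receipts is still an open question.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic"}

def find_candidate_photos(folder):
    """Return paths of files in `folder` that look like photos."""
    candidates = []
    for name in sorted(os.listdir(folder)):
        _, ext = os.path.splitext(name)
        if ext.lower() in PHOTO_EXTENSIONS:
            candidates.append(os.path.join(folder, name))
    return candidates
```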
We apply OCR to each photo to extract the relevant tags and attributes.
- The extracted data is currently the date-time and the total amount (address extraction is in progress).
- The extracted data is then compared against the database, and duplicates are ignored (the date-time attribute should make this reliable). Unique entries are inserted into the relevant line.
- For questionable attributes such as the address, the program will prompt the user with an image of the text in question. (Can ML be used to improve this accuracy?)
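A minimal sketch of the extraction step, assuming pytesseract and Pillow are installed as described below. The regular expressions for the date and total are illustrative guesses, not the project's actual parsing rules:

```python
import re

def extract_fields(image_path):
    """OCR a receipt photo and pull out a date and total, if present."""
    # Imported lazily so parse_fields can be used without tesseract installed.
    import pytesseract
    from PIL import Image
    text = pytesseract.image_to_string(Image.open(image_path))
    return parse_fields(text)

def parse_fields(text):
    # Illustrative patterns: a dd/mm/yyyy-style date and a "Total: $x.yy" line.
    date_match = re.search(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b", text)
    total_match = re.search(r"[Tt]otal[:\s]*\$?(\d+\.\d{2})", text)
    return {
        "date": date_match.group(1) if date_match else None,
        "total": total_match.group(1) if total_match else None,
    }
```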
We log all data into a .csv file.
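The duplicate check and CSV logging described above could be sketched like this, keyed on the date-time attribute. The column names are assumptions for illustration:

```python
import csv
import os

FIELDNAMES = ["datetime", "total", "address"]  # assumed column names

def append_unique(csv_path, entry):
    """Append `entry` to the CSV unless a row with the same datetime exists."""
    seen = set()
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            seen = {row["datetime"] for row in csv.DictReader(f)}
    if entry["datetime"] in seen:
        return False  # duplicate: ignored, as described above
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow(entry)
    return True
```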
```
brew install tesseract
pip install Pillow
pip install pytesseract
```

```
xcode-select --install
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install cartr/qt4/pyqt
brew install python
```

```
/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/bin/python2.7
/usr/local/Cellar/pyqt/
```
Thanks to robonobodojo for the excellent guide.
To view markdown in Atom, use Ctrl-Shift-M.