Blue notes parsing script for personal use
The repo consists of three main folders:
- raw input: the PDF and its text recognition saved as a Word document
- txt files by chapter: manually cleaned text, one chunk per person
- a simple Python script, which splits the texts into 'persons' and guesses which type of data each line represents
At this stage I am writing separate algorithms to recognize the different kinds of data in the text.
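
A minimal sketch of how the splitting and per-line guessing could fit together. This is not the script in this repo. It assumes that persons are separated by blank lines in the chapter txt files, and the recognizers (`phone`, `year`) and all function names are hypothetical placeholders, since the actual data types are not listed here.

```python
import re

# Hypothetical recognizers, one per kind of data. The real data types in
# the notes are not described in this README, so these are placeholders.
RECOGNIZERS = [
    ("phone", re.compile(r"^\+?[\d\s()-]{7,}$")),
    ("year", re.compile(r"\b(1[89]|20)\d{2}\b")),
]


def split_into_persons(chapter_text):
    """Split a chapter into per-person lists of lines.

    Assumption: persons are separated by one or more blank lines.
    """
    chunks = re.split(r"\n\s*\n", chapter_text.strip())
    return [
        [line.strip() for line in chunk.splitlines() if line.strip()]
        for chunk in chunks
        if chunk.strip()
    ]


def guess_line_type(line):
    """Return the label of the first recognizer that matches, else 'unknown'."""
    for label, pattern in RECOGNIZERS:
        if pattern.search(line):
            return label
    return "unknown"


def parse_chapter(chapter_text):
    """Return, for each person, a list of (guessed_type, line) pairs."""
    return [
        [(guess_line_type(line), line) for line in person]
        for person in split_into_persons(chapter_text)
    ]


if __name__ == "__main__":
    sample = "John Smith\n+1 (555) 123-4567\nborn 1950\n\nJane Doe\nno data here\n"
    for person in parse_chapter(sample):
        print(person)
```

Keeping each recognizer as a separate entry in a list means a new algorithm for another kind of data can be added without touching the splitting code. The order of the list decides which guess wins when several patterns match.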
The raw PDF and Word files can be found via these links: blue_pages.doc blue_pages.pdf