This repo contains the source code for the tutorial series at http://www.thoughtly.co/blog/prototype. To help self-taught students immerse themselves in ML/NLP, we are introducing a tutorial series focused on ML with an emphasis on NLP. The plan is to take users from basic concepts through more advanced subjects. We intend to provide simple, verbose, well-documented code that lets the student fully grasp concepts and techniques that are often glossed over in classes but that make up a significant portion of the foundation needed to get into ML for NLP.
The first post focuses on text handling with an emphasis on tokenization and term frequency. This post covers the general use of the code found in words.py: http://www.thoughtly.co/blog/working-with-text/.
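For readers who want a feel for the two ideas before opening the post, here is a minimal sketch of tokenization and term frequency. It is not the code in words.py: the function names `tokenize` and `term_frequency`, the regular expression, and the choice to normalize counts by the total number of tokens are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens as a list.

    Assumption: a token is a run of letters, digits, or apostrophes.
    Real tokenizers handle many more cases (hyphens, URLs, Unicode).
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequency(tokens):
    """Map each token to its count divided by the total token count."""
    total = len(tokens)
    if total == 0:
        # No tokens means no frequencies; avoid dividing by zero.
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    tokens = tokenize(sample)
    print(tokens)
    print(term_frequency(tokens))
```

Raw counts (the `Counter` alone) are also commonly called term frequency; dividing by the document length, as done here, simply makes documents of different sizes comparable.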