This repo contains my assignments for GR 5067: Natural Language Processing, a class offered by Columbia University's Quantitative Methods in the Social Sciences (QMSS) program. The class and its assignments aim to "provide a detailed tour on how to access, clean, “munge” and organize data, both big and small." The quote is taken from the course syllabus, which the instructor would prefer not be forked.
Course assignments focused on:
- HW1 - Getting familiar with Python syntax
- HW2 - Using an instructor-provided Google search crawler to generate a corpus of text files
- HW3 - Simple word search and model-based sentiment analysis
- HW4 - A streaming Twitter classifier
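
As a rough illustration of the "simple word search" idea mentioned for HW3, the sketch below scores a text by counting matches against two small word lists. It is not the course's or the assignment's actual code: the `POSITIVE` and `NEGATIVE` lists and the function names are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; not taken from the course materials.
POSITIVE = {"good", "great", "joy"}
NEGATIVE = {"bad", "poor", "sad"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_matches(text, vocabulary):
    """Count how many tokens in the text appear in the given vocabulary."""
    counts = Counter(tokenize(text))
    return sum(counts[word] for word in vocabulary)


def word_search_sentiment(text):
    """Return the positive-minus-negative word count as a crude sentiment score."""
    return count_matches(text, POSITIVE) - count_matches(text, NEGATIVE)
```

A score above zero suggests more positive than negative matches. A model-based approach would replace the fixed word lists with a trained classifier.
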
The course's final project was a free-choice natural language processing project, accompanied by a class presentation. I chose to fit an LDA model to the Book of Psalms. A full report is available in this repo.