
11411 Natural Language Processing Final Project. Reads Wikipedia articles, then answers natural-language questions about an article and generates comprehension questions from it. Built using ARKref, a noun phrase coreference tool developed by Brendan O'Connor and Michael Heilman, and NLTK (a widely used natural language toolkit for Python).

sushengyang/NLP-project

NLP Term Project

11411 Group 6 Project. See NLP Project Page

Contributors

Timeline

  • Thursday February 7, Stub Program (Ryhan) & Initial Plan (Daniel)
  • Tuesday February 26, Progress Report 1 (Stephen)
  • Thursday March 21, Progress Report 2 (Ryhan)
  • Tuesday April 9, Dry run system
  • Tuesday April 16, Project code due
  • Tuesday April 30, Demos at Google
  • Thursday May 2, Final Report

Asking Program

./ask article.txt nquestions

The asking program takes two arguments:

  • article.txt, a text file containing a Wikipedia article, and
  • nquestions, an integer giving the number of questions to generate.
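The argument handling described above could be sketched as follows. This is a minimal illustration, not the project's actual ask script: the helper name parse_ask_args and the error messages are hypothetical, and rejecting negative counts is an assumption.

```python
def parse_ask_args(argv):
    """Validate arguments shaped like: ./ask article.txt nquestions

    Hypothetical helper; the real ask script may handle arguments differently.
    """
    if len(argv) != 3:
        raise SystemExit("usage: ./ask article.txt nquestions")
    article_path = argv[1]
    try:
        nquestions = int(argv[2])
    except ValueError:
        raise SystemExit("nquestions must be an integer")
    if nquestions < 0:
        # Assumption: a negative question count is treated as an error.
        raise SystemExit("nquestions must be non-negative")
    return article_path, nquestions


# Example call with an explicit argument list instead of sys.argv.
article_path, nquestions = parse_ask_args(["./ask", "article.txt", "5"])
```

In a real entry point, sys.argv would be passed in place of the explicit list.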

Answering Program

./answer article.txt questions.txt

The answering program takes two arguments:

  • article.txt, a text file containing a Wikipedia article, and
  • questions.txt, a text file containing one question per line.
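Reading a one-question-per-line file like the one described above might look like the sketch below. It assumes Python 3 and UTF-8 input, and the helper name read_questions is hypothetical. Skipping blank lines is also an assumption, not documented behavior of the answer program.

```python
import os
import tempfile


def read_questions(path):
    """Return the non-empty lines of a questions file, one question per entry."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo using a throwaway questions.txt so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "questions.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("Who wrote it?\n\nWhen was it built?\n")
    questions = read_questions(demo_path)
```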

Getting Started

Permissions

chmod +x ask
chmod +x answer

Installing NLTK

See NLTK installation guide

First, download setuptools from http://pypi.python.org/pypi/setuptools, then run:

sudo sh Downloads/setuptools-...egg
sudo easy_install pip 
sudo pip install -U numpy
sudo pip install -U pyyaml nltk

Download NLTK datasets

python
>>> import nltk
>>> nltk.download()

Once the NLTK Downloader GUI opens, select the "all" collection and download it to /Users/USERNAME/nltk_data
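To confirm that the download landed where expected, a small standard-library check can be used. This sketch assumes the data lives in an nltk_data folder inside the current user's home directory, which matches the /Users/USERNAME/nltk_data path above on macOS. The helper names are hypothetical and are not part of NLTK.

```python
import os


def default_nltk_data_dir():
    """Return the per-user nltk_data path, e.g. /Users/USERNAME/nltk_data on macOS."""
    return os.path.join(os.path.expanduser("~"), "nltk_data")


def has_nltk_data(path):
    """Report whether the given nltk_data directory exists."""
    return os.path.isdir(path)


data_dir = default_nltk_data_dir()
print(data_dir, "found" if has_nltk_data(data_dir) else "missing")
```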
