11611-proj

Q&A Robot

Our final project for 11-611

Files to execute: ask.py and answer.py, both located in src.

There are two modules in this project:

First is the ask program, which takes Wikipedia articles as input and generates a set of questions.
a. I use the Stanford NLTK sentence segmenter to get a list of sentences.
b. I then look for simple predicate sentences. These represent facts that I can transform into questions. They follow the form NP-VP-Period.
c. I also look for appositions, i.e. an NP followed by another NP, separated by a comma. This means that the two NPs are related.
d. Once I have the predicate sentences, I generate three kinds of questions.
   i. Binary questions: simple yes/no questions. To generate them, I invert the positions of the modal verb and the subject and append a '?' at the end.
   ii. Confounded binary questions: the same as above, but I replace the subject with a synonym or antonym taken from WordNet relations.
   iii. 'Wh' questions: I look for specific syntactic structures in the predicate sentences. I use named entity recognition to generate these questions, with the Stanford NER tagger identifying locations, people, etc.
      1) For 'how' questions I look for adverbs.
      2) For 'what' questions I look for NPs.
      3) For 'when' questions I look for the Location tag from the NER tagger.
   iv. Once the appropriate constituent is identified, I invert the sentence using the same process as for binary questions and insert the question word at the beginning of the sentence.
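The inversion step can be illustrated with a minimal sketch. It is not the code in src/ask.py: it assumes the sentence is already a simple "NP auxiliary rest." string, splits on whitespace instead of using a parser, and the names `AUXILIARIES`, `make_binary_question` and `make_wh_question` are hypothetical. The answer constituent for the 'Wh' case is passed in by hand here, whereas the project finds it with the parser and the NER tagger.

```python
# Illustrative sketch only; names and the auxiliary list are assumptions,
# not taken from the project source.
AUXILIARIES = {
    "is", "are", "was", "were", "can", "could", "will", "would",
    "shall", "should", "may", "might", "must", "has", "have", "had",
}


def make_binary_question(sentence):
    """Turn 'NP AUX rest.' into 'AUX NP rest?'; return None if no auxiliary."""
    tokens = sentence.rstrip(" .").split()
    for i, tok in enumerate(tokens):
        # The auxiliary must follow a non-empty subject.
        if i > 0 and tok.lower() in AUXILIARIES:
            subject, rest = tokens[:i], tokens[i + 1:]
            # The subject's capitalization is left untouched.
            return " ".join([tok.capitalize()] + subject + rest) + "?"
    return None


def make_wh_question(sentence, answer_phrase, wh_word):
    """Drop the answer constituent, invert, and prepend the question word."""
    stripped = sentence.rstrip(" .")
    if answer_phrase not in stripped:
        return None
    reduced = " ".join(stripped.replace(answer_phrase, "", 1).split())
    binary = make_binary_question(reduced)
    if binary is None:
        return None
    # Lowercase the fronted auxiliary now that the wh-word comes first.
    return wh_word + " " + binary[0].lower() + binary[1:]


print(make_binary_question("Pittsburgh is a city in Pennsylvania."))
# Is Pittsburgh a city in Pennsylvania?
print(make_wh_question("Pittsburgh is located in Pennsylvania.",
                       "in Pennsylvania", "Where"))
# Where is Pittsburgh located?
```

A sentence without an auxiliary (for example "Dogs bark.") returns None in this sketch; handling such sentences would need do-support, which is left out here.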
Second is the answer program, which takes Wikipedia articles and a set of questions as input and generates answers.
i. For each question, I identify the question type by parsing it with the Stanford parser and looking at the initial labels. This tells us whether the question is binary or 'Wh'. If I am able to determine the type, I first invert the subject in the question and remove the question word to convert the question to its predicate form. I do this because the predicate form is more likely to be present in the articles. Then I compute the cosine similarity between the predicate form and every sentence, in the same way as in ii, and return the sentence with the highest value. Verifying whether the answer is correct is yet to be implemented.
ii. If I am not able to determine the type of question, I scan the document to find the sentence closest to the question. To do this, I use the cosine similarity between the tf-idf vectors of the question and each sentence.
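The sentence retrieval step can be sketched in plain Python as follows. This is a rough illustration, not the code in src/answer.py: the regex tokenizer, the smoothed idf formula log((1 + N) / (1 + df)) + 1, and the function names `tokenize`, `tfidf_vector`, `cosine` and `best_sentence` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse tf-idf vector as a dict, using a smoothed idf (an assumption)."""
    counts = Counter(tokens)
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1)
        for term, tf in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_sentence(question, sentences):
    """Return the sentence with the highest cosine similarity to the question."""
    if not sentences:
        return None
    tokenized = [tokenize(s) for s in sentences]
    # Document frequency: in how many sentences each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(sentences)
    q_vec = tfidf_vector(tokenize(question), doc_freq, n_docs)
    scored = [
        (cosine(q_vec, tfidf_vector(toks, doc_freq, n_docs)), sent)
        for toks, sent in zip(tokenized, sentences)
    ]
    return max(scored, key=lambda pair: pair[0])[1]


article = [
    "Pittsburgh is a city in Pennsylvania.",
    "The Nile is a river in Africa.",
    "Python was created by Guido van Rossum.",
]
print(best_sentence("Who created Python?", article))
# Python was created by Guido van Rossum.
```

Each sentence is treated as its own "document" when computing idf, so words that occur in many sentences of the article contribute less to the score than rare ones.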


ankittare/NLP_QuestionGeneration
