AIT 690 | Assignment 1 | Due 9/19/2018

Billy Ermlick

Xiaojie Guo

Nidhi Mehrotra

This program implements Eliza, a dialogue robot that engages in conversation with the user. Eliza begins the dialogue by asking for the user's name. The current implementation includes:

Word spotting: extracts keywords from the user's answer and gives related feedback (e.g., 'sad' to 'What makes you sad?').
Sentence transformation: turns the user's sentences into answers or questions from Eliza (e.g., "I am unsure on what to do." to "You are unsure on what to do?").
Personalization: uses the user's name in some of the questions.
Robust fallbacks: answers in a plausible way when the user enters gibberish or a sentence that cannot be understood (e.g., "I'm sorry I didn't catch that. What is your name again?").
Default questions: uses lambda functions to ask default questions, such as storing and returning the user's favorite things throughout the discourse.
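
The word spotting and sentence transformation features can be illustrated with a minimal sketch. This is not the code in Assignment1.py: the keyword table, the pronoun swaps, and the names `KEYWORD_RESPONSES`, `PRONOUN_SWAPS`, `reflect`, and `respond` are all hypothetical, chosen only to mirror the examples in the list above.

```python
import re

# Hypothetical keyword table (pattern, response template); not the authors' rules.
KEYWORD_RESPONSES = [
    (re.compile(r"\b(sad|angry|happy)\b", re.IGNORECASE), "WHAT MAKES YOU {0}?"),
]

# Hypothetical first-person to second-person swaps used when echoing a statement.
PRONOUN_SWAPS = {"i": "you", "am": "are", "my": "your", "me": "you"}


def reflect(sentence):
    """Swap first-person words for second-person ones, word by word."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return " ".join(PRONOUN_SWAPS.get(w, w) for w in words)


def respond(sentence):
    """Try word spotting first, then sentence transformation, then a fallback."""
    for pattern, template in KEYWORD_RESPONSES:
        match = pattern.search(sentence)
        if match:
            return template.format(match.group(1).upper())
    if re.match(r"\s*i am\b", sentence, re.IGNORECASE):
        return reflect(sentence).upper() + "?"
    return "I DIDN'T QUITE UNDERSTAND. CAN YOU SAY THAT ANOTHER WAY?"


if __name__ == "__main__":
    print(respond("I feel sad"))
    print(respond("I am unsure on what to do."))
    print(respond("asdf qwer"))
```

Checking keywords before applying the transformation keeps the most specific reply first; the fallback line handles input that matches neither rule.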

HOW TO RUN THE PROGRAM:
To use the program, run Assignment1.py, type your response at the prompt, and press ENTER. Do not use contractions.

Simple Example Dialogue:
HELLO, MY NAME IS ELIZA. I AM A PSYCHOTHERAPIST. WHAT IS YOUR NAME?
'My name is Coco'
NICE TO MEET YOU, COCO. HOW DO YOU FEEL TODAY?
'i fee l sAD...'
I DIDN'T QUITE UNDERSTAND. CAN YOU SAY THAT ANOTHER WAY?
'I feel sad'
TELL ME MORE.
'I always have lot of work to do'
CAN YOU THINK OF A SPECIFIC EXAMPLE?
'Yes, I need to read 10 paper a week'
ARE YOU SURE?
'Pretty sure'
COCO, WHY PRETTY SURE?
'can you sing a song for me?'
Sure
'I want to share my favorites'
COCO, WHY DO YOU WANT TO SHARE YOUR FAVORITES?
'You should know my favorite things'
SPEAKING OF FAVORITES... I'D LIKE TO LEARN MORE ABOUT YOU
I know your favorite song is ...well maybe I am not sure....
...What is your favorite song?
'My favorite song is Raise you up'
GOT IT. WHAT ELSE IS ON YOUR MIND?
'Nothing! bye'
I HOPE THIS CONVERSATION WAS PRODUCTIVE. GOODBYE.
***************************************************************************************

AIT690-Assignment2

This Python program, ngram.py, learns an N-gram language model from an arbitrary number of plain-text files and generates a given number of sentences based on that model.

The program works for any value of N and outputs as many sentences (m) as the user requests. You can run the program as follows:

ngram.py n m input-file/s

n is the order of the N-gram model (the number of tokens per gram) and m is the number of sentences to generate.

For example: ngram.py 3 10 'austen-emma.txt' 'austen-persuasion.txt'
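
As a rough illustration of the idea, the sketch below builds an N-gram model from tokenized sentences and samples new sentences from it. It is not the code in ngram.py: the function names `build_model` and `generate_sentence`, the `<s>` padding token, and the toy corpus are assumptions made for this example. Sampling uniformly from the list of observed continuations is equivalent to sampling from the conditional frequency distribution of the next token given the previous n-1 tokens.

```python
import random
from collections import defaultdict


def build_model(sentences, n):
    """Map each (n-1)-token context to the list of tokens observed after it."""
    model = defaultdict(list)
    for sentence in sentences:
        # Pad the start so the first real token also has a full context.
        padded = ["<s>"] * (n - 1) + sentence
        for i in range(len(padded) - n + 1):
            context = tuple(padded[i:i + n - 1])
            model[context].append(padded[i + n - 1])
    return model


def generate_sentence(model, n, rng, max_len=30):
    """Sample tokens until end-of-sentence punctuation or max_len tokens."""
    context = ("<s>",) * (n - 1)
    words = []
    while len(words) < max_len:
        choices = model.get(context)
        if not choices:
            break
        word = rng.choice(choices)
        words.append(word)
        if word in {".", "!", "?"}:
            break
        # Slide the context window forward by one token.
        context = (context + (word,))[1:] if n > 1 else ()
    return " ".join(words)


if __name__ == "__main__":
    corpus = [["the", "cat", "sat", "."], ["the", "dog", "ran", "."]]
    model = build_model(corpus, 3)
    rng = random.Random(0)
    for _ in range(2):
        print(generate_sentence(model, 3, rng))
```

Padding every training sentence with start tokens lets each generated sentence begin from a context that was actually seen at the start of a sentence.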

The .txt files used in this project are from http://www.gutenberg.org. You can choose from the following file names:

'austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt',
'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt',
'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt'

Some of the code for fetching the files and calculating the conditional frequency distribution is adapted from the NLTK Book: https://www.nltk.org/book/

**************************************************************************************

AIT690-Assignment3

This is a Python program that assigns part-of-speech tags to the words in a test file by maximizing P(tag|word), as estimated from a training file. Words that do not appear in the training file are assumed to be NN. Words that have only one part of speech in the training data are labeled with that part of speech in the test file. Words with multiple potential parts of speech whose neighbors are untagged are given their most likely tag in the training data. After this procedure, untagged words with tagged neighbors are assigned tags by maximizing their conditional probabilities. The accuracy of our model before additional POS rules were applied was 55.17%. After the rules were added, our accuracy increased to 80.87%.
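
The most-likely-tag step can be sketched as follows. This is only an illustration of choosing the tag that maximizes P(tag|word), not the code in tagger.py: the function names `train` and `tag`, the toy data, and the default `unknown_tag="NN"` are assumptions for this example, and the neighbor-based steps and additional rules are not shown.

```python
from collections import Counter, defaultdict


def train(tagged_words):
    """Count how often each tag occurs with each word in the training data."""
    counts = defaultdict(Counter)
    for word, pos in tagged_words:
        counts[word][pos] += 1
    return counts


def tag(words, counts, unknown_tag="NN"):
    """Give each word its most frequent training tag, or unknown_tag if unseen.

    For a fixed word, the tag with the highest count is also the tag with
    the highest estimated P(tag|word).
    """
    return [
        (w, counts[w].most_common(1)[0][0] if w in counts else unknown_tag)
        for w in words
    ]


if __name__ == "__main__":
    training = [("the", "DT"), ("run", "VB"), ("run", "NN"), ("run", "VB")]
    model = train(training)
    print(tag(["the", "run", "zebra"], model))
```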

The labeled training data is "pos-train.txt".
The untagged test file is "pos-test.txt".
The predicted labeled test data is "pos-test-with-tags.txt".
The gold standard labeled test data is "pos-test-key.txt".
The scoring file is "scorer.py".

"pos-tagging-report.txt" and "tagger-log.txt" are the reporting and logging files, respectively.

The script can be run by entering:
$ python tagger.py pos-train.txt pos-test.txt > pos-test-with-tags.txt
$ python scorer.py pos-test-with-tags.txt pos-test-key.txt > pos-tagging-report.txt

Some of the code for the probability tables and confusion matrix was adapted from the NLTK Book: https://www.nltk.org/book/

Some of the rules were taken from the book Speech and Language Processing by Jurafsky and Martin.

**************************************************************************************

AIT690-Assignment4

Our performance = 72.22%
Baseline performance assuming all tags are the 'phone' sense = 72/126 = 57.14%
Our Confusion Matrix:
          phone  product
phone        38       34
product       1       53
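
The reported figures can be checked from the confusion matrix with a few lines of arithmetic. The sketch below assumes that rows are the gold senses and columns are the predicted senses; the README does not state the orientation, but the 'phone' row sums to the 72 instances used in the baseline. The variable names are hypothetical.

```python
# Confusion matrix as nested dicts: matrix[gold][predicted] (orientation assumed).
matrix = {
    "phone": {"phone": 38, "product": 34},
    "product": {"phone": 1, "product": 53},
}

# Total number of test instances.
total = sum(sum(row.values()) for row in matrix.values())

# Correct predictions lie on the diagonal.
correct = sum(matrix[sense][sense] for sense in matrix)

# Accuracy of the classifier.
accuracy = correct / total

# Baseline: always predict the most frequent gold sense.
baseline = max(sum(row.values()) for row in matrix.values()) / total

print(f"total={total} correct={correct}")
print(f"accuracy={accuracy:.2%} baseline={baseline:.2%}")
```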

This program implements a decision list classifier that performs word sense disambiguation on the word 'line' used in different contexts.
Features implemented from the Yarowsky paper:
1) f_1W = word at -1 from target
2) f_W1 = word at +1 from target
3) f_1W2W = words at -1 and -2 from target
4) f_W1W2 = words at +1 and +2 from target
5) f_KW = -K words from target (K=3)
6) f_WK = +K words from target (K=3)
The program learns a decision list from line-train.xml and applies that decision list to each of the sentences found in line-test.xml in order to assign a sense to the word 'line'. The program writes the decision list it learns to my-decision-list.txt. The list shows each feature, the log-likelihood score associated with it, and the sense it predicts. The program writes the answer tags it creates for each sentence to STDOUT.
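
A decision list of this kind can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function names, the feature strings such as `"f_1W=telephone"`, the toy examples, and the smoothing constant `alpha` are all hypothetical. Each feature is scored with the absolute log of the smoothed ratio of its counts under the two senses, and a test instance is labeled by the highest-scoring rule whose feature it contains.

```python
import math
from collections import Counter, defaultdict


def learn_decision_list(examples, senses=("phone", "product"), alpha=0.1):
    """Build rules (score, feature, sense) sorted by descending score.

    examples: iterable of (features, sense) pairs, features being strings.
    alpha: hypothetical additive smoothing constant to avoid division by zero.
    """
    counts = defaultdict(Counter)
    for features, sense in examples:
        for feature in features:
            counts[feature][sense] += 1
    first, second = senses
    rules = []
    for feature, c in counts.items():
        # Smoothed log-likelihood ratio between the two senses.
        ratio = math.log((c[first] + alpha) / (c[second] + alpha))
        rules.append((abs(ratio), feature, first if ratio > 0 else second))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def classify(features, rules, default):
    """Return the sense of the first rule whose feature is present."""
    present = set(features)
    for _score, feature, sense in rules:
        if feature in present:
            return sense
    return default


if __name__ == "__main__":
    train = [
        (["f_1W=telephone"], "phone"),
        (["f_1W=telephone", "f_W1=is"], "phone"),
        (["f_1W=product", "f_W1=is"], "product"),
    ]
    rules = learn_decision_list(train)
    for score, feature, sense in rules:
        print(f"{feature}\t{score:.3f}\t{sense}")
    print(classify(["f_1W=telephone", "f_W1=is"], rules, default="phone"))
```

Falling back to a default sense when no rule matches corresponds to the majority-sense baseline.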

**************************************************************************************

PROJECT

DATA available via -> https://drive.google.com/drive/folders/1gOBlngdaolH7OUROw3pgA02R1vEtHzM5?usp=sharing
