Skip to content
This repository has been archived by the owner on May 4, 2021. It is now read-only.

michaellzc/cmput497_a3_yonael_zichun3

Repository files navigation

cmput497_a3_yonael_zichun3

Prerequisites

How to run

Setup

# Setup python virtual environment
$ virtualenv venv --python=python3
$ source venv/bin/activate

# Install python dependencies
$ pip install -r requirements.txt

Stanford POS Tagger

# transform data files to use "_" as seperator and one sentence per line
# or make format-dev to create a subet of training file
# for development
$ make format

# train models
# NOTE to marker: model has been pre-trained and attach as a part of the submission
# you may skip this part and test directly
$ make train

# test models
# tagged sentences are saved under `output/` directory
$ make test

# run error analysis
# the output consists of accuracy, confusion metrics, percision/recall, and some other stuffs
$ python stanford_post_analysis.py > test-stanford-output.txt

HMM and Brill

# Train two HMM models on both respective testing sets and opposite testing sets
# the output consists of accuracy, confusion metrics, percision/recall, and some other stuffs
$ make test-hmm > test-hmm-output.txt

# Train two Brill models on both respective testing sets and opposite testing sets
# the output consists of accuracy, confusion metrics, percision/recall, and some other stuffs
$ make test-brill > test-brill-output.txt

Output

Output tagged sentences are avaliable under output/ directory following such convention <tagger_name>.<test_file>-tagged.<train_file>.txt.

Authors

  • Yonael Bekele
  • Michael Lin

About

CMPUT 497 Intro to NLP Assignment 3: Part of Speech Tagging

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published