emschorsch/nlp-midterm-project

README file for the midterm project for cs65
Steve Dini and Emanuel Schorsch
==============================================
Contents of this directory:
i.   trie.py
     -Class implementation for the trie data structure. Supported external
      methods include insert, build, successor and predecessor counts.

ii.  counts.py written by Steve Dini
     -basic implementation for word segmentation based on just successor and 
      predecessor counts as explained in the Harris paper.

iii. varieties.py written by Emanuel Schorsch
     -contains the other implementations based on the Hafer paper. Implemented
      methods include:
      a) Reverse cutoff (k=14)
      b) Reverse cutoff (k=22)
      c) Duo cutoff (k1=2, k2=4)
      d) Sum cutoff (k=22)
      e) Duo Peaks
      f) Sum Peaks
      g) Negative Frequency

iv.  dejean.py written by Steve Dini
     -contains an implementation of Dejean's algorithm, without the contextual
      segmentation described as its last step

v.   stats.py
     -helper module that reports the number of cuts made, the number of
      expected correct cuts, and the number of correct cuts actually made.
     -also supports computing precision and recall
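
Illustrative sketch (not the code in this directory):
The following is a minimal sketch of how the pieces listed above could fit
together: a trie with successor counts, predecessor counts via a second trie
built from reversed words, a plain cutoff segmenter, and precision/recall over
cut positions. Every name here (Trie as written below, END, predecessor_count,
segment, precision_recall), the toy word list, and the cutoff k=4 are
assumptions made for illustration; the actual interfaces of trie.py, counts.py
and stats.py may differ, and the cutoff rule shown is a simplified stand-in for
the methods in counts.py and varieties.py.

```python
END = "$"  # end-of-word marker; assumed not to occur in the corpus


class Trie:
    """Letter trie stored as nested dicts (one dict per node)."""

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # a word ending here counts as one more successor

    def build(self, words):
        for word in words:
            self.insert(word)

    def successor_count(self, prefix):
        """Number of distinct symbols that follow `prefix` in the corpus."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return 0
            node = node[ch]
        return len(node)


def predecessor_count(reversed_trie, suffix):
    """Distinct symbols preceding `suffix`, using a trie of reversed words."""
    return reversed_trie.successor_count(suffix[::-1])


def segment(word, trie, k):
    """Cut after every proper prefix whose successor count is at least k."""
    cuts = [i for i in range(1, len(word))
            if trie.successor_count(word[:i]) >= k]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return cuts, pieces


def precision_recall(predicted, gold):
    """predicted and gold map each word to a set of cut positions."""
    made = sum(len(cuts) for cuts in predicted.values())
    expected = sum(len(cuts) for cuts in gold.values())
    correct = sum(len(cuts & gold.get(word, set()))
                  for word, cuts in predicted.items())
    precision = correct / made if made else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall


# Toy corpus, invented for this example.
words = ["read", "reads", "reading", "reader", "readable",
         "red", "rope", "ripe"]
forward = Trie()
forward.build(words)
backward = Trie()
backward.build(w[::-1] for w in words)

cuts, pieces = segment("reading", forward, k=4)
print(cuts, pieces)
```

On this toy corpus the prefix "read" is followed by five distinct symbols
(the end marker, s, i, e, a), so with k=4 the only cut in "reading" falls after
"read". Precision is the number of correct cuts divided by the number of cuts
made, and recall is the number of correct cuts divided by the number of
expected cuts.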
