Skip to content

A software which creates n-Gram (1-5) Maximum Likelihood Probabilistic Language Model with Laplace Add-1 smoothing and stores it in hash-able dictionary form

License

thientu/ngram

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build Status

Ngram

A software which creates n-Gram (1-5) Maximum Likelihood Probabilistic Language Model with Laplace Add-1 smoothing and stores it in hash-able dictionary form.

class nGram
    |  A program which creates n-Gram (1-5) Maximum Likelihood Probabilistic Language Model with Laplace Add-1 smoothing
    |      and stores it in hash-able dictionary form.
    |      n: number of bigrams (supports up to 5)
    |      corpus_file: relative path to the corpus file.
    |      cache: saves computed values if True
    |
    |
    |  Usage:
    |  >>> ng = nGram(n=5, corpus_file=None, cache=False)
    |  >>> print(ng.sentence_probability(sentence='hold your horses', n=2, form='log'))
    |  >>> -18.655540764
    |
    |  Methods defined here:
    |
    |  __init__(self, n=1, corpus_file=None, cache=False)
    |      Constructor method which loads the corpus from file and creates ngrams based on imput parameters.
    |
    |  create_bigram(self, cache)
    |      Method to create Bigram Model for words loaded from corpus.
    |
    |  create_pentigram(self, cache)
    |      Method to create Pentigram Model for words loaded from corpus.
    |
    |  create_quadrigram(self, cache)
    |      Method to create Quadrigram Model for words loaded from corpus.
    |
    |  create_trigram(self, cache)
    |      Method to create Trigram Model for words loaded from corpus.
    |
    |  create_unigram(self, cache)
    |      Method to create Unigram Model for words loaded from corpus.
    |
    |  load_corpus(self, file_name)
    |      Method to load external file which contains raw corpus.
    |
    |  probability(self, word, words='', n=1)
    |      Method to calculate the Maximum Likelihood Probability of n-Grams on the basis of various parameters.
    |
    |  sentence_probability(self, sentence, n=1, form='antilog')
    |      Method to calculate cumulative n-gram Maximum Likelihood Probability of a phrase or sentence.

About

A software which creates n-Gram (1-5) Maximum Likelihood Probabilistic Language Model with Laplace Add-1 smoothing and stores it in hash-able dictionary form

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%