thesis

Thesis / Research code

Steps:

Get the sentences from the Brown corpus and split them into training data (brown.sentences) and test data (brown_test.sentences).

File: scripts/get_sentences_from_brown.py

####################################################

File: brown.sentences + brown_test.sentences

Format: [sentence1, sentence2, sentence3, ...]

####################################################
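A minimal sketch of the split step, assuming the corpus is already loaded as a list of tokenized sentences and that a fixed fraction is held out for testing. The function name `split_sentences`, the 25% test fraction, and the seed are illustrative assumptions, not taken from `scripts/get_sentences_from_brown.py`.

```python
import random


def split_sentences(sentences, test_fraction=0.1, seed=0):
    """Shuffle a copy of `sentences` and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(sentences)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy corpus standing in for the Brown sentences (hypothetical data).
corpus = [
    ["the", "dog", "barks"],
    ["a", "cat", "sleeps"],
    ["the", "cat", "runs"],
    ["a", "dog", "sleeps"],
]
train, test = split_sentences(corpus, test_fraction=0.25)
```

The two returned lists would then be written to brown.sentences and brown_test.sentences respectively.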

Count the words in the training and test data and store the counts in words.count.

File: scripts/count_words_and_write_to_file.py

####################################################

File: words.count

Format: {word1: count1, word2: count2, ...}

####################################################
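The counting step could look roughly like the sketch below. It assumes sentences are lists of word tokens and serializes the mapping as JSON. The on-disk encoding of words.count is not specified here, so the JSON choice and the name `count_words` are assumptions.

```python
import json
from collections import Counter


def count_words(sentences):
    """Return a Counter mapping each word to its number of occurrences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return counts


# Hypothetical toy data in place of the training + test sentences.
sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
counts = count_words(sentences)

# One possible text form of the {word: count} mapping.
serialized = json.dumps(counts, sort_keys=True)
```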

Create Huffman codes for the words in words.count and store the words in files grouped by the bit length of their codes.

File: scripts/create_huffman_code_for_all_words.py

####################################################

File name: <word_len>.code_length

Format: word1, word2, word3, ...

####################################################
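As an illustration of this step, the sketch below builds a standard binary Huffman code from a `{word: count}` mapping with `heapq`, then groups words by code length so that each group could be written to its own `<word_len>.code_length` file. The function names and the tie-breaking order are assumptions; the actual script may assign different (equally valid) codes.

```python
import heapq
from collections import defaultdict
from itertools import count


def huffman_codes(word_counts):
    """Return {word: bit string} for a {word: count} mapping."""
    if not word_counts:
        return {}
    if len(word_counts) == 1:
        # A single symbol still needs a one-bit code.
        return {next(iter(word_counts)): "0"}
    tiebreak = count()  # unique index so dicts are never compared
    heap = [(freq, next(tiebreak), {word: ""})
            for word, freq in word_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def group_by_code_length(codes):
    """Return {code length in bits: [words with that code length]}."""
    groups = defaultdict(list)
    for word, code in codes.items():
        groups[len(code)].append(word)
    return dict(groups)


# Hypothetical counts standing in for the contents of words.count.
word_counts = {"the": 5, "dog": 2, "cat": 1, "barks": 1}
codes = huffman_codes(word_counts)
groups = group_by_code_length(codes)
```

Frequent words end up with short codes: in this toy example "the" gets a 1-bit code, while "cat" and "barks" share the 3-bit group.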

Calculate unigram and bigram probabilities for the words in brown.sentences.

File: scripts/calculate_unigram_probabilities.py

####################################################

File: ugram.probs

Format: prob[(<huffman_encoded_word_1>)] = P[(<huffman_encoded_word_1>)]

####################################################
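A sketch of the unigram calculation, assuming plain relative-frequency estimates (each code's count divided by the total number of tokens) with no smoothing. Whether the actual script smooths its estimates is not stated, and the name `unigram_probs` and the `codes` mapping are hypothetical.

```python
from collections import Counter


def unigram_probs(sentences, codes):
    """Return {huffman code: relative frequency} over all tokens."""
    counts = Counter(codes[word] for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


# Hypothetical Huffman codes and training sentences.
codes = {"the": "1", "dog": "00", "cat": "010", "barks": "011"}
sentences = [["the", "dog", "barks"], ["the", "cat"]]
probs = unigram_probs(sentences, codes)
```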

File: scripts/calculate_bigram_probabilities.py

####################################################

File: bigram.probs

Conditional probability of word2 following word1, given word1.

Format: prob[(<huffman_encoded_word_1>, <huffman_encoded_word_2>)] = P(<huffman_encoded_word_2> | <huffman_encoded_word_1>)

####################################################
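The bigram table can be sketched the same way. This version assumes unsmoothed estimates, count(word1, word2) / count(word1 as the first element of a bigram), and adds no sentence-boundary markers. Both choices are assumptions, as are the name `bigram_probs` and the `codes` mapping.

```python
from collections import Counter


def bigram_probs(sentences, codes):
    """Return {(code1, code2): P(code2 | code1)} from adjacent word pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        encoded = [codes[word] for word in sentence]
        for w1, w2 in zip(encoded, encoded[1:]):
            pair_counts[(w1, w2)] += 1
            first_counts[w1] += 1
    # Dividing by how often w1 starts a bigram makes each conditional
    # distribution P(. | w1) sum to 1.
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical Huffman codes and training sentences.
codes = {"the": "1", "dog": "00", "cat": "010", "barks": "011"}
sentences = [
    ["the", "dog", "barks"],
    ["the", "cat", "barks"],
    ["the", "dog"],
]
probs = bigram_probs(sentences, codes)
```

Here "the" is followed by "dog" in two of its three bigrams, so `probs[("1", "00")]` is 2/3.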

parthabb/thesis
