Example #1
f = cr^-s

where s and c are parameters that depend on the language and the text. If you take the logarithm of
both sides of this equation, you get:

log f = log c - s log r

So if you plot log f versus log r, you should get a straight line with slope -s and intercept log c.
Write a program that reads a text from a file, counts word frequencies, and prints one line for each
word, in descending order of frequency, with log f and log r. Use the graphing program of your
choice to plot the results and check whether they form a straight line. Can you estimate the value of
s?
Solution: http://www.greenteapress.com/thinkpython/code/zipf.py. To make the plots, you might have to
install matplotlib (see http://matplotlib.org/).
'''

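if __name__ == '__main__':
    print("Exercise 13:")

    # Read the book, build the word histogram, and convert it to a sorted list.
    mylist = process_file('emma.txt')
    myhist = histogram(mylist)
    sorted_list = convert_to_sorted_list(myhist)

    # The first element of each entry is taken as the frequency.
    freq_list = [entry[0] for entry in sorted_list]

    # Print rank and frequency; ranks start at 1 so that log r is defined.
    for rank, freq in enumerate(freq_list, start=1):
        print(rank, freq)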
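
The exercise in the docstring above asks for an estimate of s. One possible way to get it, sketched here under assumptions, is an ordinary least-squares fit of log f against log r: the slope of the fitted line is -s and the intercept is log c. The function name `estimate_zipf_exponent` is hypothetical and is not part of the original snippet or of the linked solution; it assumes the frequencies are already sorted in descending order, are all positive, and that there are at least two of them.

```python
import math


def estimate_zipf_exponent(freqs):
    """Fit log f = log c - s log r by ordinary least squares.

    freqs: positive word frequencies in descending order (rank 1 first).
    Returns the pair (s, log_c).
    """
    # Ranks start at 1, so log r is defined for every word.
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Standard closed-form least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The fitted slope is -s, the intercept is log c.
    return -slope, intercept


# Synthetic check: frequencies that follow f = 1000 / r exactly.
freqs = [1000.0 / rank for rank in range(1, 51)]
s, log_c = estimate_zipf_exponent(freqs)
print(s, math.exp(log_c))
```

With real text the points only roughly follow a line, so the fitted s depends on which ranks are included; plotting the points is still the better way to judge the fit.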
Example #2
You should attempt this exercise before you go on; then you can download my
solution from http://www.greenteapress.com/thinkpython/code/markov.py.
You will also need http://www.greenteapress.com/thinkpython/code/emma.txt.
'''



if __name__ == '__main__':
    # Print the book's histogram as a sorted list.
    sortedbookwordslist = convert_to_sorted_list(book)
    print(sortedbookwordslist)

    print(len(book), "different words were used.")
    compare_to_wordlist("words.txt")

    choose_from_hist(sample_hist)

    hist = process_file('emma.txt')
    words = process_file('words.txt')

    # Report the words that occur in the book but not in the word list.
    diff = missing_from_words(hist, words)
    print("The words in the book that aren't in the word list are:")
    for word in diff:
        print(word, end=' ')

    myhist = chapter11.histogram(book)
    print("\n random word:", pick_random_word(myhist))