Python getTokens Examples

Programming language: Python

Namespace/package name: textstats

Method/function: getTokens

Examples on hotexamples.com: 4

Python getTokens - 4 examples found. These are the top-rated real-world Python examples of textstats.getTokens, extracted from open-source projects.

Example #1

File: hitler_analy.py Project: knapppv94/knapp-synthesis-1

import pickle
import textstats

# read in the source text; the with-block closes the file automatically
with open('hitler_speeches.txt', encoding='utf-8') as f:
    htxt = f.read()

#obtain new list of word tokens
htoks = textstats.getTokens(htxt)

# remove single-character symbol tokens (a set gives fast membership tests)
symbols = set("~!@#$%^&*()_+-=`{}[]|\\:;\"',./<>?")

htoks_nosym = [t for t in htoks if t not in symbols]

# open a pickled version of the xkcd simple word list
# see https://xkcd.com/simplewriter/
with open('xkcd_simple_words.p', 'rb') as f:
    xkcd_simp = pickle.load(f)

#create new list of toks not in the xkcd_simp list
hnotsimptoks = [t for t in htoks_nosym if t not in xkcd_simp]

# save the filtered tokens with the highest available pickle protocol
with open('hnotsimptoks.p', 'wb') as f:
    pickle.dump(hnotsimptoks, f, pickle.HIGHEST_PROTOCOL)
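
The filtering steps in Example #1 can be tried without the input files. The sketch below is only an illustration: `simple_tokens` is a hypothetical stand-in for `textstats.getTokens` (whose implementation is not shown on this page), and the small `simple_words` set is a toy substitute for the pickled xkcd word list.

```python
import re

def simple_tokens(text):
    """Hypothetical stand-in for textstats.getTokens (an assumption):
    returns runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

# Same symbol characters as in the example, stored as a set.
SYMBOLS = set("~!@#$%^&*()_+-=`{}[]|\\:;\"',./<>?")

def uncommon_tokens(text, simple_words):
    """Return tokens that are neither symbols nor in simple_words, in order."""
    simple = set(simple_words)
    return [t for t in simple_tokens(text)
            if t not in SYMBOLS and t not in simple]

sample = "The orator spoke, and the crowd listened."
simple_words = {"the", "and", "spoke", "crowd", "listened"}  # toy list, not the xkcd list
print(uncommon_tokens(sample, simple_words))  # ['The', 'orator']
```

Note that the comparison is case-sensitive, so "The" survives the filter here. Whether that also happens in the original script depends on how `getTokens` and the word list handle case.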

Example #2

File: bigrams_bible_austen.py Project: cclark94/compLing

# Christian Clark, 29 September 2014

import pickle, textstats as ts

outFile = open('bigram_bible_austen_out.txt', 'w')


# Part 1: The King James Bible
# (A) and (B) Create token and type lists from the text file

with open('../Ling 1330/gutenberg/gutenberg/bible-kjv.txt') as bInfile:
    bTxt = bInfile.read()

bToks = ts.getTokens(bTxt)
bTypes = ts.getTypes(bTxt)


# (C) Write out token and type counts to outFile

outFile.write('There are a total of '+str(len(bToks))+' word tokens and '+\
              str(len(bTypes))+' word types in the King James Bible.'+'\n\n')


# (D) Create bigram frequency dictionary

bBigrFreq = {}
for bigr in ts.getWordNGrams(bToks, 2):
    # start unseen bigrams at 0, then add one
    bBigrFreq[bigr] = bBigrFreq.get(bigr, 0) + 1
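
Step (D) can also be sketched with the standard library alone. The version below assumes a bigram is a tuple of two adjacent tokens; the actual return type of `ts.getWordNGrams` is not shown on this page, so `bigram_counts` is a hypothetical helper, not a drop-in replacement.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent token pairs; each bigram is a (first, second) tuple."""
    return Counter(zip(tokens, tokens[1:]))

toks = "the cat and the cat sat".split()
freq = bigram_counts(toks)
print(freq[("the", "cat")])  # 2
print(len(freq))             # 4 distinct bigrams
```

`Counter` behaves like a dictionary, so `freq.most_common(10)` would list the ten most frequent bigrams.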

Example #3

File: musso_analy.py Project: knapppv94/knapp-synthesis-1

import pickle
import textstats

# read in the source text; the with-block closes the file automatically
with open('mussolini_speeches.txt', encoding='utf-8') as f:
    mtxt = f.read()

#obtain new list of word tokens
mtoks = textstats.getTokens(mtxt)

# remove single-character symbol tokens (a set gives fast membership tests)
symbols = set("~!@#$%^&*()_+-=`{}[]|\\:;\"',./<>?")

mtoks_nosym = [t for t in mtoks if t not in symbols]

# open a pickled version of the xkcd simple word list
# see https://xkcd.com/simplewriter/
with open('xkcd_simple_words.p', 'rb') as f:
    xkcd_simp = pickle.load(f)

#create new list of toks not in the xkcd_simp list
mnotsimptoks = [t for t in mtoks_nosym if t not in xkcd_simp]

# save the filtered tokens with the highest available pickle protocol
with open('mnotsimptoks.p', 'wb') as f:
    pickle.dump(mnotsimptoks, f, pickle.HIGHEST_PROTOCOL)
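
Both analysis scripts end by pickling a token list with protocol `-1`, which `pickle` treats as the highest available protocol. A minimal round-trip sketch follows, using a temporary directory and a placeholder token list so that it runs without the speech files; the file name `tokens_demo.p` is hypothetical.

```python
import os
import pickle
import tempfile

tokens = ["orator", "crowd", "listened"]  # placeholder token list

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tokens_demo.p")  # hypothetical file name
    # Write with the highest protocol (equivalent to passing -1).
    with open(path, "wb") as f:
        pickle.dump(tokens, f, pickle.HIGHEST_PROTOCOL)
    # Read the list back in binary mode.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == tokens)  # True
```

As with any pickle file, only load data from sources you trust, since unpickling can execute arbitrary code.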

Example #4

File: Obama.py Project: cclark94/compLing

# Christian ...

import pickle, textstats as ts

outFile = open('2009-Obama_out.txt', 'w')


# Part 1: Obama's speech (2009-Obama.txt)
# (A) and (B) Create token and type lists from the text file

with open('2009-Obama.txt') as bInfile:
    bTxt = bInfile.read()

bToks = ts.getTokens(bTxt)
bTypes = ts.getTypes(bTxt)


# (C) Write out token and type counts to outFile

outFile.write('There are a total of '+str(len(bToks))+' word tokens and '+\
              str(len(bTypes))+' word types in Obama\'s speech.'+'\n\n')


# (D) Create bigram frequency dictionary

bBigrFreq = {}
for bigr in ts.getWordNGrams(bToks, 2):
    # start unseen bigrams at 0, then add one
    bBigrFreq[bigr] = bBigrFreq.get(bigr, 0) + 1
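
The token and type counts written out in step (C) can be sketched as follows. This assumes a "type" is simply a distinct token; `ts.getTypes` may normalize case or sort its result, which is not shown on this page, and `count_summary` is a hypothetical helper.

```python
def count_summary(tokens, label):
    """Build the summary line from a token list; types are distinct tokens (assumption)."""
    types = set(tokens)
    return (f"There are a total of {len(tokens)} word tokens and "
            f"{len(types)} word types in {label}.")

toks = "we will act and we will lead".split()
print(count_summary(toks, "the sample text"))
# There are a total of 7 word tokens and 5 word types in the sample text.
```

An f-string avoids the repeated `str()` calls and string concatenation used in the original `outFile.write` call.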