from collections import Counter
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()

from nlpia.data.loaders import kite_text
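# tokenize the lowercased kite article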
tokens = tokenizer.tokenize(kite_text.lower())

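# count how often each token occurs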
token_counts = Counter(tokens)
token_counts

# remove common stopwords
import nltk
nltk.download('stopwords')

stopwords = nltk.corpus.stopwords.words('english')
tokens = [x for x in tokens if x not in stopwords]
kite_counts = Counter(tokens)

kite_counts

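# the ten most frequent tokens after stopword removal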
kite_counts.most_common(10)

# Example 2
from nlpia.data.loaders import kite_text, kite_history
from nltk.tokenize import TreebankWordTokenizer
from collections import Counter

tokenizer = TreebankWordTokenizer()
# build a corpus of two documents about kites
kite_intro = kite_text.lower()
kite_history = kite_history.lower()

# tokenize both documents in the corpus
kite_intro_tokens = tokenizer.tokenize(kite_intro)
kite_history_tokens = tokenizer.tokenize(kite_history)

# count the occurrences of each token
intro_counts = Counter(kite_intro_tokens)
history_counts = Counter(kite_history_tokens)

# document lengths (total number of tokens in each document)
intro_tokens_total = len(kite_intro_tokens)
history_tokens_total = len(kite_history_tokens)

intro_tf = {}
history_tf = {}

# compute the term frequency (TF) of 'kite' in each document
intro_tf['kite'] = intro_counts['kite'] / intro_tokens_total
history_tf['kite'] = history_counts['kite'] / history_tokens_total

# compute the TF of 'and' in each document
intro_tf['and'] = intro_counts['and'] / intro_tokens_total
history_tf['and'] = history_counts['and'] / history_tokens_total
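
# A minimal follow-up sketch (not part of the original listing): print the
# computed TF values side by side so the two documents can be compared.
# The label strings below are only an assumption about how you might want
# to inspect the results.
print('TF of "kite" in intro  :', intro_tf['kite'])
print('TF of "kite" in history:', history_tf['kite'])
print('TF of "and" in intro   :', intro_tf['and'])
print('TF of "and" in history :', history_tf['and'])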