Python Dictionary.add_word Examples

Programming language: Python

Namespace/package name: gensim.corpora

Class/type: Dictionary

Method/function: add_word

Examples at hotexamples.com: 1

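Python Dictionary.add_word - 1 example found. It is a top-rated real-world Python example of gensim.corpora.Dictionary.add_word, extracted from open source projects. You can rate examples to help improve their quality. Note that the example calls add_word and reads a word2idx attribute, neither of which belongs to the documented gensim.corpora.Dictionary interface (which offers add_documents and token2id), so the Dictionary it uses is most likely a class defined in the example's own project.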

Frequently used methods

Dictionary(30)

add_documents(30)

load(30)

load_from_text(30)

filter_extremes(30)

doc2bow(30)

save(30)

compactify(30)

doc2idx(28)

save_as_text(28)

items(27)

filter_tokens(26)

keys(16)

from_corpus(15)

filter_n_most_frequent(13)

merge_with(10)

get(10)

values(9)

iteritems(7)

id2token(7)

from_documents(6)

patch_with_special_tokens(6)

token2id(4)

num_docs(2)

num_nnz(2)

dfs(2)

itervalues(1)

loadFromText(1)

filterExtremes(1)

most_common(1)

num_pos(1)

saveAsText(1)

add_word(1)

iterkeys(1)

Example #1

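import os

import torch

# NOTE: Dictionary is not defined in this snippet. It is assumed to be a
# vocabulary class that provides add_word(word) and a word2idx mapping.


class Corpus(object):
    def __init__(self, path):
        # One shared vocabulary for the train, validation, and test splits
        self.dictionary = Dictionary()
        self.train = self.tokenize(os.path.join(path, 'train.txt'))
        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))
        self.test = self.tokenize(os.path.join(path, 'test.txt'))

    def tokenize(self, path):
        """Tokenizes a text file and returns its word ids as a 1-D LongTensor."""
        assert os.path.exists(path)
        # First pass: add words to the dictionary and count the tokens
        with open(path, 'r') as f:
            tokens = 0
            for line in f:
                # Mark the end of every line with an explicit '<eos>' token
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words:
                    self.dictionary.add_word(word)

        # Second pass: tokenize file content into a preallocated tensor
        with open(path, 'r') as f:
            # Allocated uninitialized; every slot is overwritten below
            ids = torch.LongTensor(tokens)
            token = 0
            for line in f:
                words = line.split() + ['<eos>']
                for word in words:
                    ids[token] = self.dictionary.word2idx[word]
                    token += 1

        return ids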
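
The example above does not show the Dictionary class it depends on. The following is a minimal sketch of what such a class might look like, assuming only what the example uses: an add_word method and a word2idx mapping. The idx2word list, the return value of add_word, and __len__ are hypothetical additions for illustration, not taken from the original project or from gensim.

```python
class Dictionary:
    """Hypothetical vocabulary: maps each distinct word to an integer id."""

    def __init__(self):
        self.word2idx = {}  # word -> id, the attribute read by Corpus.tokenize
        self.idx2word = []  # id -> word (assumed helper, not used by the example)

    def add_word(self, word):
        """Register the word if it is unseen, then return its id."""
        if word not in self.word2idx:
            # The next free id equals the current vocabulary size
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


# Usage: repeated words reuse the id assigned on first sight
vocab = Dictionary()
ids = [vocab.add_word(w) for w in "the cat saw the dog <eos>".split()]
print(ids)  # [0, 1, 2, 0, 3, 4]
```

Because add_word is idempotent for known words, tokenize can call it on every token during the first pass and safely look ids up in word2idx during the second.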