from multiprocessing import Pool

import spacy
from textacy.datasets import Wikipedia

nlp = spacy.load('en')

def process_document(doc):
    # Collect the lemmas of the ASCII nouns and proper nouns in one parsed doc
    words = []
    for token in doc:
        if not token.is_ascii:
            continue
        if token.pos_ in {u'NOUN', u'PROPN'}:
            words.append(token.lemma_)
    return words

def process_mini_batch(texts):
    return [process_document(doc) for doc in nlp.pipe(texts)]

def flush_batch(p, batch, f):
    # Returns pool_size arrays of roughly (batch_max / pool_size) processed
    # documents, each document represented as a list of lemmas
    results = p.map(process_mini_batch,
                    (batch[i::pool_size] for i in range(pool_size)))
    for result in results:
        for entry in result:
            # Write each document on its own line
            f.write(' '.join([word.encode('utf-8') for word in entry]) + "\n")

pool_size = 32
p = Pool(pool_size)
wp = Wikipedia(lang='en', version='latest')
with open("lemmatized_nouns/output.txt", "w+") as f:
    batch, batch_max = [], 2**14
    for text in wp.texts(min_len=300):
        batch.append(text)
        if len(batch) >= batch_max:
            flush_batch(p, batch, f)
            batch = []
    if batch:
        # Process the final, partially filled batch so no documents are dropped
        flush_batch(p, batch, f)
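A quick aside on the (batch[i::pool_size] for i in range(pool_size)) generator above: it deals the queued batch out into pool_size interleaved slices, one per worker. A minimal sketch of the same idea on a toy list (the names here are illustrative, not part of the pipeline):

# Toy illustration of the batch[i::pool_size] striding used above
batch = list(range(10))   # stand-in for 10 queued documents
pool_size = 4
slices = [batch[i::pool_size] for i in range(pool_size)]
# slices == [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
# Every document lands in exactly one slice, and slice sizes differ by
# at most one, so the work is spread evenly across the workers.
assert sorted(sum(slices, [])) == batch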
def test_ioerror(self):
    dataset = Wikipedia(data_dir=self.tempdir)
    with self.assertRaises(IOError):
        _ = list(dataset.texts())
def test_ioerror(tmpdir):
    dataset = Wikipedia(data_dir=str(tmpdir))
    with pytest.raises(IOError):
        _ = list(dataset.texts())
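Both variants point the dataset at an empty temporary directory, so iterating over texts() raises IOError because no Wikipedia dump exists there yet. For contrast, a minimal happy-path sketch, assuming textacy's usual dataset interface (the download() call and the data_dir path here are assumptions, not taken from the tests above):

from textacy.datasets import Wikipedia

# Assumed usage: once a dump has been downloaded into data_dir,
# texts() yields article texts instead of raising IOError
wp = Wikipedia(data_dir="/tmp/wikipedia_data", lang='en', version='latest')
wp.download()  # assumed textacy dataset method; fetches the compressed dump
first_article = next(wp.texts(min_len=300))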