def test_to_vw(self):
    stream = TextFileStreamer(path_list=[self.doc1, self.doc2],
                              tokenizer=self.tokenizer)
    result = StringIO()
    stream.to_vw(result)
    benchmark = " 1 doc1| failure:1 doomed:1\n 1 doc2| set:1 success:1\n"
    self.assertEqual(benchmark, result.getvalue())
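# A rough reading of the benchmark above (inferred here, not stated in the
# original): each emitted line follows Vowpal Wabbit's input format,
#     [label] [importance] [tag]| feature:value ...
# so " 1 doc1| failure:1 doomed:1" is an unlabeled example with importance
# weight 1, tag "doc1", and a count of 1 for each token.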
def test_to_scipysparse(self):
    stream = TextFileStreamer(path_list=[self.doc1, self.doc2],
                              tokenizer=self.tokenizer)
    result = stream.to_scipysparse()
    benchmark = sparse.csr_matrix([[1, 1, 0, 0], [0, 0, 1, 1]])
    compare = result.toarray() == benchmark.toarray()
    self.assertTrue(compare.all())
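# In the benchmark matrix above, rows correspond to documents and columns to
# vocabulary terms; the column order (doomed, failure, set, success) is our
# inference from the token benchmarks, not stated in the original.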
def test_token_stream(self):
    stream = TextFileStreamer(path_list=[self.doc1, self.doc2],
                              tokenizer=self.tokenizer)
    token_benchmark = [["doomed", "failure"], ["set", "success"]]
    id_benchmark = ["doc1", "doc2"]
    token_result = list(stream.token_stream(cache_list=["doc_id"]))
    self.assertEqual(token_benchmark, token_result)
    self.assertEqual(id_benchmark, stream.doc_id_cache)
def test_info_stream(self):
    stream = TextFileStreamer(path_list=[self.doc1, self.doc2],
                              tokenizer=self.tokenizer)
    token_benchmark = [["doomed", "failure"], ["set", "success"]]
    text_benchmark = ["doomed to failure\n", "set for success\n"]
    token_result = []
    text_result = []
    for each in stream.info_stream():
        token_result.append(each["tokens"])
        text_result.append(each["text"])
    self.assertEqual(token_benchmark, token_result)
    self.assertEqual(text_benchmark, text_result)
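# The tests above assume a setUp fixture providing self.doc1, self.doc2, and
# self.tokenizer. A minimal sketch of what such a fixture might look like:
# the file contents come from the text benchmarks, while the tokenizer class
# and its text_to_token_list interface are assumptions, not taken from the
# original source.
import os
import shutil
import tempfile


class _StubTokenizer(object):
    # Hypothetical tokenizer: lowercase, split on whitespace, and drop the
    # short stop words seen in the benchmarks ("to", "for").
    def text_to_token_list(self, text):
        return [w for w in text.lower().split() if w not in ("to", "for")]


def setUp(self):
    # Two one-line documents whose contents match text_benchmark; doc ids
    # are assumed to be derived from the file names.
    self.temp_dir = tempfile.mkdtemp()
    self.doc1 = os.path.join(self.temp_dir, "doc1")
    self.doc2 = os.path.join(self.temp_dir, "doc2")
    with open(self.doc1, "w") as f:
        f.write("doomed to failure\n")
    with open(self.doc2, "w") as f:
        f.write("set for success\n")
    self.tokenizer = _StubTokenizer()


def tearDown(self):
    # Remove the temporary documents created in setUp.
    shutil.rmtree(self.temp_dir)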