Python Corpus Examples

Programming language: Python

Namespace/package name: corpus_analysis.corpus

Class/type: Corpus

Examples on hotexamples.com: 5

Python Corpus - 5 examples found. These are top-rated real-world Python examples of corpus_analysis.corpus.Corpus, extracted from open source projects.

Example #1

File: corpus_test.py Project: dhmit/gender_analysis

    def test_load_pickle(self, tmp_path):
        """
        Tests that the corpus can properly load from a pickle file, while retaining
        all of the relevant information

        :param tmp_path: a temporary directory created by pytest that will be used to store
            a pickle file from the test
        """

        pickle_path = tmp_path / 'pickle.pgz'

        original_corpus = Corpus(common.TEST_CORPUS_PATH,
                                 csv_path=common.SMALL_TEST_CORPUS_CSV,
                                 name='test_corpus',
                                 pickle_on_load=pickle_path,
                                 ignore_warnings=True)

        # first make sure the small corpus is correct
        assert len(original_corpus) == 10
        assert isinstance(original_corpus.documents, list)
        assert original_corpus.name == 'test_corpus'

        # next load the pickle file to make sure data was copied correctly
        pickle_corpus = Corpus(pickle_path, name='test_corpus')
        assert len(pickle_corpus) == 10
        # check the reloaded corpus here, not the original one a second time
        assert isinstance(pickle_corpus.documents, list)
        assert pickle_corpus.name == 'test_corpus'

        # Make sure the corpora are equal
        assert original_corpus == pickle_corpus
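
The save-and-reload round trip that this test exercises can be sketched without the library. The `.pgz` suffix suggests a gzip-compressed pickle, but that is an assumption about the file format; `save_pickle` and `load_pickle` are hypothetical helpers, and a plain dict stands in for a real `Corpus`.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Write obj to path as a gzip-compressed pickle (assumed format)."""
    with gzip.open(path, 'wb') as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Read an object back from a gzip-compressed pickle."""
    with gzip.open(path, 'rb') as handle:
        return pickle.load(handle)


# A plain dict stands in for the corpus; it is not the real Corpus class.
original = {'name': 'test_corpus', 'documents': ['doc one', 'doc two']}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / 'pickle.pgz'
    save_pickle(original, pickle_path)
    restored = load_pickle(pickle_path)

# The restored object is a separate copy that compares equal to the original.
round_trip_ok = restored == original and restored is not original
```

Comparing the restored object with the original, as the test does with `original_corpus == pickle_corpus`, catches fields that were lost or altered during serialization.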

Example #2

File: metadata_visualizations_test.py Project: dhmit/gender_analysis

    def test_plot_gender_breakdown_different_file_constructions(self):
        c = Corpus(
            common.TEST_CORPUS_PATH,
            csv_path=common.LARGE_TEST_CORPUS_CSV,
            name='test_corpus',
        )

        default_save_name = 'gender_breakdown_for_' + c.name.replace(
            ' ', '_') + '.png'
        test_file_1_name = "testing_file1.png"

        default_save_path = OUTPUT_DIRECTORY_PATH / default_save_name
        test_file_save_path = OUTPUT_DIRECTORY_PATH / test_file_1_name

        test_file_paths = []

        # default file name, derived from the corpus name
        plot_gender_breakdown(c, OUTPUT_DIRECTORY_PATH)
        assert default_save_path.is_file()
        test_file_paths.append(default_save_path)

        # explicit file name; the test expects the space to become an underscore
        plot_gender_breakdown(c, OUTPUT_DIRECTORY_PATH, "testing file1")
        assert test_file_save_path.is_file()
        test_file_paths.append(test_file_save_path)

        # every created file should be identical to every other one
        for file_created1 in test_file_paths:
            for file_created2 in test_file_paths:
                assert filecmp.cmp(file_created1, file_created2)

        # clean up the generated images
        for file_created in test_file_paths:
            file_created.unlink()
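
The compare-then-clean-up pattern at the end of this test can be tried in isolation. This is a minimal sketch using only the standard library; the file names and byte strings are placeholders, not real plot output.

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = Path(tmp_dir)
    first = out_dir / 'first.png'
    second = out_dir / 'second.png'
    different = out_dir / 'different.png'

    # Placeholder bytes, not real PNG data.
    first.write_bytes(b'same bytes')
    second.write_bytes(b'same bytes')
    different.write_bytes(b'other bytes')

    # shallow=False compares file contents rather than trusting
    # matching os.stat() signatures.
    same_content = filecmp.cmp(first, second, shallow=False)
    changed_content = filecmp.cmp(first, different, shallow=False)

    # Remove the files, as the test does after its assertions.
    for created in (first, second, different):
        created.unlink()
    leftover = list(out_dir.iterdir())
```

The test above calls `filecmp.cmp` with its default `shallow=True`, which treats files with identical `os.stat()` signatures as equal without reading them; passing `shallow=False` is the stricter choice when the contents are what matter.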

Example #3

File: corpus_test.py Project: dhmit/gender_analysis

    def test_load_without_csv(self):
        """
        Tests that the corpus properly loads when not provided metadata
        """

        c = Corpus(common.TEST_CORPUS_PATH)
        assert len(c) == 99
        assert isinstance(c.documents, list)
        assert c.name is None
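
To make the assertions above concrete, here is a minimal sketch of a class that would satisfy the same kind of checks: `len()` on the corpus, a `documents` list, and a `name` that defaults to `None`. `MiniCorpus` is hypothetical and is not how `gender_analysis` implements `Corpus`; loading every `.txt` file in a directory is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


class MiniCorpus:
    """Hypothetical corpus: loads every .txt file in a directory."""

    def __init__(self, path, name=None):
        self.name = name
        # sorted() keeps the document order deterministic
        self.documents = [
            text_file.read_text(encoding='utf-8')
            for text_file in sorted(Path(path).glob('*.txt'))
        ]

    def __len__(self):
        # lets callers write len(corpus)
        return len(self.documents)

    def __eq__(self, other):
        # lets callers write corpus_a == corpus_b
        if not isinstance(other, MiniCorpus):
            return NotImplemented
        return self.name == other.name and self.documents == other.documents


with tempfile.TemporaryDirectory() as tmp_dir:
    for index in range(3):
        doc_path = Path(tmp_dir) / f'doc_{index}.txt'
        doc_path.write_text(f'text {index}', encoding='utf-8')
    corpus = MiniCorpus(tmp_dir)
    same_corpus = MiniCorpus(tmp_dir)
```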

Example #4

File: corpus_test.py Project: dhmit/gender_analysis

    def test_load_with_csv(self):
        """
        Test that the corpus properly loads when provided a metadata csv
        """

        c = Corpus(
            common.TEST_CORPUS_PATH,
            csv_path=common.LARGE_TEST_CORPUS_CSV,
            name='test_corpus',
        )
        assert len(c) == 99
        assert isinstance(c.documents, list)
        assert c.name == 'test_corpus'

Example #5

File: metadata_visualizations_test.py Project: dhmit/gender_analysis

    def test_create_all_visualizations_but_with_no_corpus_name(self):
        c = Corpus(common.TEST_CORPUS_PATH,
                   csv_path=common.LARGE_TEST_CORPUS_CSV)

        default_gender_breakdown = 'gender_breakdown_for_corpus.png'
        default_metadata_pie = 'percentage_acquired_metadata_for_corpus.png'
        default_country_pub = 'country_of_pub_for_corpus.png'
        default_pub_date = 'date_of_pub_for_corpus.png'

        create_corpus_summary_visualizations(c, OUTPUT_DIRECTORY_PATH)
        assert (OUTPUT_DIRECTORY_PATH / default_gender_breakdown).is_file()
        assert (OUTPUT_DIRECTORY_PATH / default_pub_date).is_file()
        assert (OUTPUT_DIRECTORY_PATH / default_country_pub).is_file()
        assert (OUTPUT_DIRECTORY_PATH / default_metadata_pie).is_file()
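
The file names these tests expect follow a visible pattern: a prefix, `_for_`, the corpus name with spaces replaced by underscores, and `.png`, with `corpus` used when no name was given. The sketch below reproduces that pattern; `default_plot_name` is a hypothetical helper inferred from the expected names in Examples #2 and #5, not a function from `gender_analysis`.

```python
def default_plot_name(prefix, corpus_name=None):
    """Hypothetical helper: build '<prefix>_for_<name>.png'."""
    # Fall back to 'corpus' when the corpus has no name (inferred from
    # the file names the unnamed-corpus test expects).
    label = corpus_name if corpus_name else 'corpus'
    return f"{prefix}_for_{label.replace(' ', '_')}.png"


named = default_plot_name('gender_breakdown', 'test corpus')
unnamed = default_plot_name('gender_breakdown')
pub_date = default_plot_name('date_of_pub')
```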