def test_confusion_matrix(self):
    """The entity-level confusion matrix is square, with one row and one column per entity label."""
    data_dir = self.dataset.get_data_directory()
    gold = Annotations(join(data_dir, self.ann_files[0]), annotation_type='ann')
    predicted = Annotations(join(data_dir, self.ann_files[1]), annotation_type='ann')
    # Copy one predicted entity into the gold annotations so the two files share an entity.
    gold.add_entity(*predicted.get_entity_annotations()[0])
    matrix = gold.compute_confusion_matrix(predicted, self.entities)
    # Every row has one column per entity label...
    self.assertEqual(len(matrix[0]), len(self.entities))
    # ...and there is one row per entity label.
    self.assertEqual(len(matrix), len(self.entities))
def compute_confusion_matrix(self, dataset, leniency=0):
    """
    Generates a confusion matrix where this Dataset serves as the gold standard annotations and
    `dataset` serves as the predicted annotations. A typical workflow would involve creating a
    Dataset object with the prediction directory outputted by a model and then passing it into
    this method.

    :param dataset: a Dataset object containing a predicted version of this dataset.
    :param leniency: a floating point value between [0,1] defining the leniency of the character
        spans to count as different. A value of zero considers only exact character matches while
        a positive value considers entities that differ by up to
        :code:`ceil(leniency * len(span)/2)` on either side.
    :return: two element tuple containing a label array (of entity names) and a matrix where rows
        are gold labels and columns are predicted labels. matrix[i][j] indicates that entities[i]
        in this dataset was predicted as entities[j] in 'annotation' matrix[i][j] times
    :raises ValueError: if `dataset` is not a Dataset, is missing files present in this dataset,
        or if a gold file cannot be paired with a prediction file.
    """
    if not isinstance(dataset, Dataset):
        raise ValueError("dataset must be instance of Dataset")

    # Verify that every annotation file in this (gold) dataset has a prediction counterpart,
    # comparing by basename so differing directories do not matter.
    gold_names = {file.ann_path.split(os.sep)[-1] for file in self}
    pred_names = {file.ann_path.split(os.sep)[-1] for file in dataset}
    diff = gold_names.difference(pred_names)
    if diff:
        raise ValueError("Dataset of predictions is missing the files: " + str(list(diff)))

    # Sort entities in ascending order by count.
    entities = [
        key for key, _ in
        sorted(self.compute_counts()['entities'].items(), key=lambda x: x[1])
    ]
    confusion_matrix = [[0] * len(entities) for _ in range(len(entities))]

    # Index prediction files once so each gold file is paired in O(1), instead of
    # re-scanning the whole prediction dataset for every gold file (O(n^2)).
    predictions_by_key = {str(data_file): data_file for data_file in dataset}

    for gold_data_file in self:
        try:
            prediction_data_file = predictions_by_key[str(gold_data_file)]
        except KeyError:
            # Basenames matched above, but str() representations did not; fail loudly
            # rather than letting an internal lookup error escape.
            raise ValueError(
                "Dataset of predictions has no file matching: " + str(gold_data_file))

        gold_annotation = Annotations(gold_data_file.ann_path)
        pred_annotation = Annotations(prediction_data_file.ann_path)

        # Compute the matrix on the Annotation file level, then accumulate it.
        ann_confusion_matrix = gold_annotation.compute_confusion_matrix(
            pred_annotation, entities, leniency=leniency)
        for i in range(len(confusion_matrix)):
            for j in range(len(confusion_matrix)):
                confusion_matrix[i][j] += ann_confusion_matrix[i][j]

    return entities, confusion_matrix