def named_entity_recognition(self, sent, standard_name=False):
    """Find named entities (person, place, organization, other proper noun)
    in a sentence using pyhanlp. harvesttext links known entities beforehand.

    :param sent: string, the text to analyze
    :param standard_name: bool, whether linked known entities are replaced by
        their standard names before recognition
    :return: dict {entity name: entity type} of discovered named entities
    """
    from pyhanlp import HanLP, JClass
    if not self.hanlp_prepared:
        self.hanlp_prepare()
    self.standard_name = standard_name
    entities_info = self.entity_linking(sent)
    # Replace linked entity mentions before running HanLP's recognizer.
    sent2 = self.decoref(sent, entities_info)
    StandardTokenizer = JClass("com.hankcs.hanlp.tokenizer.StandardTokenizer")
    StandardTokenizer.SEGMENT.enableAllNamedEntityRecognize(True)
    entity_type_dict = {}
    try:
        for x in StandardTokenizer.segment(sent2):
            # Tag prefixes: person (nr), place (ns), organization (nt),
            # other proper noun (nz).
            tag0 = str(x.nature)
            if tag0.startswith("nr"):
                entity_type_dict[x.word] = "人名"
            elif tag0.startswith("ns"):
                entity_type_dict[x.word] = "地名"
            elif tag0.startswith("nt"):
                entity_type_dict[x.word] = "机构名"
            elif tag0.startswith("nz"):
                entity_type_dict[x.word] = "其他专名"
    except Exception:
        # Best-effort: HanLP/JVM failures should not crash the caller;
        # return whatever was collected so far. (Was a bare `except:`,
        # which also swallowed KeyboardInterrupt/SystemExit.)
        pass
    return entity_type_dict
def named_entity_recognition(self, sent, standard_name=False, return_posseg=False):
    """Find named entities (person, place, organization, other proper noun)
    in a sentence using pyhanlp. harvesttext links known entities beforehand.

    :param sent: string, the text to analyze
    :param standard_name: bool, whether linked known entities are replaced by
        their standard names before recognition
    :param return_posseg: bool, whether to also return the POS-tagged
        segmentation produced during recognition
    :return: entity_type_dict: dict {entity name: entity type}
        (when return_posseg=True) possegs: list of (word, POS tag) tuples
    """
    from pyhanlp import HanLP, JClass
    if not self.hanlp_prepared:
        self.hanlp_prepare()
    self.standard_name = standard_name
    entities_info = self.entity_linking(sent)
    # Replace linked entity mentions before running HanLP's recognizer.
    sent2 = self.decoref(sent, entities_info)
    StandardTokenizer = JClass("com.hankcs.hanlp.tokenizer.StandardTokenizer")
    StandardTokenizer.SEGMENT.enableAllNamedEntityRecognize(True)
    entity_type_dict = {}
    # Initialized outside the try so both return paths are safe even if
    # segmentation fails immediately.
    possegs = []
    try:
        for x in StandardTokenizer.segment(sent2):
            # Tag prefixes: person (nr), place (ns), organization (nt),
            # other proper noun (nz).
            tag0 = str(x.nature)
            if tag0.startswith("nr"):
                entity_type_dict[x.word] = "人名"
            elif tag0.startswith("ns"):
                entity_type_dict[x.word] = "地名"
            elif tag0.startswith("nt"):
                entity_type_dict[x.word] = "机构名"
            elif tag0.startswith("nz"):
                entity_type_dict[x.word] = "其他专名"
            possegs.append((x.word, tag0))
    except Exception:
        # Best-effort: HanLP/JVM failures should not crash the caller;
        # return whatever was collected so far. (Was a bare `except:`,
        # which also swallowed KeyboardInterrupt/SystemExit.)
        pass
    if return_posseg:
        return entity_type_dict, possegs
    else:
        return entity_type_dict
def hanlp_cut(text):
    """Segment *text* with HanLP's NLPTokenizer and join the tokens with spaces.

    :param text: string, the text to segment
    :return: string, the space-joined segmentation of *text*
    """
    # Import JClass locally, matching the lazy-import style of the sibling
    # functions; the original referenced JClass without any visible import,
    # risking a NameError if no module-level import exists.
    from pyhanlp import JClass
    tokenizer = JClass("com.hankcs.hanlp.tokenizer.NLPTokenizer")
    return " ".join(term.word for term in tokenizer.segment(text))