def _prepare_data(self, files):
    """Load labelled queries from *files* and build their feature embeddings.

    Each file is expected to hold one example per line in the form
    ``<question>#<label>`` (utf-8 encoded; tabs and spaces are stripped
    before splitting — TODO confirm against the data producer).

    :param files: iterable of file paths to read.
    :returns: tuple ``(embeddings, labels, queries)`` where ``embeddings``
        is a squeezed numpy array of feature vectors, ``labels`` is a list
        of integer indices into ``self.named_labels``, and ``queries`` is
        the list of cleaned question strings.
    """
    print('prepare data...')
    embeddings = []
    queries = []
    labels = []
    # Iterate the paths directly instead of indexing with xrange(len(...)).
    for path in files:
        with open(path, 'r') as f:
            for raw_line in f:
                # Normalize whitespace, decode, then split "<question>#<label>".
                parts = raw_line.replace('\t', '').replace(
                    ' ', '').strip('\n').decode('utf-8').split('#')
                question = QueryUtils.static_simple_remove_punct(
                    str(parts[0]))
                # Label index is looked up against the configured label set;
                # raises ValueError for an unknown label (fail fast on bad data).
                label = self.named_labels.index(
                    str(parts[1].encode('utf-8')))
                queries.append(question)
                labels.append(label)
                # Feature extractor expects a sequence of token strings.
                tokens = [self.cut(question)]
                embedding = self.feature_extractor.transform(
                    tokens).toarray()
                embeddings.append(embedding)
    # Drop the singleton dimension produced by per-example transform().
    embeddings = np.squeeze(np.array(embeddings))
    return embeddings, labels, queries
def cut(self, input_):
    """Segment *input_* into whitespace-joined tokens.

    Punctuation is removed first, then jieba performs precise-mode
    segmentation; the joined result is passed through ``_uniout.unescape``
    to yield a readable utf-8 token string.
    """
    cleaned = QueryUtils.static_simple_remove_punct(input_)
    segmented = jieba.cut(cleaned, cut_all=False)
    joined = " ".join(segmented)
    return _uniout.unescape(str(joined), 'utf8')