def transform(self, dirty_df: pd.DataFrame, col: str):
    """Featurize one column into a single dense tensor.

    Concatenates three count-vectorizer outputs along the feature axis:
    character n-gram counts, per-character symbolic-pattern counts, and
    whole-token symbolic-pattern counts.

    Args:
        dirty_df: DataFrame holding the column to featurize.
        col: Name of the column.

    Returns:
        A one-element list containing a torch.Tensor of shape
        (n_rows, n_char + n_regex + n_regex2 features).
    """
    # Read the column once instead of three times.
    values = dirty_df[col].values
    char_features = self.char_counter.transform(values.tolist()).todense()
    regex_features = self.regex_counter.transform(
        [str2regex(val, match_whole_token=False) for val in values]
    ).todense()
    regex_features2 = self.regex_counter2.transform(
        [str2regex(val, match_whole_token=True) for val in values]
    ).todense()
    return [
        torch.tensor(
            np.concatenate(
                [char_features, regex_features, regex_features2], axis=1
            )
        )
    ]
def fit(self, dirty_df: pd.DataFrame, col: str):
    """Fit the character and symbolic-pattern count-vectorizers on one column.

    Args:
        dirty_df: DataFrame holding the column to fit on.
        col: Name of the column.
    """
    values = dirty_df[col].values
    self.char_counter.fit(list(values))
    self.regex_counter.fit(
        [str2regex(v, match_whole_token=False) for v in values]
    )
    self.regex_counter2.fit(
        [str2regex(v, match_whole_token=True) for v in values]
    )
def get_coexist_counts(self, values):
    """For every pair of values, count the rows where their whole-token
    symbolic patterns co-occur, using the ES reversed index.

    Args:
        values: Iterable of cell values (duplicates allowed).

    Returns:
        Nested dict: coexist_count[val1][val2] -> int co-occurrence count
        (0 when either value has no index entry).
    """
    set_values = set(values)
    # msearch body: one empty header line ("{}") before each term query.
    query = "{}\n" + "\n{}\n".join(
        json.dumps(
            {
                "query": {
                    "term": {
                        "data": {
                            "value": str2regex(val, match_whole_token=True)
                        }
                    }
                }
            }
        )
        for val in set_values
    )
    mresult = self.es.msearch(query, index="n_reversed_indices")
    # BUG FIX: responses are aligned with set_values (one per query), but
    # the original indexed them with enumerate(values); with duplicate
    # values the lookup was misaligned. Map each distinct value directly
    # to its result instead.
    indices = {
        val: ESQuery.get_results(res, "idx")
        for val, res in zip(set_values, mresult["responses"])
    }
    coexist_count = defaultdict(dict)
    for val1 in values:
        for val2 in values:
            idx1, idx2 = indices[val1], indices[val2]
            if idx1 is None or idx2 is None:
                coexist_count[val1][val2] = 0
            else:
                # len(...) so both branches store an int, matching the
                # function's name; the original stored the raw set here.
                coexist_count[val1][val2] = len(set(idx1) & set(idx2))
    return coexist_count
def transform(self, dirty_df: pd.DataFrame, col: str):
    """Return the TF-IDF features of one column as a dense tensor.

    NOTE(review): the original also computed a symbolic-pattern TF-IDF
    matrix (the `sym_tfidf` fitted in `fit`) but never used it — only
    `[tfidf]` was concatenated. That dead computation is removed here.
    If the intent was to concatenate both feature sets, add the
    sym_tfidf matrix back into the np.concatenate list.

    Args:
        dirty_df: DataFrame holding the column to featurize.
        col: Name of the column.

    Returns:
        A one-element list containing a torch.Tensor of the dense
        TF-IDF matrix, shape (n_rows, n_tfidf_features).
    """
    tfidf = self.tfidf.transform(dirty_df[col].values.tolist()).todense()
    # np.concatenate kept from the original so the returned array type
    # is unchanged for callers.
    return [torch.tensor(np.concatenate([tfidf], axis=1))]
def fit(self, values):
    """Build frequency counters over raw values and their symbolic
    patterns, at both trigram and whole-value granularity, then map each
    scoring function to the counter it consumes.

    Args:
        values: Iterable of string values to profile.
    """
    # Flatten every value's character trigrams into one list.
    ngrams = [
        "".join(gram)
        for val in values
        for gram in xngrams(val, 3)
    ]
    self.trigram_counter = Counter(ngrams)
    self.sym_trigram_counter = Counter(str2regex(g, False) for g in ngrams)
    self.val_counter = Counter(values)
    self.sym_val_counter = Counter(str2regex(v, False) for v in values)
    # Dispatch table: scoring function -> fitted counter.
    self.func2counter = {
        val_trigrams: self.trigram_counter,
        sym_trigrams: self.sym_trigram_counter,
        value_freq: self.val_counter,
        sym_value_freq: self.sym_val_counter,
    }
def sym_value_freq(values, counter):
    """Whole-value frequency score computed over whole-token symbolic
    patterns rather than the raw values."""
    patterns = [str2regex(v, True) for v in values]
    return value_freq(patterns, counter)
def sym_trigrams(values, counter):
    """Trigram score computed over per-character symbolic patterns
    rather than the raw values."""
    patterns = [str2regex(v, False) for v in values]
    return val_trigrams(patterns, counter)
def transform(self, dirty_df: pd.DataFrame, col):
    """Relative frequency of each cell's whole-token symbolic pattern
    within the column.

    Args:
        dirty_df: DataFrame holding the column.
        col: Name of the column.

    Returns:
        ndarray of per-row pattern frequencies in [0, 1].
    """
    n_rows = len(dirty_df)

    def pattern_freq(value):
        return self.counter[str2regex(value, match_whole_token=True)] / n_rows

    return dirty_df[col].swifter.apply(pattern_freq).values
def fit(self, dirty_df: pd.DataFrame, col):
    """Count occurrences of each whole-token symbolic pattern in the column.

    Args:
        dirty_df: DataFrame holding the column to fit on.
        col: Name of the column.
    """
    patterns = dirty_df[col].swifter.apply(
        lambda value: str2regex(value, match_whole_token=True)
    )
    self.counter = patterns.value_counts().to_dict()
def fit(self, dirty_df: pd.DataFrame, col: str):
    """Fit both TF-IDF vectorizers on one column: one on raw values,
    one on their per-character symbolic patterns.

    Args:
        dirty_df: DataFrame holding the column to fit on.
        col: Name of the column.
    """
    raw_values = dirty_df[col].values.tolist()
    self.tfidf.fit(raw_values)
    symbolic = dirty_df[col].apply(
        lambda value: str2regex(value, match_whole_token=False)
    )
    self.sym_tfidf.fit(symbolic.values)
def clean_str(x):
    """Strip surrounding whitespace, drop any non-ASCII characters, and
    return the whole-token symbolic pattern of what remains."""
    ascii_only = x.strip().encode("ascii", "ignore").decode("ascii")
    return str2regex(ascii_only, True)