def __getitem__(self, idx):
    try:
        # Skip the header row plus every row before this example, then read
        # a single chunk of `chunksize` rows as raw strings.
        x = next(
            pd.read_csv(self.filename,
                        skiprows=idx * self.chunksize + 1,
                        chunksize=self.chunksize,
                        header=None,
                        dtype=str)).fillna(NO_CONTEXT_WORD).values
        # Something is broken in rows with the wrong column count, so give
        # filler by retrying on a random example instead.
        if len(x[0]) != self.num_cols:
            return self.__getitem__(np.random.randint(0, self.len))
    except Exception:
        # Broad fallback for malformed rows: re-read with a
        # whitespace-tolerant separator and full quoting, then repair the
        # quoted fields.
        x = next(
            pd.read_csv(self.filename,
                        skiprows=idx * self.chunksize + 1,
                        chunksize=self.chunksize,
                        header=None,
                        sep=r',\s+',
                        quoting=csv.QUOTE_ALL,
                        dtype=str)).fillna(NO_CONTEXT_WORD).values
        x = np.array(fix_quote_strings(x[0, 0]))
    x_tokens = preprocess_tokens(tokenize_fine_grained(x[0, 0]), self.max_dim)
    y_tokens = preprocess_tokens(tokenize_fine_grained(x[0, 1]), self.max_dim)
    # x_tokens = [word2idx.get(token, UNKNOWN_IDX) for token in x_tokens]
    # y_tokens = [word2idx.get(token, UNKNOWN_IDX) for token in y_tokens]
    return x_tokens, y_tokens
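# A minimal usage sketch, not from the original source: assuming this method
# lives on a torch.utils.data.Dataset subclass (called CsvPairDataset here
# purely for illustration) and that pandas as pd, numpy as np, torch, and csv
# are imported at module level, the (x_tokens, y_tokens) pairs can be batched
# with a standard DataLoader:
#
#     from torch.utils.data import DataLoader
#
#     dataset = CsvPairDataset('train.csv')  # hypothetical constructor
#     loader = DataLoader(dataset, batch_size=32, shuffle=True)
#     for x_tokens, y_tokens in loader:
#         pass  # feed each tokenized source/target pair to the model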
def words2tokens(self, x):
    # Build the input either from the retrieved-context columns or from the
    # single source column, then map every token to its vocabulary index.
    x_tokens = (preprocess_context(x, self.n_retrieved, self.max_dim)
                if self.retrieve_context else
                preprocess_tokens(tokenize_fine_grained(x[0, 0]), self.max_dim))
    y_tokens = preprocess_tokens(tokenize_fine_grained(x[0, 1]), self.max_dim)
    x_tokens = [word2idx.get(token, UNKNOWN_IDX) for token in x_tokens]
    y_tokens = [word2idx.get(token, UNKNOWN_IDX) for token in y_tokens]
    return x_tokens, y_tokens
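# A small illustration of the out-of-vocabulary fallback above, with a toy
# vocabulary that is an assumption, not the project's real word2idx:
# dict.get returns UNKNOWN_IDX for any token missing from the mapping, so
# unseen tokens become the unknown index instead of raising KeyError.
#
#     word2idx = {'<unk>': 0, 'def': 1, 'return': 2}
#     UNKNOWN_IDX = 0
#     [word2idx.get(t, UNKNOWN_IDX) for t in ['def', 'frobnicate', 'return']]
#     # -> [1, 0, 2]; 'frobnicate' falls back to UNKNOWN_IDX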
def __getitem__(self, idx):
    try:
        x = self.read_pandas_line(idx)
        # Something is broken in rows with the wrong column count, so give
        # filler by backing off to a neighboring example (wrapping to the
        # last example from index 0).
        if len(x[0]) != self.num_cols:
            idx = max(0, idx - 1)
            return self.__getitem__(self.len - 1 if idx == 0 else idx)
    except Exception:
        # Broad fallback for malformed rows: re-read with the quote-aware
        # reader and repair the quoted context fields.
        x = self.read_pandas_line_quote(idx)
        x = np.array(fix_quote_strings_context(x[0, 0], self.n_retrieved))

    # Column layout: query_x, then n_retrieved (support_x, support_y) pairs,
    # then query_y in the final column.
    query_x = [
        word2idx.get(token, UNKNOWN_IDX) for token in preprocess_tokens(
            tokenize_fine_grained(x[0, 0]), self.max_dim)
    ]
    support_list_x = []
    support_list_y = []
    for i in range(self.n_retrieved):
        support_list_x.append([
            word2idx.get(token, UNKNOWN_IDX) for token in preprocess_tokens(
                tokenize_fine_grained(x[0, i * 2 + 1]), self.max_dim)
        ])
        support_list_y.append([
            word2idx.get(token, UNKNOWN_IDX) for token in preprocess_tokens(
                tokenize_fine_grained(x[0, i * 2 + 2]), self.max_dim)
        ])
    query_y = [
        word2idx.get(token, UNKNOWN_IDX) for token in preprocess_tokens(
            tokenize_fine_grained(x[0, -1]), self.max_dim)
    ]

    # Stack the accumulated support lists into (n_retrieved, max_dim)
    # tensors and reshape each query into a (1, max_dim) row.
    support_x = torch.LongTensor(
        pd.DataFrame(support_list_x).values.astype('int64'))
    support_y = torch.LongTensor(
        pd.DataFrame(support_list_y).values.astype('int64'))
    query_x = torch.LongTensor(
        pd.DataFrame(query_x).values.astype('int64')).contiguous().view(1, -1)
    query_y = torch.LongTensor(
        pd.DataFrame(query_y).values.astype('int64')).contiguous().view(1, -1)
    return support_x, support_y, query_x, query_y
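# A minimal consumption sketch under stated assumptions: the class name
# RetrievedContextDataset and its constructor arguments are hypothetical,
# not from the original source. Each example already bundles its n_retrieved
# support pairs, so a batch_size-1 DataLoader suffices to inspect the shapes
# produced above:
#
#     from torch.utils.data import DataLoader
#
#     dataset = RetrievedContextDataset('train.csv', n_retrieved=5, max_dim=100)
#     loader = DataLoader(dataset, batch_size=1)
#     support_x, support_y, query_x, query_y = next(iter(loader))
#     # support_x, support_y: (1, n_retrieved, max_dim)
#     # query_x,   query_y:   (1, 1, max_dim)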