def trim_seq2seq_batch(batch, pad_token_id):
    target_ids = trim_batch(batch["target_ids"], pad_token_id)
    source_ids, source_mask = trim_batch(
        batch["source_ids"], pad_token_id, attention_mask=batch["source_mask"]
    )
    return source_ids, source_mask, target_ids

def trim_seq2seq_batch(batch, pad_token_id, test=False):
    # Remove columns that are populated exclusively by pad_token_id.
    # This ensures that each batch is padded only up to its own max sequence length.
    # https://github.com/huggingface/transformers/blob/1e51bb717c04ca4b01a05a7a548e6b550be38628/src/transformers/tokenization_utils.py
    source_ids, source_mask = trim_batch(
        batch["source_ids"], pad_token_id, attention_mask=batch["source_mask"]
    )
    if test:
        return source_ids, source_mask, None
    y = trim_batch(batch["target_ids"], pad_token_id)
    return source_ids, source_mask, y

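# Both helpers above depend on trim_batch, which is not shown in this file.
# What follows is a minimal sketch based on the trim_batch utility in the
# transformers source linked above; treat it as an illustration, not a
# verbatim copy of the upstream implementation.
import torch


def trim_batch(input_ids, pad_token_id, attention_mask=None):
    """Drop columns of input_ids that contain only pad_token_id."""
    # A column is kept if any row in it holds a non-pad token
    keep_column_mask = input_ids.ne(pad_token_id).any(dim=0)
    if attention_mask is None:
        return input_ids[:, keep_column_mask]
    return input_ids[:, keep_column_mask], attention_mask[:, keep_column_mask]
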
def collate_fn(self, batch):
    input_ids = torch.stack([x["source_ids"] for x in batch])
    masks = torch.stack([x["source_mask"] for x in batch])
    target_ids = torch.stack([x["target_ids"] for x in batch])
    # Precomputed RoBERTa embeddings are batched alongside the token ids
    roberta_embeddings = torch.stack([x["roberta"] for x in batch])
    pad_token_id = self.tokenizer.pad_token_id
    y = trim_batch(target_ids, pad_token_id)
    source_ids, source_mask = trim_batch(input_ids, pad_token_id, attention_mask=masks)
    return {
        "source_ids": source_ids,
        "source_mask": source_mask,
        "target_ids": y,
        "roberta_embeddings": roberta_embeddings,
    }

def collate_fn(self, batch):
    input_ids = torch.stack([x["source_ids"] for x in batch])
    masks = torch.stack([x["source_mask"] for x in batch])
    target_ids = torch.stack([x["target_ids"] for x in batch])
    pad_token_id = self.tokenizer.pad_token_id
    y = trim_batch(target_ids, pad_token_id)
    source_ids, source_mask = trim_batch(input_ids, pad_token_id, attention_mask=masks)
    return {
        "source_ids": source_ids,
        "source_mask": source_mask,
        "target_ids": y,
    }

def collate_fn(self, batch): """ The tensors are stacked together as they are yielded. Collate function is applied to the output of a DataLoader as it is yielded. """ input_ids = torch.stack([x["source_ids"] for x in batch]) # BS x SL masks = torch.stack([x["source_mask"] for x in batch]) # BS x SL pad_token_id = self.tokenizer.pad_token_id source_ids, source_mask = trim_batch(input_ids, pad_token_id, attention_mask=masks) if self.type_path == "test": return {"source_ids": source_ids, "source_mask": source_mask} target_ids = torch.stack([x["target_ids"] for x in batch]) # BS x SL # Remove columns that are purely padding y = trim_batch(target_ids, pad_token_id) # Return dictionary containing tensors return { "source_ids": source_ids, "source_mask": source_mask, "target_ids": y }