# Example #1
import os
from urllib.request import urlretrieve

import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Project-specific imports; module paths assume the ViT-pytorch repo layout.
from models.modeling import VisionTransformer, CONFIGS
from utils.data_utils import get_loader
def visualize_attn(args):
    """ 
    Visualization of learned attention map.
    """
    config = CONFIGS[args.model_type]
    num_classes = 10 if args.dataset == "cifar10" else 100

    model = VisionTransformer(config, args.img_size,
                              norm_type=args.norm_type,
                              zero_head=True,
                              num_classes=num_classes,
                              vis=True)

    ckpt_file = os.path.join(args.output_dir, args.name + "_checkpoint.bin")
    # Load on CPU first so a single device suffices for visualization,
    # regardless of how many GPUs the checkpoint was trained on.
    ckpt = torch.load(ckpt_file, map_location="cpu")
    model.load_state_dict(ckpt)
    model.to(args.device)
    model.eval()
    
    _, test_loader = get_loader(args)
    sample_idx = 0
    layer_ids = [0, 3, 6, 9]
    head_id = 0
    with torch.no_grad():
        for step, batch in enumerate(test_loader):
            batch = tuple(t.to(args.device) for t in batch)
            x, y = batch
            select_x = x[sample_idx].unsqueeze(0)
            output, attn_weights = model(select_x)
            # attn_weights is a list with one tensor per layer,
            # each of shape (1, num_heads, seq_len, seq_len)
            for layer_id in layer_ids:
                vis_attn(args, attn_weights[layer_id].squeeze(0)[head_id], layer_id=layer_id)
            break  # only visualize the first sample of the first batch
    print("done.")
    exit(0)
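
# `vis_attn` is called above but not defined in this snippet. A minimal
# sketch of such a helper is given below (hypothetical implementation;
# the project's actual helper may differ):
def vis_attn(args, attn, layer_id=0):
    """Save one head's (seq_len, seq_len) attention map as a heatmap."""
    import matplotlib.pyplot as plt

    attn = attn.detach().cpu().numpy()
    fig, ax = plt.subplots(figsize=(6, 6))
    heatmap = ax.imshow(attn, cmap="viridis")
    fig.colorbar(heatmap, ax=ax)
    ax.set_title(f"layer {layer_id} attention")
    fig.savefig(os.path.join(args.output_dir, f"attn_layer{layer_id}.png"),
                bbox_inches="tight")
    plt.close(fig)
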
# Map class index -> human-readable label (one WordNet lemma per line).
with open("attention_data/ilsvrc2012_wordnet_lemmas.txt") as f:
    imagenet_labels = dict(enumerate(line.strip() for line in f))

img_url = "https://images.mypetlife.co.kr/content/uploads/2019/04/09192811/welsh-corgi-1581119_960_720.jpg"
urlretrieve(img_url, "attention_data/img.jpg")
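
# The pretrained weights used below are assumed to already be in
# attention_data/. If not, they can be fetched the same way from the
# official ViT release bucket (URL is an assumption; verify before use):
# urlretrieve("https://storage.googleapis.com/vit_models/imagenet21k+imagenet2012/ViT-B_16-224.npz",
#             "attention_data/ViT-B_16-224.npz")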

# Prepare Model
config = CONFIGS["ViT-B_16"]
model = VisionTransformer(config,
                          num_classes=1000,
                          zero_head=False,
                          img_size=224,
                          vis=True)
# Load the official pretrained weights from the .npz release file.
model.load_from(np.load("attention_data/ViT-B_16-224.npz"))
model.eval()

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
im = Image.open("attention_data/img.jpg").convert("RGB")
x = transform(im)
print(x.size())  # torch.Size([3, 224, 224])
# %%
logits, att_mat = model(x.unsqueeze(0))
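
# Decode the prediction with the label map loaded above (illustrative usage):
probs = torch.softmax(logits, dim=-1)
top5 = torch.topk(probs[0], k=5)
for p, idx in zip(top5.values, top5.indices):
    print(f"{p.item():.4f} : {imagenet_labels[int(idx)]}")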

att_mat = torch.stack(att_mat).squeeze(1)  # (num_layers, num_heads, seq_len, seq_len)

# Average the attention weights across all heads.
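# A sketch of how this cell typically continues, implementing the head
# averaging and the attention-rollout recipe of Abnar & Zuidema (2020);
# treat it as an illustrative continuation, not the author's exact code:
import cv2  # opencv-python; used only to upsample the mask

att_mat = att_mat.detach().mean(dim=1)  # (num_layers, seq_len, seq_len)

# Add the identity to account for residual connections, then re-normalize.
residual_att = torch.eye(att_mat.size(1))
aug_att_mat = att_mat + residual_att
aug_att_mat = aug_att_mat / aug_att_mat.sum(dim=-1).unsqueeze(-1)

# Multiply the attention matrices recursively across layers.
joint_attentions = torch.zeros(aug_att_mat.size())
joint_attentions[0] = aug_att_mat[0]
for n in range(1, aug_att_mat.size(0)):
    joint_attentions[n] = torch.matmul(aug_att_mat[n], joint_attentions[n - 1])

# Attention from the [CLS] token to the patches in the last layer, reshaped
# to the patch grid (14x14 for ViT-B/16 at 224x224) and upsampled to the
# original image resolution.
grid_size = int(np.sqrt(aug_att_mat.size(-1)))
mask = joint_attentions[-1][0, 1:].reshape(grid_size, grid_size).numpy()
mask = cv2.resize(mask / mask.max(), im.size)[..., np.newaxis]
result = (mask * np.array(im)).astype("uint8")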