def polyfit_upperbound(dataset, degree):
    """Measure the best result attainable by fitting each ground-truth lane
    with a polynomial of the given degree (an upper bound for the metric)."""
    evaluator = Evaluator(dataset, '/tmp', degree)
    print('Predicting with upperbound...')
    for i, anno in enumerate(progressbar(dataset.annotations)):
        label = anno['label']
        # One row per lane: 1 flag + 2 metadata columns + (degree + 1) coefficients
        pred = np.zeros((label.shape[0], 1 + 2 + degree + 1))
        pred[:, :3] = label[:, :3]
        for j, lane in enumerate(label):
            if lane[0] == 0:  # skip invalid/empty lanes
                continue
            xy = lane[3:]
            x = xy[:(len(xy) // 2)]
            y = xy[(len(xy) // 2):]
            ind = x > 0  # only fit annotated points
            pred[j, -(degree + 1):] = np.polyfit(y[ind], x[ind], degree)
        evaluator.add_prediction([i], pred, 0.0005)  # 0.0005 = dummy runtime
    _, result = evaluator.eval(label='upperbound', only_metrics=True)
    return result
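A note on the fitting step above: `np.polyfit(y[ind], x[ind], degree)` passes `y` first, so it fits `x` as a polynomial of `y` rather than the other way around, which keeps near-vertical lanes single-valued. A minimal standalone sketch of that convention, using made-up points:

```python
import numpy as np

# Made-up lane points: x = 2*y + 1, i.e. a straight lane parameterized by y.
y = np.array([0.0, 1.0, 2.0, 3.0])
x = 2.0 * y + 1.0

# Note the argument order: y first, so we fit x = f(y).
coeffs = np.polyfit(y, x, 1)
print(np.allclose(coeffs, [2.0, 1.0]))  # → True
```

`np.polyfit` returns coefficients highest degree first, matching the `pred[j, -(degree + 1):]` slice above.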
    batch_size=batch_size, shuffle=False, num_workers=8)
evaluator = Evaluator(test_loader.dataset, exp_root)
logging.basicConfig(
    format="[%(asctime)s] [%(levelname)s] %(message)s",
    level=logging.INFO,
    handlers=[
        logging.FileHandler(os.path.join(exp_root, "test_log.txt")),
        logging.StreamHandler(),
    ],
)
logging.info('Code state:\n {}'.format(get_code_state()))
_, mean_loss = test(model, test_loader, evaluator, exp_root, cfg, epoch=test_epoch, view=False)
logging.info("Mean test loss: {:.4f}".format(mean_loss))
evaluator.exp_name = args.exp_name
eval_str, _ = evaluator.eval(label='{}_{}'.format(os.path.basename(args.exp_name), test_epoch))
logging.info(eval_str)
def train(model, train_loader, exp_dir, cfg, val_loader, train_state=None):
    # NOTE: num_epochs, device and exp_root are expected to be defined at module scope.
    # Get initial train state
    optimizer = cfg.get_optimizer(model.parameters())
    scheduler = cfg.get_lr_scheduler(optimizer)
    starting_epoch = 1
    if train_state is not None:
        # Resume from a checkpoint: restore model/optimizer/scheduler state
        model.load_state_dict(train_state['model'])
        optimizer.load_state_dict(train_state['optimizer'])
        scheduler.load_state_dict(train_state['lr_scheduler'])
        starting_epoch = train_state['epoch'] + 1
        scheduler.step(starting_epoch)

    # Train the model
    criterion_parameters = cfg.get_loss_parameters()
    criterion = model.loss
    total_step = len(train_loader)
    ITER_LOG_INTERVAL = cfg['iter_log_interval']
    ITER_TIME_WINDOW = cfg['iter_time_window']
    MODEL_SAVE_INTERVAL = cfg['model_save_interval']
    t0 = time()
    total_iter = 0
    iter_times = []
    logging.info("Starting training.")
    for epoch in range(starting_epoch, num_epochs + 1):
        epoch_t0 = time()
        logging.info("Beginning epoch {}".format(epoch))
        accum_loss = 0
        for i, (images, labels, img_idxs) in enumerate(train_loader):
            total_iter += 1
            iter_t0 = time()
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass
            outputs = model(images, epoch=epoch)
            loss, loss_dict_i = criterion(outputs, labels, **criterion_parameters)
            accum_loss += loss.item()

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Keep a sliding window of iteration times for the s/iter metric
            # (was hardcoded to 100, inconsistent with the slice below)
            iter_times.append(time() - iter_t0)
            if len(iter_times) > ITER_TIME_WINDOW:
                iter_times = iter_times[-ITER_TIME_WINDOW:]
            if (i + 1) % ITER_LOG_INTERVAL == 0:
                loss_str = ', '.join([
                    '{}: {:.4f}'.format(loss_name, loss_dict_i[loss_name])
                    for loss_name in loss_dict_i
                ])
                logging.info(
                    "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f} ({}), s/iter: {:.4f}, lr: {:.1e}".format(
                        epoch,
                        num_epochs,
                        i + 1,
                        total_step,
                        accum_loss / (i + 1),
                        loss_str,
                        np.mean(iter_times),
                        optimizer.param_groups[0]["lr"],
                    ))
        logging.info("Epoch time: {:.4f}".format(time() - epoch_t0))
        if epoch % MODEL_SAVE_INTERVAL == 0 or epoch == num_epochs:
            model_path = os.path.join(exp_dir, "models", "model_{:03d}.pt".format(epoch))
            save_train_state(model_path, model, optimizer, scheduler, epoch)
        if val_loader is not None:
            evaluator = Evaluator(val_loader.dataset, exp_root)
            evaluator, val_loss = test(
                model,
                val_loader,
                evaluator,
                None,
                cfg,
                view=False,
                epoch=-1,
                verbose=False,
            )
            _, results = evaluator.eval(label=None, only_metrics=True)
            logging.info("Epoch [{}/{}], Val loss: {:.4f}".format(epoch, num_epochs, val_loss))
            model.train()  # test() switches to eval mode; switch back
        scheduler.step()
    logging.info("Training time: {:.4f}".format(time() - t0))
    return model
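The resume branch in `train()` expects a `train_state` dict with keys `'model'`, `'optimizer'`, `'lr_scheduler'` and `'epoch'`, and restarts at the saved epoch plus one. A torch-free sketch of that contract (plain dicts stand in for the real state dicts, and `pickle` stands in for `torch.save`/`torch.load`; `save_train_state` itself is defined elsewhere):

```python
import io
import pickle

# Stand-ins for the real state dicts (plain dicts, for illustration only).
state = {'model': {}, 'optimizer': {}, 'lr_scheduler': {}, 'epoch': 7}

# Round-trip through a buffer, as torch.save/torch.load would do with a file.
buf = io.BytesIO()
pickle.dump(state, buf)
buf.seek(0)
train_state = pickle.load(buf)

# Resuming restarts one epoch after the checkpointed one.
starting_epoch = train_state['epoch'] + 1
print(starting_epoch)  # → 8
```

This is why the checkpoint saved at the end of epoch N resumes training at epoch N + 1 rather than repeating the saved epoch.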
def train(model, optimizer, train_loader, begin_step, epoch_begin, epoch_end, beta_func, filename_prefix,
          eval_loaders=[], use_gpu=False):
    assert 'Random' in train_loader.sampler.__class__.__name__
    train_info = 'epoch:%d ~ %d learning rate:%8.6f' % (epoch_begin, epoch_end,
                                                        optimizer.state_dict()['param_groups'][0]['lr'])
    model_prior_info = model.prior.__repr__()
    print(train_info)
    print(model_prior_info)
    logging = train_info + '\n' + model_prior_info + '\n'
    evaluator = Evaluator()
    # size_average=True is deprecated; reduction='mean' is the equivalent modern spelling
    criterion = nn.CrossEntropyLoss(reduction='mean')
    n_data = len(train_loader.sampler)
    n_step = begin_step
    running_rate = 0.01  # EMA coefficient for the running statistics below
    running_xent = 0.0
    running_kld = 0.0
    running_loss = 0.0
    best_valid_acc = -float('inf')
    for e in range(epoch_begin, epoch_end):
        for b, data in enumerate(train_loader):
            # get the inputs
            inputs, outputs = data
            if use_gpu:
                inputs = inputs.cuda()
                outputs = outputs.cuda()

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + gradient_correction + optimize
            # TODO : annealing kl-divergence term is better
            beta = beta_func(n_step)
            pred = model(inputs)
            xent = criterion(pred, outputs)
            kld = model.kl_divergence() / float(n_data)
            if torch.isinf(kld):
                raise RuntimeError("KL divergence is infinite. It is likely that ive is zero and is passed to log.")
            if torch.isnan(kld):
                model.kl_divergence()
                raise RuntimeError("KL divergence is NaN.")
            loss = xent + kld * beta
            loss.backward()
            for m in model.modules():
                if hasattr(m, 'gradient_correction'):
                    m.gradient_correction(xent)
            optimizer.step()
            for m in model.modules():
                if hasattr(m, 'parameter_adjustment'):
                    m.parameter_adjustment()
            n_step += 1

            # update running statistics (exponential moving averages)
            running_xent = running_rate * float(xent) + (1 - running_rate) * running_xent
            running_kld = running_rate * float(kld) + (1 - running_rate) * running_kld
            running_loss = running_rate * float(loss) + (1 - running_rate) * running_loss
        train_acc, valid_acc, test_acc, eval_str, test_kld, test_nll = evaluate(model, eval_loaders[0],
                                                                               eval_loaders[1], eval_loaders[2])
        evaluator.eval(train_acc=train_acc, test_acc=test_acc, train_nll=running_xent, test_nll=test_nll,
                       train_kld=running_kld, test_kld=test_kld)
        log_str = '%s [%6d steps in (%4d epochs) ] loss: %.6f, train_xentropy: %.6f, train_kld: %.6f, ' \
                  'beta:%.3E, test_xentropy: %.6f, test_kld:%.6f, %s' % \
                  (datetime.now().strftime("%H:%M:%S.%f"), n_step, e + 1, running_loss, running_xent,
                   running_kld, beta, test_nll, test_kld, train_info)
        print(log_str)
        logging += log_str + '\n'
        running_xent = 0.0
        running_kld = 0.0
        running_loss = 0.0
        # Save on improvement, and unconditionally during the last 20 epochs
        if valid_acc > best_valid_acc or epoch_end - e <= 20:
            print('Best validation accuracy has been updated at %4d epoch.' % (e + 1))
            torch.save(model.state_dict(), MODEL_FILENAME(filename_prefix + '_e' + str(e + 1).zfill(4)))
            torch.save(optimizer.state_dict(), OPTIM_FILENAME(filename_prefix + '_e' + str(e + 1).zfill(4)))
            best_valid_acc = valid_acc
            logging += eval_str + '\n'
    print('Last update is stored.')
    print(os.path.join(SAVE_DIR, MODEL_FILENAME(filename_prefix)))
    torch.save(model.state_dict(), MODEL_FILENAME(filename_prefix))
    torch.save(optimizer.state_dict(), OPTIM_FILENAME(filename_prefix))
    evaluator.plot(FIG_SAVE_DIR)
    logging += train_info + '\n' + model_prior_info
    return logging, n_step
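The `running_*` statistics in the loop above are exponential moving averages with coefficient `running_rate = 0.01`: each new observation gets weight 0.01 and the accumulated history gets weight 0.99. A small self-contained sketch of that update rule (function and variable names here are illustrative, not from the source):

```python
def ema_update(running, value, alpha=0.01):
    # New observations get weight alpha; history gets weight (1 - alpha).
    return alpha * value + (1 - alpha) * running

# Starting from 0.0 (as running_xent does), repeated identical observations
# converge geometrically toward the observed value.
running = 0.0
for _ in range(500):
    running = ema_update(running, 1.0)
print(0.99 < running < 1.0)  # → True
```

One caveat of initializing at 0.0, as the training loop does, is that the early averages are biased low until enough steps have accumulated; resetting the running values to 0.0 after each epoch's log line reintroduces that bias every epoch.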