    def compute_advantage(self, batch: Batch, last_r: float, gamma: float = 0.9,
                          lamda: float = 1.0, use_gae: bool = True, use_critic: bool = True):
        """
         Given a rollout, compute its value targets and the advantage
         Args: batch (Batch): batch of a single trajectory
               last_r (float): value estimation for the last observation
               gamma (float): Discount factor
               lambda (float): parameter for GAE
               use_gae (bool): using Generalized Advantage Estimation
               use_critic (bool): whether to use critic (value estimation), setting this to false will use 0 as baseline
        
         Returns: batch (Batch): object with experience from batch and processed rewards
        """  

        assert "vf_preds" in batch or not use_critic, \
            "use_critic=True requires value estimates (vf_preds) in the batch!"
        assert use_critic or not use_gae, "Can't use GAE without a critic!"

        if use_gae:
            # Append the bootstrap value estimate so V(s_{t+1}) exists for the last step.
            vpred_t = np.concatenate([batch.vf_preds, np.array([last_r])])
            # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
            delta_t = batch.rew + gamma * vpred_t[1:] - vpred_t[:-1]
            # This formula for the advantage comes from "Generalized Advantage
            # Estimation": https://arxiv.org/abs/1506.02438
            batch.advantages = self.discount_cumsum(delta_t, gamma * lamda)
            batch.value_targets = (batch.advantages + batch.vf_preds).astype(np.float32)

        else:
            # No GAE: fall back to plain discounted returns, bootstrapped with last_r.
            rewards_plus_v = np.concatenate([batch.rew, np.array([last_r])])
            discounted_returns = self.discount_cumsum(rewards_plus_v, gamma)[:-1].astype(np.float32)

            if use_critic:
                # Advantage = return minus the value-function baseline.
                batch.advantages = discounted_returns - batch.vf_preds
                batch.value_targets = discounted_returns
            else:
                # No critic: use the raw returns as advantages and a zero baseline.
                batch.advantages = discounted_returns
                batch.value_targets = np.zeros_like(batch.advantages)

        batch.advantages = batch.advantages.astype(np.float32)

        return batch
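
    # NOTE: the helper below is not part of the original snippet. It is a minimal
    # sketch of the discounted cumulative sum that compute_advantage calls as
    # self.discount_cumsum(...). The scipy lfilter trick is a common way to compute
    # it, but treat this method (and its placement on the class) as an assumption,
    # not the repository's actual implementation.
    def discount_cumsum(self, x: np.ndarray, gamma: float) -> np.ndarray:
        """Reverse discounted cumulative sum: y[t] = x[t] + gamma * y[t + 1].

        Example: discount_cumsum([1., 1., 1.], 0.5) -> [1.75, 1.5, 1.0]
        """
        from scipy.signal import lfilter  # assumed to be available in this environment
        # lfilter with coefficients b=[1], a=[1, -gamma] realizes y[t] = x[t] + gamma * y[t - 1];
        # applying it to the reversed input and reversing the output gives the backward sum.
        return lfilter([1], [1, float(-gamma)], x[::-1], axis=0)[::-1]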