Python Batch.indice Exemples

Langage de programmation: Python

Espace de nommage/Pack: tianshou.data

Class/Type: Batch

Méthode/Fonction: indice

Exemples au hotexamples.com: 1

Python Batch.indice - 1 exemples trouvés. Ce sont les exemples réels les mieux notés de tianshou.data.Batch.indice extraits de projets open source. Vous pouvez noter les exemples pour nous aider à en améliorer la qualité.

Méthodes fréquemment utilisées

Afficher Cacher

Batch(30)

split(30)

weight(28)

pop(23)

returns(17)

stack(14)

update(11)

cat(9)

rew(9)

obs(8)

get(7)

act(7)

to_torch(6)

logp_old(6)

done(6)

cat_(6)

append(5)

adv(5)

is_empty(5)

keys(3)

to_numpy(3)

items(3)

obs_next(2)

update_weight(2)

empty_(2)

empty(2)

cat_list(2)

v_s(2)

v(2)

b(2)

values(1)

value_targets(1)

advantages(1)

loss(1)

policy(1)

stack_(1)

__repr__(1)

info(1)

indice(1)

Méthodes fréquemment utilisées

Batch (30)

split (30)

weight (28)

pop (23)

returns (17)

stack (14)

update (11)

cat (9)

rew (9)

obs (8)

Méthodes fréquemment utilisées

get (7)

act (7)

to_torch (6)

logp_old (6)

done (6)

cat_ (6)

append (5)

adv (5)

is_empty (5)

keys (3)

to_numpy (3)

items (3)

obs_next (2)

update_weight (2)

empty_ (2)

empty (2)

cat_list (2)

v_s (2)

v (2)

b (2)

Méthodes fréquemment utilisées

to_numpy (3)

items (3)

obs_next (2)

update_weight (2)

empty_ (2)

empty (2)

cat_list (2)

v_s (2)

v (2)

b (2)

values (1)

value_targets (1)

advantages (1)

loss (1)

policy (1)

stack_ (1)

__repr__ (1)

info (1)

indice (1)

Méthodes fréquemment utilisées

values (1)

value_targets (1)

advantages (1)

loss (1)

policy (1)

stack_ (1)

__repr__ (1)

info (1)

indice (1)

Exemple #1

0

Afficher le fichier

Fichier : dqn.py Projet : zhangtjtongxue/tianshou

def process_fn(self, batch: Batch, buffer: ReplayBuffer, indice: np.ndarray) -> Batch: r"""Compute the n-step return for Q-learning targets: .. math:: G_t = \sum_{i = t}^{t + n - 1} \gamma^{i - t}(1 - d_i)r_i + \gamma^n (1 - d_{t + n}) \max_a Q_{old}(s_{t + n}, \arg\max_a (Q_{new}(s_{t + n}, a))) , where :math:`\gamma` is the discount factor, :math:`\gamma \in [0, 1]`, :math:`d_t` is the done flag of step :math:`t`. If there is no target network, the :math:`Q_{old}` is equal to :math:`Q_{new}`. """ batch = self.compute_nstep_return(batch, buffer, indice, self._target_q, self._gamma, self._n_step) if isinstance(buffer, PrioritizedReplayBuffer): batch.update_weight = buffer.update_weight batch.indice = indice return batch