Example #1
# Imports used by this example; the solver and objective modules appear
# elsewhere in this document, and `CounterStrategy` is assumed to come
# from `fimdp.core`.
from fimdp.core import CounterStrategy
from fimdp.energy_solvers import BasicES, GoalLeaningES
from fimdp.objectives import BUCHI


def create_counterstrategy(consmdp,
                           capacity,
                           targets,
                           init_state,
                           energy=None,
                           solver=GoalLeaningES,
                           objective=BUCHI,
                           threshold=0.1):
    """
    Create counter strategy for given parameters and the current consMDP object
    and return the strategy
    """

    if energy is None:
        energy = capacity
    if solver is GoalLeaningES:
        slvr = GoalLeaningES(consmdp, capacity, targets, threshold=threshold)
    elif solver is BasicES:
        slvr = BasicES(consmdp, capacity, targets)
    else:
        raise ValueError(f"Unsupported solver class: {solver}")
    selector = slvr.get_selector(objective)
    strategy = CounterStrategy(consmdp,
                               selector,
                               capacity,
                               energy,
                               init_state=init_state)
    return strategy
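
# A minimal usage sketch of the function above. It reuses the small
# `goal_leaning` example ConsMDP from `fimdp.examples.cons_mdp` (introduced
# later in this document); capacity 10 and initial state 0 follow that
# example, so treat this as an illustration rather than part of the original
# script.
from fimdp.examples.cons_mdp import goal_leaning

gl_mdp, gl_targets = goal_leaning()
gl_strategy = create_counterstrategy(
    gl_mdp, capacity=10, targets=gl_targets, init_state=0)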
Example #2

# NOTE: this snippet assumes that `num_agents`, `gridsize`, `targets`,
# `init_state`, `reload_list`, `final_tarjan_assignment`, and the class
# `SynchronousMultiAgentEnv` (from the FiMDPEnv package) are defined or
# imported earlier in the original script.
# for item in targets:
#     reload_list.append(item)
print(reload_list)
    env = SynchronousMultiAgentEnv(num_agents=num_agents,
                                   grid_size=[gridsize, gridsize],
                                   capacities=[400 for _ in range(num_agents)],
                                   reloads=reload_list,
                                   targets=targets,
                                   init_states=init_state,
                                   enhanced_actionspace=0)
    env.allocate_targets(final_tarjan_assignment)

# get_consmdp() returns a pair (ConsMDP, targets); we only need the ConsMDP here
MDP, _ = env.get_consmdp()

# generate targets and compute strategies for each agent (adapted from Pranay's code)
    for agent in env.agents:
        print(agent)
        solver = GoalLeaningES(MDP,
                               env.capacities[agent],
                               env.targets_alloc[agent],
                               threshold=0.3)
        selector = solver.get_selector(AS_REACH)
    strategy = CounterStrategy(MDP,  # the ConsMDP extracted above
                               selector,
                               env.capacities[agent],
                               env.energies[agent],
                               init_state=env.init_states[agent])
        env.update_strategy(strategy, agent)
    env.animate_simulation(num_steps=400, interval=100)
Example #3
showcase_solver(GoalLeaningES, capacity=35)

# The reason why the agent does not pick `EAST` when it is halfway along its route is that `EAST` is not among the actions with the lowest value. The higher value is caused by the $\mathit{Safe}$-value of `SOUTH`, which is a possible outcome of `EAST`. Intuitively, the $\mathit{Safe}$-value of a state is the minimum level of energy needed to survive from that state; in our scenario it loosely translates to the distance from reload states. The $\mathit{Safe}$-values of all possible outcomes also influence `action_value_T`. In the case above, all possible outcomes of the action `NORTH` have a lower $\mathit{Safe}$-value than the state `SOUTH`, and thus all actions with `SOUTH` as a possible outcome are disregarded by this solver. As a result, `NORTH` is the only action considered, and even though the desired outcome (`EAST` in this case) has a low probability, `NORTH` is the winner.
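
# To inspect these $\mathit{Safe}$-values directly, we can ask a solver for its minimal levels. The following sketch assumes the `SAFE` objective constant from `fimdp.objectives` (alongside `BUCHI` and `AS_REACH` used elsewhere) and the ConsMDP `m` with targets `t` obtained from the environment as in the *Equivalent values* section below.

from fimdp.objectives import SAFE

safe_solver = GoalLeaningES(m, 35, targets=t)
# minimal energy needed to survive from each state, i.e., the Safe-values
print(safe_solver.get_min_levels(SAFE))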

problematic = 187
strategy_at(GoalLeaningES, problematic)

# This is the strategy at the problematic spot. We can see that with energy between 18 and 27 the agent picks `0`, which is `NORTH`. An agent with capacity 35 can only reach this spot from the top reload state with energy 26, which falls in this interval. With capacity 40, energy 31 is feasible at this spot, and thus we did not see such behavior.

# ### Thresholds

# We can overcome the issue described above by introducing a threshold on the desired outcomes of the actions the solver considers. Loosely speaking, when computing `action_value_T`, we disregard possible outcomes that are less likely than the given threshold. As a result, the solver only considers `NORTH` as the desired outcome of the action `NORTH` when computing `action_value_T`.
#
# Ignoring the rare cases can lead to an increase in the minimal energy we need to satisfy the objective. For example, in the problematic case above, `NORTH` will have a higher `action_value_T` with the threshold, which means that its previous value is no longer achieved. Therefore, after reaching the fixpoint for the first time (using the threshold approach), we run another fixpoint computation, this time without the threshold, which can further improve the current values.

threshold_class = lambda mdp, cap, t: GoalLeaningES(mdp, cap, t, threshold=0.1)
showcase_solver(threshold_class, capacity=35)

problematic = 187
strategy_at(threshold_class, problematic, capacity=35)

# This strategy uses the strong actions (`4`–`7`) for energy between 16 and 27. It goes `NORTH=4` only with energy in the interval 16–18, prefers to go `SOUTH=6` with energy in 19–28, and finally uses the weak action to `EAST` with more than 27 units of energy.

# ## Equivalent values

# In this section, we show that the new solvers actually improve on the Basic solver while maintaining the same minimal energy levels needed to fulfill the objectives. These values can be obtained by calling `get_min_levels(BUCHI)` on the solvers.

from fimdp.objectives import BUCHI

# `e` is created earlier in the notebook and provides the ConsMDP via get_consmdp()
m, t = e.get_consmdp()
basic = BasicES(m, cap=35, targets=t)
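
# A sketch of the intended comparison follows; the goal-leaning solvers are constructed as in the earlier cells, and the equality checks are our reading of the claim above rather than the notebook's original code.

goal = GoalLeaningES(m, 35, targets=t)
goal_threshold = GoalLeaningES(m, 35, targets=t, threshold=0.1)

# all solvers should agree on the minimal energy levels for the Büchi objective
assert basic.get_min_levels(BUCHI) == goal.get_min_levels(BUCHI)
assert basic.get_min_levels(BUCHI) == goal_threshold.get_min_levels(BUCHI)
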
Example #4
# ### Simple goal-leaning example

# Consider the following example ConsMDP. In state 0, both actions lead to state 1 with some probability and otherwise stay in 0, and they do so with the same consumption. In other words, they are equally good for reaching the green state if we ignore the transition probabilities.

from fimdp.examples.cons_mdp import goal_leaning
gl, T = goal_leaning()
gl.show(targets=T)

# The basic solver completely ignores the transition probabilities and chooses **either** of the two actions; in fact, it chooses the one that was added first when the ConsMDP was created. In our case, that is the action `top`. The goal-leaning solver chooses `bottom`, which has a higher probability of moving on to state 1.

# +
from fimdp.energy_solvers import BasicES, GoalLeaningES
from fimdp.objectives import BUCHI

basic = BasicES(gl, 10, targets=T)
goal = GoalLeaningES(gl, 10, targets=T)
print(f"Selection rule for state 0 given by the basic solver:", basic.get_selector(BUCHI)[0])
print(f"Selection rule for state 0 given by the goal-leaning solver:", goal.get_selector(BUCHI)[0])
# -

# What is the change? When choosing from equally good actions, the goal-leaning solver chooses the action that is most likely to succeed.
#
# #### More technical explanation
# The measure of *goodness* in the sentence above means *a low value of `action_value_T`, which is the least amount of energy needed to satisfy the objective by this action*. The `action_value_T` is $\mathit{SPR-Val}$ in the CAV paper. Basically, this value is sufficient to play this action and always survive, and to continue towards the targets if we are lucky and the outcome of this action is the desired one. In contrast to the Basic solver, here `action_value_T` returns not only the action value, but also the probability that the outcome of this action will be the one that produced this value. Then, from the actions with the minimal value, we choose the one with the highest probability of reaching the desired outcome.
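
# The following standalone sketch illustrates just this tie-breaking rule (it is not `fimdp`'s internal code, and the numbers are made up): among the candidate actions, keep those with the minimal value and pick the one whose desired outcome is the most likely.

def pick_action(action_values):
    """Pick an action from (action, value, probability) triples: restrict to
    the actions with the minimal value, then take the most likely one."""
    min_value = min(value for _, value, _ in action_values)
    candidates = [av for av in action_values if av[1] == min_value]
    return max(candidates, key=lambda av: av[2])[0]

# both actions have the same value, but `bottom` succeeds more often
print(pick_action([("top", 3, 0.1), ("bottom", 3, 0.9)]))  # -> bottom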

result = goal.get_selector(BUCHI)[0][0].label
expected = 'bottom'
assert result == expected, (
    f"The goal-leaning strategy should prefer the action `{expected}` " +
    f"in state 0. It chooses `{result}` instead."
)