Jointly learning policies and latent representations for driver behavior. See paper here.
The video below illustrates the different driver classes used in training the encoder and policies.
Below we can see how the encoder chooses to represent trajectories from different driver classes as training progresses.
Once we have a trained policy, we can generate trajectories by passing observations and samples from the latent space into the policy and using the resulting actions to propagate the scene forward. If we initialize a vehicle with a speed of 20 m/s and an aggressive latent state, we can see that it chooses to accelerate.
Instead, if a vehicle is initialized with a passive latent state, it chooses to decelerate.
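The rollout procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: `toy_policy` is a hypothetical hand-written stand-in for the trained policy, and the time step `DT`, the one-dimensional latent, and the convention that a positive latent value means "aggressive" are all assumptions made for the example.

```python
import numpy as np

DT = 0.1  # assumed simulation time step, in seconds


def toy_policy(obs, z):
    """Hypothetical stand-in for the trained policy.

    Maps an observation and a latent sample to a longitudinal acceleration.
    Assumption: z[0] > 0 is "aggressive" and z[0] < 0 is "passive".
    """
    speed = obs[0]
    target_speed = 20.0 + 5.0 * z[0]  # the latent shifts the preferred speed
    return 0.5 * (target_speed - speed)  # proportional control toward it


def rollout(policy, speed0, z, steps=50):
    """Propagate one vehicle forward, holding the latent sample z fixed."""
    position, speed = 0.0, speed0
    speeds = [speed]
    for _ in range(steps):
        obs = np.array([speed, position])
        accel = policy(obs, z)  # action chosen by the policy
        speed = max(speed + accel * DT, 0.0)  # no reversing
        position += speed * DT
        speeds.append(speed)
    return np.array(speeds)


# Same initial speed (20 m/s), different latent states.
aggressive = rollout(toy_policy, 20.0, np.array([1.0]))
passive = rollout(toy_policy, 20.0, np.array([-1.0]))
```

With these assumed dynamics, the aggressive rollout speeds up from 20 m/s and the passive one slows down, which mirrors the qualitative behavior shown in the animations. In the real setting, the trained network would take the place of `toy_policy` and the latent would be sampled from the encoder's latent space.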