Autonomous drive control methods

Standard control vs. machine learning


A collection of algorithms that solve the autonomous drive control problem in the Udacity car simulator. The goal is to build a controller that can safely drive the car around the track, i.e. without going off-road.

We implement and present here four different approaches: two from classical control theory and two from machine learning.

The simulator can be downloaded here: https://github.com/udacity/self-driving-car-sim/releases

PID (Proportional, Integral, Derivative controller)

The observed variables are cross-track error, current steering angle and current speed.

The controlled variables are steering value and throttle value.

We use a three-level cascaded PID controller for steering and a single PID for throttle. The integral action has anti-windup limits; gain scheduling on the proportional action lets the controller recover quickly from extreme cross-track errors; a dead zone in the throttle control avoids slowing down too often; and all outputs are clamped to the physical limits of the actuators.
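As an illustration, a single loop with anti-windup and output clamping could look roughly like the Python sketch below (the gains, limits and names are placeholders, not the values tuned in this project):

```python
# Minimal single-loop PID sketch with anti-windup and output clamping.
# Gains and limits are illustrative placeholders.
class PID:
    def __init__(self, kp, ki, kd, out_min=-1.0, out_max=1.0, i_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.i_limit = i_limit              # anti-windup clamp on the integral term
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # integral action with anti-windup limits
        self.integral += error * dt
        self.integral = max(-self.i_limit, min(self.i_limit, self.integral))
        # derivative action (zero on the first call)
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        # clamp the output to the actuator limits
        return max(self.out_min, min(self.out_max, output))
```

In the actual controller, three such loops are cascaded for steering and the proportional gain is scheduled on the size of the cross-track error.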

The PID parameters were tuned for lap speed at the cost of driving smoothness. We reach a maximum of about 65 mph on the longest straight and drop to about 30 mph in the turns.

MPC (Model Predictive Controller)

The observed variables are the current position and orientation of the car along the track.

The controlled variables are steering value and throttle value.

The target path is given as waypoints along the middle of the track. We fit a polynomial through them to obtain a continuous reference path.
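With NumPy, the fit and the resulting tracking errors could be computed roughly as follows (a sketch; it assumes the waypoints have already been transformed into the vehicle's coordinate frame, and the waypoint values are hypothetical):

```python
import numpy as np

# Example waypoints in the vehicle frame (hypothetical values), so the car
# sits at x = 0 heading along the x axis.
waypoints_x = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
waypoints_y = np.array([0.0, 0.5, 1.8, 3.9, 7.0, 11.0])

# Fit a 3rd-order polynomial through the waypoints.
coeffs = np.polyfit(waypoints_x, waypoints_y, 3)

# Cross-track error: lateral offset of the reference path at the car's position.
cte = np.polyval(coeffs, 0.0)

# Orientation error: difference between the car's heading (0 in this frame)
# and the slope of the reference path at x = 0.
epsi = -np.arctan(np.polyval(np.polyder(coeffs), 0.0))
```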

The optimal actions are obtained by solving a non-linear optimization problem based on the kinematic model of the vehicle.

The cost function minimized by the optimizer is built up from several terms: the orientation error (given the highest weight, to keep the car mostly parallel to the track); the cross-track error (to avoid running off the track); the magnitude of the steering angle plus its first and second derivatives (to avoid large or sudden turns and guarantee a smoother driving experience); and the deviation from the maximum speed (to avoid moving too slowly or even stopping).
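Schematically, the cost accumulated over the prediction horizon can be written as in the sketch below (the horizon length, target speed and weights are placeholders, not the tuned values):

```python
# Hypothetical horizon length, target speed and weights.
N, V_MAX = 10, 100.0
W_EPSI, W_CTE, W_SPEED = 2000.0, 1000.0, 1.0
W_DELTA, W_DDELTA, W_D2DELTA = 50.0, 200.0, 200.0

def total_cost(cte, epsi, v, delta):
    """cte, epsi, v: length N over the horizon; delta: steering sequence of length N-1."""
    cost = 0.0
    for t in range(N):
        cost += W_EPSI * epsi[t] ** 2              # orientation error (highest weight)
        cost += W_CTE * cte[t] ** 2                # cross-track error
        cost += W_SPEED * (v[t] - V_MAX) ** 2      # deviation from the target speed
    for t in range(N - 1):
        cost += W_DELTA * delta[t] ** 2            # steering magnitude
    for t in range(N - 2):
        cost += W_DDELTA * (delta[t + 1] - delta[t]) ** 2                      # 1st difference
    for t in range(N - 3):
        cost += W_D2DELTA * (delta[t + 2] - 2 * delta[t + 1] + delta[t]) ** 2  # 2nd difference
    return cost
```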

The car behaves much better than with the PID controller: it follows a smoother trajectory and stays closer to the middle of the track. We also reach much higher speeds: above 95 mph on the straight and down to about 50 mph in the curves.

SL (Supervised Learning)

The controller is a neural network with two convolutional layers at the bottom and two fully connected layers at the top, capped by a softmax layer for classification.

All layers use ReLU activation functions. Both convolutions use a 3x3 kernel, the first layer with 32 filters and the second with 64. Pooling layers downscale by a factor of 2, and each convolutional layer is followed by 50% dropout to reduce overfitting.
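A rough Keras sketch of this architecture might look like the following (the width of the first dense layer is an assumption; only the convolutional structure is stated above):

```python
from keras import layers, models

# Sketch of the described network: two 3x3 conv layers (32 and 64 filters),
# 2x2 pooling, 50% dropout, two fully connected layers, softmax over 3 classes.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(40, 320, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3, activation='softmax'),   # left / straight / right
])
```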

Input images are scaled to 320x40 pixels and converted to a single grayscale channel. We filter the intensity gradients by magnitude and direction in order to highlight the lateral road boundaries.
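An illustrative version of this preprocessing with OpenCV (the gradient thresholds are hypothetical):

```python
import cv2
import numpy as np

# Resize to 320x40, convert to grayscale, then keep only pixels whose gradient
# magnitude and direction fall within (hypothetical) thresholds, which tends to
# highlight the near-vertical lane boundaries.
def preprocess(img):
    img = cv2.resize(img, (320, 40))                 # dsize is (width, height)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    direction = np.abs(np.arctan2(gy, gx))
    mask = (magnitude > 40) & (direction > 0.7) & (direction < 1.3)
    return mask.astype(np.float32)[..., np.newaxis]  # shape (40, 320, 1)
```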

For simplicity we use only three output classes (turn left, turn right, go straight).

The learning rate is adapted by the Adam optimizer. We trained for only five epochs to avoid overfitting, given the small size of the training set, and achieved 93% accuracy on the training set and 78% on the validation set.
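For reference, the corresponding Keras training call could look like this (the loss choice and data names are assumptions):

```python
# Adam adapts the learning rate; five epochs as described above.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, validation_data=(X_val, y_val))
```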

The only controlled variable is the steering value. Speed is kept constant at 15 mph.

RL (Reinforcement Learning)

The controller is the same neural network we used for the supervised learning solution, adapted to output a single continuous action value (the steering angle) instead of three discrete classes.

We then adopt a policy search strategy, specifically DDPG (deep deterministic policy gradient): the actor is the controller itself, and the critic is a value network that estimates the expected return and guides the updates of the actor's weights towards higher reward.

We created an OpenAI Gym environment that wraps the communication with the Udacity car simulator over a TCP socket, so that keras-rl can train an agent against the simulator. We send in an action (the steering angle) and receive an observation (a camera image) and a reward (a function of the current cross-track error).
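A skeleton of the wrapper idea (class and method names are hypothetical; the actual socket communication with the simulator is omitted here):

```python
import gym
import numpy as np
from gym import spaces

class UdacitySimEnv(gym.Env):
    """Sketch of a Gym wrapper around the Udacity simulator connection."""

    def __init__(self):
        # single continuous action: steering in [-1, 1]
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # preprocessed camera image as the observation
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(40, 320, 1), dtype=np.float32)

    def step(self, action):
        # send the steering command over the socket, read back image and telemetry
        obs, cte = self._exchange_with_simulator(float(action[0]))   # placeholder
        reward = 1.0 - abs(cte)      # illustrative reward shaped on the cross-track error
        done = abs(cte) > 3.0        # off-track threshold is a guess
        return obs, reward, done, {}

    def reset(self):
        return self._reset_simulator()                               # placeholder

    def _exchange_with_simulator(self, steering):
        raise NotImplementedError("socket communication omitted in this sketch")

    def _reset_simulator(self):
        raise NotImplementedError("socket communication omitted in this sketch")
```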

The critic network takes both state and action as input (the action is only merged in at the second layer) and outputs a Q-value. Target networks are handled automatically by keras-rl to improve stability during learning.
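A condensed sketch of how such a critic and the keras-rl agent can be wired together (layer widths and hyperparameters are assumptions; `actor` stands for the adapted network from the supervised learning section and `env` for the Gym wrapper above):

```python
from keras.layers import Input, Dense, Flatten, Concatenate
from keras.models import Model
from keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

# Critic: the observation passes through a first layer on its own, the action
# is merged in only at the second layer, and the output is a single Q-value.
obs_input = Input(shape=(1, 40, 320, 1), name='observation_input')   # window_length=1
action_input = Input(shape=(1,), name='action_input')
x = Dense(64, activation='relu')(Flatten()(obs_input))
x = Concatenate()([x, action_input])          # merge the action at the second layer
x = Dense(64, activation='relu')(x)
critic = Model(inputs=[obs_input, action_input], outputs=Dense(1, activation='linear')(x))

# keras-rl sets up the target networks (and their soft updates) internally.
agent = DDPGAgent(nb_actions=1, actor=actor, critic=critic,
                  critic_action_input=action_input,
                  memory=SequentialMemory(limit=100000, window_length=1),
                  random_process=OrnsteinUhlenbeckProcess(theta=0.15, mu=0.0, sigma=0.3, size=1))
agent.compile(Adam(lr=1e-3), metrics=['mae'])
agent.fit(env, nb_steps=50000, visualize=False, verbose=1)
```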

The reward only considers the cross-track error, since the speed is fixed. It would be interesting to add a speed term to the reward and see whether the agent can improve lap times; the main obstacle is training time.
