Autonomous drive control methods

Standard control vs. machine learning


A collection of algorithms that solve the autonomous drive control problem in the Udacity car simulator. The goal is to build a controller that can safely drive the car around the track, i.e. without going off-road.

We implement and present here four different approaches: two from classical control theory and two from machine learning.

The simulator can be downloaded here: https://github.com/udacity/self-driving-car-sim/releases

PID (Proportional, Integral, Derivative controller)

The observed variables are cross-track error, current steering angle and current speed.

The controlled variables are steering value and throttle value.

We use a three-level cascaded PID controller for steering and a single PID for throttle. The integral action has anti-windup limits; gain scheduling on the proportional action lets the controller recover quickly from extreme cross-track errors; a dead zone in the throttle control avoids slowing down too often; and all outputs are clamped to the physical limits of the actuators.
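As an illustration, a single loop with anti-windup and output clamping could look roughly like the Python sketch below (the gains, limits and names are placeholders, not the values tuned in this project):

```python
# Minimal single-loop PID sketch with anti-windup and output clamping.
# Gains and limits are illustrative placeholders.
class PID:
    def __init__(self, kp, ki, kd, out_min=-1.0, out_max=1.0, i_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.i_limit = i_limit              # anti-windup clamp on the integral term
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # integral action with anti-windup limits
        self.integral += error * dt
        self.integral = max(-self.i_limit, min(self.i_limit, self.integral))
        # derivative action (zero on the first call)
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        # clamp the output to the actuator limits
        return max(self.out_min, min(self.out_max, output))
```

In the actual controller, three such loops are cascaded for steering and the proportional gain is scheduled on the size of the cross-track error.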

The PID parameters were tuned for lap speed at the cost of driving smoothness. We reach a maximum of about 65 mph on the longest straight and drop to about 30 mph in the turns.

MPC (Model Predictive Controller)

The observed variables are the current position and orientation of the car along the track.

The controlled variables are steering value and throttle value.

The target path is given as waypoints along the middle of the track. We fit a polynomial through them to obtain a continuous reference path.
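With NumPy, the fit and the resulting tracking errors could be computed roughly as follows (a sketch; it assumes the waypoints have already been transformed into the vehicle's coordinate frame, and the waypoint values are hypothetical):

```python
import numpy as np

# Example waypoints in the vehicle frame (hypothetical values), so the car
# sits at x = 0 heading along the x axis.
waypoints_x = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
waypoints_y = np.array([0.0, 0.5, 1.8, 3.9, 7.0, 11.0])

# Fit a 3rd-order polynomial through the waypoints.
coeffs = np.polyfit(waypoints_x, waypoints_y, 3)

# Cross-track error: lateral offset of the reference path at the car's position.
cte = np.polyval(coeffs, 0.0)

# Orientation error: difference between the car's heading (0 in this frame)
# and the slope of the reference path at x = 0.
epsi = -np.arctan(np.polyval(np.polyder(coeffs), 0.0))
```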

The optimal actions are obtained by solving a non-linear optimization problem based on the kinematic model of the vehicle.

The cost function minimized by the optimizer is built up from several terms: the orientation error (given the highest weight, to keep the car mostly parallel to the track); the cross-track error (to avoid running off the track); the magnitude of the steering angle plus its first and second derivatives (to avoid large or sudden turns and guarantee a smoother driving experience); and the deviation from the maximum speed (to avoid moving too slowly or even stopping).
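Schematically, the cost accumulated over the prediction horizon can be written as in the sketch below (the horizon length, target speed and weights are placeholders, not the tuned values):

```python
# Hypothetical horizon length, target speed and weights.
N, V_MAX = 10, 100.0
W_EPSI, W_CTE, W_SPEED = 2000.0, 1000.0, 1.0
W_DELTA, W_DDELTA, W_D2DELTA = 50.0, 200.0, 200.0

def total_cost(cte, epsi, v, delta):
    """cte, epsi, v: length N over the horizon; delta: steering sequence of length N-1."""
    cost = 0.0
    for t in range(N):
        cost += W_EPSI * epsi[t] ** 2              # orientation error (highest weight)
        cost += W_CTE * cte[t] ** 2                # cross-track error
        cost += W_SPEED * (v[t] - V_MAX) ** 2      # deviation from the target speed
    for t in range(N - 1):
        cost += W_DELTA * delta[t] ** 2            # steering magnitude
    for t in range(N - 2):
        cost += W_DDELTA * (delta[t + 1] - delta[t]) ** 2                      # 1st difference
    for t in range(N - 3):
        cost += W_D2DELTA * (delta[t + 2] - 2 * delta[t + 1] + delta[t]) ** 2  # 2nd difference
    return cost
```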

The car behaves much better than with the PID controller: it follows a smoother trajectory and stays closer to the middle of the track. We also reach much higher speeds: above 95 mph on the straight and down to about 50 mph in the curves.

SL (Supervised Learning)

The controller is a neural network with two convolutional layers at the bottom and two fully connected layers at the top, capped by a softmax layer for classification.

All layers use ReLU activation functions. Both convolutions use a 3x3 kernel, the first layer with 32 filters and the second with 64. Pooling layers downscale by a factor of 2, and each convolutional layer is followed by 50% dropout to reduce overfitting.
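A rough Keras sketch of this architecture might look like the following (the width of the first dense layer is an assumption; only the convolutional structure is stated above):

```python
from keras import layers, models

# Sketch of the described network: two 3x3 conv layers (32 and 64 filters),
# 2x2 pooling, 50% dropout, two fully connected layers, softmax over 3 classes.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(40, 320, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3, activation='softmax'),   # left / straight / right
])
```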

Input images are scaled to 320x40 pixels and converted to a single grayscale channel. We filter the intensity gradients by magnitude and direction in order to highlight the lateral road boundaries.
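An illustrative version of this preprocessing with OpenCV (the gradient thresholds are hypothetical):

```python
import cv2
import numpy as np

# Resize to 320x40, convert to grayscale, then keep only pixels whose gradient
# magnitude and direction fall within (hypothetical) thresholds, which tends to
# highlight the near-vertical lane boundaries.
def preprocess(img):
    img = cv2.resize(img, (320, 40))                 # dsize is (width, height)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    direction = np.abs(np.arctan2(gy, gx))
    mask = (magnitude > 40) & (direction > 0.7) & (direction < 1.3)
    return mask.astype(np.float32)[..., np.newaxis]  # shape (40, 320, 1)
```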

For simplicity we use only three output classes (turn left, turn right, go straight).

The learning rate is adapted by the Adam optimizer. We trained for only five epochs to avoid overfitting, given the small size of the training set, and achieved 93% accuracy on the training set and 78% on the validation set.
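For reference, the corresponding Keras training call could look like this (the loss choice and data names are assumptions):

```python
# Adam adapts the learning rate; five epochs as described above.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, validation_data=(X_val, y_val))
```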

The only controlled variable is the steering value. Speed is kept constant at 15 mph.

RL (Reinforcement Learning)

The controller is the same neural network we used for the supervised learning solution, adapted to output a single continuous action value (the steering angle) instead of three discrete classes.

We then adopt a policy search strategy, specifically DDPG (deep deterministic policy gradient): the actor is the controller itself, and the critic is a value network that estimates the expected return and guides the updates of the actor's weights towards higher reward.

We created an OpenAI Gym environment that wraps the communication with the Udacity car simulator over a TCP socket, so that keras-rl can train an agent against the simulator. We send in an action (the steering angle) and receive an observation (a camera image) and a reward (a function of the current cross-track error).
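A skeleton of the wrapper idea (class and method names are hypothetical; the actual socket communication with the simulator is omitted here):

```python
import gym
import numpy as np
from gym import spaces

class UdacitySimEnv(gym.Env):
    """Sketch of a Gym wrapper around the Udacity simulator connection."""

    def __init__(self):
        # single continuous action: steering in [-1, 1]
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # preprocessed camera image as the observation
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(40, 320, 1), dtype=np.float32)

    def step(self, action):
        # send the steering command over the socket, read back image and telemetry
        obs, cte = self._exchange_with_simulator(float(action[0]))   # placeholder
        reward = 1.0 - abs(cte)      # illustrative reward shaped on the cross-track error
        done = abs(cte) > 3.0        # off-track threshold is a guess
        return obs, reward, done, {}

    def reset(self):
        return self._reset_simulator()                               # placeholder

    def _exchange_with_simulator(self, steering):
        raise NotImplementedError("socket communication omitted in this sketch")

    def _reset_simulator(self):
        raise NotImplementedError("socket communication omitted in this sketch")
```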

The critic network takes both state and action as input (the action is only merged in at the second layer) and outputs a Q-value. Target networks are handled automatically by keras-rl to improve stability during learning.
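A condensed sketch of how such a critic and the keras-rl agent can be wired together (layer widths and hyperparameters are assumptions; `actor` stands for the adapted network from the supervised learning section and `env` for the Gym wrapper above):

```python
from keras.layers import Input, Dense, Flatten, Concatenate
from keras.models import Model
from keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

# Critic: the observation passes through a first layer on its own, the action
# is merged in only at the second layer, and the output is a single Q-value.
obs_input = Input(shape=(1, 40, 320, 1), name='observation_input')   # window_length=1
action_input = Input(shape=(1,), name='action_input')
x = Dense(64, activation='relu')(Flatten()(obs_input))
x = Concatenate()([x, action_input])          # merge the action at the second layer
x = Dense(64, activation='relu')(x)
critic = Model(inputs=[obs_input, action_input], outputs=Dense(1, activation='linear')(x))

# keras-rl sets up the target networks (and their soft updates) internally.
agent = DDPGAgent(nb_actions=1, actor=actor, critic=critic,
                  critic_action_input=action_input,
                  memory=SequentialMemory(limit=100000, window_length=1),
                  random_process=OrnsteinUhlenbeckProcess(theta=0.15, mu=0.0, sigma=0.3, size=1))
agent.compile(Adam(lr=1e-3), metrics=['mae'])
agent.fit(env, nb_steps=50000, visualize=False, verbose=1)
```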

The reward only considers the cross-track error, since the speed is fixed. It would be interesting to add a speed term to the reward and see whether the agent can improve lap times; the main obstacle is training time.
