Petr Fejfar, pfejfar _at_ gmail.com
This task is lab assignment 3 for Learning systems course at Mälardalen University Sweden - Erasmus+ program 2016.
We are solving classical travelling salesman problem on given test data set by genetic algorithm.
We need find path between the cities to visit each city only once. We are trying to find as short path as possible.
By task definition we need to answer:
- Explain the important operations of the employed algorithm (e.g. GA) to solve this problem
- Answer is in section Approach
- Explain the representation of the individual solutions in your algorithm.
- Answer is in section Population
- Give the equation of the fitness function used by your algorithm.
- Answer is in section Fitness
- Give the parameters used in your algorithm. Examples: population size, crossover rate…
- Answer is in section Genetic algorithm parameters
- Illustrate how performance of the population evolves with generations (preferably with a figure)
- Answer is in section Results
- Show the best result obtained by your algorithm (the order of locations to visit and the total distance of this route).
- Answer is in section Results
Data set consist of 52 cities with name and position.
First column | Second column | Third column |
---|---|---|
City name | x position | y position |
Example:
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
4 945.0 685.0
...
Travelling salesman problem is NP-complete problem. We use as approximation algorithm from evolutionary family.
We use stationary model with tournament linear order strategy.
In stationary model we are using stable set of population. Every population we alter only subset of this population. Best members of population survives to next generation.
In linear order strategy we choose the best individuals to evolve new population.
Data set is reprenting G = (V, E), where V are cities and E are path between them. We set distance function as Euclidean distance d(e) = d(a, b) = sqrt( (ax-bx)2 + (ay-by)2 ), where e∈E. and (a, b) = e.
By member M⊂E we mean Hamiltonian path on graph G and population P is set of members M⊂P. Members are easily represented as permutation of V.
We set fitness of member M as function f(M) = Σe⊂M f(e), therefore as path length.
Algorithm cycle we set as below:
function TSP( T, N, M, C, Cmin, Cmax ):
- Generate initial population.
- Sort population by fitness function.
- Remove last C population members.
- Generate new members by crossover of first (N-C) members.
- Mutate M new members created in previous step.
- Sort population by fitness function.
- If time of simulation is greater than T return fitness of first member, go to step 2 otherwise.
Where
- T is maximal time of simulation
- N is population size
- M is mutation rate
- C is crossover rate
- Cmin is minimal crossover size
- Cmax is maximal crossover size
Initial population members are generated as random Hamiltonian paths on graph G. This is done easily as generting random permutation of V.
Mutation is done by swapping two elements in permutation.
Crossover is done by taking 2 parent members and generation offspring from them. Generation is done by taking random fixed sizes sequence in parent A and copying to offspring. Then we takes rest of permutation from parent B and fill offspring in same order as it was in parent A. By this way we create valid permutation.
Example (choose sequence of length 3 on index 2):
Member/permutation | p0 | p1 | p2 | p3 | p4 | p5 | p6 |
---|---|---|---|---|---|---|---|
Parent A | 7 | 3 | 5 | 2 | 1 | 6 | 4 |
Parent B | 1 | 6 | 4 | 2 | 3 | 5 | 7 |
Offspring | 6 | 4 | 5 | 2 | 1 | 3 | 7 |
Choose of right parameters has huge impact on simulation performance.
N is population size and by increasing N we slower the generation, but if we use small population size we can not transfer information about good member to next generation.
M is mutation rate. By increasing it we increase chance to tweak result in longer runs. In early stages of simulation it has lower impact on improving fitness than crossover. This is O(1) operation.
C is crossover rate. By increasing it we speed up approach to best fitness in early stage of simulation. In later phase of simulation there is no huge impact. This is because in early stage is bigger chance to find big sequence, from parent A which update result. In later phases only smaller sequence will improve result - this is basically mutation. Crossover is O(N) operation so is slowest part in generation next population. It has impact on speed of generation.
Cmin is minimal crossover size and Cmax is maximal crossover size. We need to find proper value. Too small interval has same properties as mutation and to big interval has same properties as crossover. So there is trade off in this parameter.
We choose for tuning parameters use genetic algorithm. We use different design than TSP( ... ). Difference we use only generational model. Generational model generate new initial population every population. This model was chosen because of lack of time on this task. In future we want to tweak this model by adding mutation and crossover.
Members are parameters for TSP( ... ). Fitness in this case is best fitness of TSP( ... ) in time T. We chose T to 300sec. This is dependent of computer computational force. As reference machine we use Intel Core i7-4900MQ CPU @ 2.80 GHz, 32GB RAM, SSD disc.
By manual tuning of parameters we were able to get best result about 9k fitness on test data.
By using genetic algorithm to determine algorithm parameters we were able to get best result as 8007,56.
We use for that result generated parameters:
- Population size of 2569
- Mutation count of 289
- Crossover count of 263
- Minimal crossover size of 0
- Maximal crossover size of 10
Histogram of best parameters algorithm is below.
This task was limited by time to work on it. In future we want to tweak parameters generation and let simulation work for longer time. We actually had best result around 7.7k, but we have unfortunately have no record of that.