Nexyad-ebook-tutorial-genetic-algorithms

Genetic algorithms : tutorial

Keywords : tutorial, genetic algorithms, genetic algorithms, genetic classifiers, Charles Darwin, the origin of species, theory of evolution, William Bateson, genotype, phenotype, mutation, crossing over, reproduction, fitness

Written by Gerard YAHIAOUI and Pierre DA SILVA DIAS, main founders of the applied maths research company NEXYAD, © NEXYAD, all rights reserved : for any question, please CONTACT
Reproduction of partial or complete content of this page is authorized ONLY if "source : NEXYAD http://www.nexyad.com" is clearly mentionned.
This tutorial was written for students or engineers that wish to understand main hypothesis and ideas of Genetic Algorithms..

History and vocabulary

Human beings use to build complex structured things such as houses, towers, cars, TVs, telephones, computers, or peanuts automatic distributors.
Structure of these things come from the work of "engineers" (in a general sense).
In the natural world, some people think that it is the same. A chief engineer would have created and built every structured things.
Physicians use the second principle of thermodynamics which implies that closed systems spontaneously evolve from structured state to desorder. And this principle would let think that structured things cannot appear spontaneously.This principle can be experimented in a laboratory. However, observation of "life" and "Universe" evolutions on a very large scale of space and time seem to show another way.
Indeed, observation of fossiles, for instance, shows that organisms seem to have evolved among time : life would have begun on Earth with very "simple" species (mono cellular organisms), and became more and more complex and structured with time (multi cellular organisms, plants, animals, human beings).
Astro-physicians seem to describe the same kind of observations about the Universe itself : according to the theories in force, at the "begining", there was just unstructured energy localized in a small volume (it means that space already existed !?) that evolved spontaneously into galaxies, solar systems, planets, ... (after a "Big Bang").
Maybe the concept of "closed" systems doesn't apply to "life" or "Universe", or maybe other principles may also apply and would prevail in these cases (critic mass effect, self organisation principle, ...). Or perhaps our perception of what is structured or unstructured is not good.
Concerning the evolution of life, "The theory of evolution" was proposed by Charles Darwin (The Origin of Species) in 1859. Darwin proposed a model where evolution would be the result of randomly drawn modifications that are filtered by the environment : individuals that are more viable than the others have more time to reproduce themselves, whereas individuals that are less viable mostly die before reproduction. After a while, only viable individuals exist. Such a model would lead to more and more adapted species. Ifever structure level is an item of adaptation, then this model would lead "spontaneously" (without engineer) to more and more structured solutions.
Although this model doesn't fit into every observation, it seems to "explain" (see MODELS in order to know what "'explain" means for a model) a significant number of species evolution facts.
Biologists wished then to study reproduction and especially inheritance and variation. The word "genetics" was first applied to describe the study of inheritance and the science of variation by English scientist William Bateson in a letter to Adam Sedgewick, dated April 18, 1905.
From this start, new models of inheritance and variation was developed, according to which the macroscopic characters of an individual are determined by a finite number of microscopic elements called the "genes". The set of macroscopic characters (size, colour of the eyes, number of legs, ...) is called the "phenotype". The set of microscopic characters (the genes) is called the genotype.
The searcher John H Holland, on the basis of both biological evolution theory and genetic research, invented the Genetic Algorithms in the 1960s, in order to build computer programs that would adapt themselves to perform a given function.

What is a problem ? What is a solution ?

First, let us assume that a solution to a problem can be described by a finite number of parameters. Every parameter is coded into a number (for instance, a binary number : 0 or 1, or a real number, ...).
Every list of numbers (of the right type and dimension) is a possible solution.
The problem can be considered as being the criterium of evaluation of a possible solution.
Example : for the second degree equation 2.x² + x - 1 = 0 is a problem (finding x that leads to a proper evaluation of 2.x² + x - 1).
NB : let us notice that this way of describing a problem is different from the "usual" presentation.
One way for solving this problem could be to "test every solution" : of course, for real numbers, the set is infinite and not countable, but the idea remains the same : some strategies may be used in order to find iteratively a solution (example : dichotomy) :

The above problem has 2 solutions :
x = -1
or
x = 1/2

Of course, for such a simple problem, this iterative solution research loop is much less efficient than using a formal mathematical method that directly solves the problem. But for real world applications, direct formal methods may be :
- not existing
- too long to compute
- depending on values (example : "fifty ways to invert a matrix")
- hard to apply in the general case of "noise" (example : a simple Kramer system that should have a determinant equal to zero becomes a system with a determinant about equal to zero => it may be positive or negative : and lead to a solution whereas this solution shouldn't exist ...)

Iterative methods are made to find an approximation of a solution in a given time : for the above problem, an acceptable solution leads to an evaluated value "close to" 0. Then -1,0002 may be an acceptable solution.
Beware, such an iterative method may give a "solution" even if there is no solution : the minimum value for the error (see figure above) is the "best" possible solution.
For instance, the equation x2 + 20 = 0 has no solution in the set of real number (whereas it accepts solutions in the set of complex numbers) :

A loop of iterative approximation of the solution should converge to x=0, with a residual value (an error) y=20 (that will be found as the "best" possible approximation of 0).
So one can see that the interpretation of solutions given by iterative systems that do not prove the existance of the solution (and even that often do not prove their ability to convergence ...) is not that simple.

Genetic algorithms

Genetic algorithms are iterative loops of optimisation : a fitness function measures the adaptation of a solution to the problem needs. Every solution is represented by a set of numbers that we will call a set of "parameters". These parameters are called "genes".

There exist a "function" that uses the parameters. The "function" is called the phenotype, and the problem consist in finding phenotypes adapted to the problem.

Phenotypes are filtered by the fitness function that reproduces the most adapted phenotypes. The more the adpatation the more the reproduction. Phenotypes are modified through 2 ways :

- mutation : it consists in moving one parameter through a random modification,

- crossing over :

The complete cycle is then :

Special characteristics of GEs

The main characteristics of genetic algorithms are :
- several "solutions" (several genotypes, and then several phenotypes) are existing together,
- several solutions are mixed together using the crossing over procedure : it leads to new genotypes made with parts of only viable solutions (the "research" is focused on viable solutions). It is a way to make fast f of existing solutions with a probability of good fitness that is generally much higher than it would be for a completely random modification (with the same range). NB : this "high probability" of good fitness through crossing over is only true if there exist a "topology" in the set of genes : it means that very close genotypes must lead to very close phenotypes. If (when) this property is not true, then crossing over can be considered as a fast random variation (and it often leads to a solution that is eliminated by the fitness function.
- the reproduction may lead to a big number of identical genotypes (because their phenotype is well adapted to the problem). Two identical genotypes will stay unchanged through any crossing over, and then will only change through mutations (slowly),
- it is possible to use "wild cards" in genotypes : a wild card is a gene that has no value : it represents implicitely all the possible values for this gene. For the fitness computation, the "wild carded" genotype is then replaced with all the solutions that it represents, and the reproduction is made with the max of the fitness function. The "wild carded" genotype is a "familly" of genotypes.

NB : There is no proof of genetic algorithms convergence.

In fact :
Genetic algorithms were not made to converge at all !. They are an infinite loop that screens a very big combination of possibilities, and localy optimize a SET of already quite viable solutions in order to build a set of better solutions. They never stop to evolve.

Genetic algorithms scientific papers often speak about "robustness". Robustness doesn't have the same meaning in several applied maths fields. In the case of GEs, it is often the term to say "ability to work for a slightly different problem" : indeed, in real life, the biotop is always changing (usually very slowly, sometimes fast), and the problem that has to be solved by natural animals is to stay alive. But in practice, because the world is always changing, the problem is changing too.
For this reason, GEs users often apply heuristics that avoid to converge (yes you've read correctly !), and convergence to a very good solution is considered as having a very adapted species. The most a species is adapted to a given problem, the less the probability that it would be adapted to another one ... Observation of Nature seems to show the example of "super species" that were very well adapted to their environment. All individuals tend to be very similar, and it becomes hard to move from this solution using crossing over for instance. Such a species is very fragile in case of modifications of the environment. This super species birth is considered by scientists as being the reason of many species death.
So, in fact, the aim of genetic algorithms is "only" to find a set of acceptable solutions, that is a set of individuals that are significantly different and enough viable. The aim is not to find neither THE solution nor A solution.

Using genetic algorithms in order to solve a known problem with a known unique solution is not a real good benchmark : genetic algorithms may be better or worse, depending on the chararcteristics of the formal known solution.
But genetic algorithms are interesting to apply in the case of not solvable problems : there is no unique solution, there is no solution in a formal way (error = zero doesn't mean anything on many real world projects where noise is very important), and all what is expected is several "viable" solutions. And they are very interesting in the case of a problem that may change among time.
Example :
- automatic programming : build automatically a C++ program that should produce a "known" result on a given set of inputs,
- genetic classifiers : expert systems used in "artificial life" researches,
- complex local optimisation problems,

Genetic classifiers

Genetic classifiers are often used in "artificial life" projects : the aim of these projects is to build an artificial environment and artificial animals, with an initial behaviour that is randomly initialized, and see how species with quite understandable behaviours may survive/appear.
A genetic classifier is a set of input/ output rules that are modified by a genetic algorithm :
It is just an application of genetic algorithms where phenotypes are rule-based systems and where genotypes are rules (rules are the parameters of the rule-based system).

Every individual is a genetic classifier that lets its rules evolve.
Individuals that do not have a behaviour that lets them find food and that lets escape from predators disappear fast.

NEXYAD team studied genetic classifiers on artificial life projects (with no application at all ! yes we do this sometimes !) : cats, mice, and cheese world.

It is very interesting to notice that "silly mice" could always find cheese, even with a random behaviour. Cats with a random behaviour always died (they never caught a mouse) and we had to create millions of initial random cats :-)
So rules of cats begin to evolve first.
When cats begin to run after mice (that always move randomly) and catch them, then mice behaviour rules begin to evolve : the 2 species doesn't evolve simulaneously.

NB : creating "cats" and "mice" we didn't initialize the problem completely randomly (we decided that cats should eat mice and that mice should eat cheese ...) : see our paper (in french) : "le jeu du chat et de la souris" : scientificbackground.html