Genetic Algorithm in Python

Optimization models are one of the great tools we data scientists have for solving problems: from classic optimization problems to finding the optimal hyperparameters of a model.

In this post, we are going to learn how one of the most well-known and widely used optimization models works: the genetic algorithm. More specifically, we will see how a genetic algorithm works, learn how to use it in Python, and also program a genetic algorithm from scratch in Python.

Sound interesting? Let’s go with it!

Introduction to genetic algorithms and optimization models

A genetic algorithm is a type of population-based stochastic optimization algorithm. What does this mean? When we talk about optimization models, they can be classified based on:

  1. How the search is carried out:
    1. Blind Search: it carries out an exhaustive search of all possible options, without taking into account any previous results.
    2. Guided Search: it uses past results to guide the model in the search for the optimum. In turn, there are two types of guided search models:
      • Single-State Search: it keeps a single solution and explores values similar to it in order to find the result. In problems of little complexity (few local minima), this approach converges (finds the optimum) rapidly.
      • Population-Based Search: it keeps a population of solutions that improves over time. Even though it takes longer to converge, it works better than single-state guided search for the most common problems.
  2. Whether the search is stochastic or deterministic. Stochastic searches involve randomness, so running the process twice on the same data may return different results. Deterministic models, on the other hand, will always return the same result.

In the following table I indicate the different types of optimization algorithms that exist based on the two previously mentioned variables:

|                          | Deterministic               | Stochastic                                                                                                  |
|--------------------------|-----------------------------|-------------------------------------------------------------------------------------------------------------|
| Blind Search             | Blind Search, Grid Search   | Monte Carlo                                                                                                   |
| Single-State Search      | Tabu Search, Hill Climbing  | Simulated Annealing                                                                                           |
| Population-Based Search  |                             | Genetic Algorithm, Genetic Programming, Differential Evolution, Particle Swarm, Estimation of Distribution   |

As you can see, the genetic algorithm is an algorithm that uses a population-based search and that search is also stochastic. But why does this happen? How does a genetic algorithm actually work? Let’s see!

How does a genetic algorithm work

As the name suggests, genetic algorithms are so called because they simulate the behavior that occurs in nature, just as neural networks are so called because of their similarity to neurons.

If you want to learn how a neural network works, I recommend you this post where I program a neural network step by step.

More specifically, each candidate solution in a genetic algorithm is called an individual. The genotype, genome or chromosome denotes the data structure present in an individual. A gene is a position within that data structure and the specific value it holds is called an allele.

Thus, we start from a series of individuals, each with different alleles (data). Of course, not all individuals will be equally good at optimizing our target (just as not all early humans were equally well adapted to their environment). Therefore, each individual is evaluated with a function, called phenotype or fitness.

Once the fitness of each individual has been calculated, an individual selection process is carried out (we will go into more detail later in this section). The idea is simple: we keep the individuals with the best fitness.

Once the selection of individuals has been made, the selected individuals pair up to obtain offspring, that is, to generate new individuals. This process is known as breeding, and the new individuals are generated through genetic operators such as crossover and mutation.

Crossover involves combining two individuals, while mutation involves making a change to an existing individual.

Bearing this in mind, genetic algorithms are based on:

  1. Start with a set of different individuals, each with different alleles (values). This set of individuals is the population we start from.
  2. Each of these individuals is evaluated by calculating its fitness.
  3. A selection is made of the individuals that “survive”, that is, the ones that will eventually have offspring.
  4. The surviving individuals have offspring and we return to step 2, iteratively.
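
Before going into those details, here is a minimal sketch, in plain Python, of the loop that these four steps describe. The helpers evaluate, select and breed are hypothetical placeholders for the fitness, selection and breeding mechanisms explained in the rest of the post:

def genetic_algorithm(initial_population, evaluate, select, breed, n_generations):
    population = initial_population               # step 1: starting individuals
    for _ in range(n_generations):                # step 4: iterate
        fitness = [evaluate(ind) for ind in population]      # step 2: fitness
        survivors = select(population, fitness)              # step 3: selection
        population = breed(survivors, len(population))       # offspring
    return max(population, key=evaluate)          # best individual found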

So, now that you understand in general terms how a genetic algorithm works, let’s go into more detail on two important questions: how the selection of individuals is carried out and how new individuals are generated.

How to make the selection of individuals

There are different ways to select individuals:

  1. Roulette method: the probability of being chosen for breeding is proportional to fitness, that is, the better an individual’s fitness, the more likely it is to be chosen. An individual’s selection probability is obtained by dividing its fitness by the sum of the fitness of the whole population.
  2. Ranking: although the roulette method seems solid, problems appear when many individuals have similar probabilities, because the best individual may not end up being selected. The ranking method therefore orders the individuals by fitness, and the N individuals with the best fitness are chosen.
  3. Steady-state selection: a number of individuals with poor fitness are eliminated and replaced by offspring bred from the individuals with better fitness.
  4. Tournament-based selection: pairs (or small groups) of individuals are created at random and, from each one, the individual with the best fitness is selected.
  5. Elitism: it is based on partial reproduction, that is, the individuals with the best fitness pass to the next generation as exact copies of themselves.
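
To make the roulette method more concrete, here is a small sketch of how that proportional selection could be coded (roulette_selection is an illustrative helper, not a library function; the other selection methods are implemented later, in the from-scratch section):

import numpy as np

def roulette_selection(population, fitness, n_selected):
  # Each individual's probability is its fitness divided by the total fitness
  fitness = np.asarray(fitness, dtype=float)
  probabilities = fitness / fitness.sum()
  indices = np.random.choice(len(population), size=n_selected, p=probabilities)
  return [population[i] for i in indices]

# The individual with fitness 5.0 is five times more likely to be chosen than the one with 1.0
parents = roulette_selection(['ind_a', 'ind_b', 'ind_c'], [1.0, 5.0, 2.0], n_selected=2)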

How to get new individuals by crossover

There are three main ways to breed new individuals by crossover:

  1. One-point crossover: the parents’ alleles are split in two at the same point. The child takes one piece of the alleles from one parent and the other piece from the other parent.
  2. Multi-point crossover: follows the same idea as the previous point, only now the alleles are split into 3 or more groups and the offspring are made up of combinations of those parts.
  3. Uniform crossover: each allele is chosen from one of the parents at random, so the child is a random mix of the parents’ values.
  4. Ordered-list crossover: this type of crossover is used when not all chromosomes are valid solutions; it basically allows you to add a series of conditions to the crossover.
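
As a quick illustration, this is roughly what one-point and uniform crossover look like on two toy parents (just a sketch; the version we will actually use appears later, in the from-scratch section):

import numpy as np

parent1 = [1, 2, 3, 4, 5, 6]
parent2 = [7, 8, 9, 10, 11, 12]

# One-point crossover: split both parents at the same random position
point = np.random.randint(1, len(parent1))
child_a = parent1[:point] + parent2[point:]
child_b = parent2[:point] + parent1[point:]

# Uniform crossover: each allele comes from either parent with equal probability
child_c = [np.random.choice([a, b]) for a, b in zip(parent1, parent2)]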

Now that we know how new individuals are generated in genetic algorithms thanks to crossover, let’s see how new individuals are created by mutation:

How to get new individuals by mutation

When we talk about mutation, there are two main ways to make a mutation in a gene:

  1. Random mutation: an allele (a value) is mutated at random.
  2. Shrink: a random value drawn from a normal distribution (or another type of distribution, such as a uniform one) is added to one of the individual’s values.
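
As with crossover, a small toy sketch helps to see the difference between the two types of mutation (again, the full implementation comes later, in the from-scratch section):

import numpy as np

individual = [0.4, 1.2, 0.7]
index = np.random.randint(len(individual))

# Random mutation: replace one allele with a new random value
mutated_random = list(individual)
mutated_random[index] = np.random.random()

# Shrink mutation: add noise drawn from a normal distribution to one allele
mutated_shrink = list(individual)
mutated_shrink[index] = mutated_shrink[index] + np.random.normal(loc=0, scale=0.1)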

Perfect, with this you already know how a genetic algorithm works. Now, let’s see how to use a genetic algorithm in Python.

How to use a genetic algorithm in Python

To use a genetic algorithm in Python we have the PyGAD library, which allows us to create genetic algorithms in a simple way. It is a widely used library because it integrates with Keras and PyTorch, the two main Deep Learning frameworks, and it also supports different types of crossover, mutation, and selection.

If you find PyGAD useful and interesting, you can make a donation to support and help maintain the project.

So the first thing is to install PyGAD, which you can do with the following command:

pip install pygad

Having this, we are going to solve a relatively simple problem: find the weights that, when multiplied by a given set of input values and summed, return a specific target number.

To do this, we will use the GA class from the PyGAD library. We must pass it a series of parameters, such as the number of generations to run, how many individuals make up the population, or the type of mutation or crossover to apply.

In addition, we must also indicate the function that measures fitness. This function must receive a solution and a solution index as input (the second is not usually used) and return a fitness value.

That said, let’s now see how to solve a simple optimization problem with PyGAD:

import pygad
import numpy

inputs = [0.4,1,0,7,8]
desired_output = 32

def fitness_func(solution, solution_idx):
    output = numpy.sum(solution*inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    
    return fitness

ga_instance = pygad.GA(num_generations=100,
                       sol_per_pop=10,
                       num_genes=len(inputs),
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       mutation_type="random",
                       mutation_probability=0.6)

ga_instance.run()

ga_instance.plot_fitness()

As we can see, the fitness obtained by our model was very low at the beginning. However, as the generations go by, the fitness improves, reaching an optimal solution around generation 90.

Reaching the optimal solution means that there is no better possible solution, which is why the fitness stops changing.
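
If you also want to inspect the solution that PyGAD has found, the best_solution method returns the best solution, its fitness and its index. A quick check could look like the following (the exact values will vary from run to run, but the weighted sum should be very close to 32):

solution, solution_fitness, solution_idx = ga_instance.best_solution()
print(f"Best solution: {solution}")
print(f"Sum of solution * inputs: {numpy.sum(solution * inputs)}")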

Now, let’s look at more complex uses of the genetic algorithm in Python, such as more complicated optimization problems or training neural networks in Keras. Let’s go with it!

How to solve complex optimization problems with genetic algorithms in Python

How to train neural networks with genetic algorithms in Python

As of version 2.8.0 of PyGAD, the kerasga (Keras Genetic Algorithm) module was introduced, which includes various functions adapted to Keras for training neural networks with genetic algorithms.

Data Loading

The first thing we have to do is get some data. To do this, we will use the breast cancer dataset provided by sklearn’s load_breast_cancer function. We will also display the first observations, in case you are not familiar with the dataset:

from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer(as_frame = True)
data = dataset['data']
target = dataset['target']

data.head()

Now that we have the data loaded, let’s split it between train and test, using the train_test_split function provided by sklearn.

If you don’t know the Sklearn library in depth, I recommend you read this post where I explain the library in detail.

from sklearn.model_selection import train_test_split

seed = 1234
x_train, x_test, y_train, y_test = train_test_split(data, target, random_state=seed)

Creation of the neural network

With this data, we are going to create a simple neural network using Keras.

In this sense, I think it is important to emphasize that the objective of this example is not to find the best possible prediction, but to teach how to optimize neural networks based on genetic algorithms.

So, I have chosen a fairly simple structure with a single hidden layer of 8 neurons.

from keras.models import Sequential
from keras.layers import Dense, Input

model = Sequential()
model.add(Input(shape = (x_train.shape[1],)))
model.add(Dense(8, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))

Neural network optimization with genetic algorithm

Now that we have the model created, it is time to run the training, using the genetic algorithm as the model’s optimizer.

To do this, the steps are as follows:

  1. Create a genetic-algorithm optimizer based on our Keras model.
  2. Given the solution proposed by the genetic algorithm, load it into our neural network as its weights.
  3. Make the prediction with the model and calculate the fitness of that prediction.

So the first thing is to create a Keras-compatible genetic algorithm. To do this, the pygad library provides the KerasGA class.

This class only takes two parameters:

  • model: Keras model that we want to optimize.
  • num_solutions: indicates the number of solutions to use in the population. Each of these solutions will have different model parameters.

So, we create the optimizing model.

from pygad import kerasga

keras_ga = kerasga.KerasGA(
    model = model,
    num_solutions = 15
)

Once this is done, we can obtain the weights proposed by our genetic algorithm using the model_weights_as_matrix function of the kerasga module. Once we have the weights, we can apply them to our model using the set_weights method.

We will also take advantage of this same fitness function to make the predictions.

Finally, we will calculate the fitness of the predictions. Here it is important to keep one thing in mind: the predictive capacity of a classification model is measured with cross-entropy.

Cross-entropy measures the “disorder” of our predictions, so the lower the cross-entropy, the better the model performs.

However, our genetic algorithm is based on calculating the fitness of the model. In this case, the higher the fitness, the better the model.

So, we are going to invert the entropy, so that maximizing fitness implies reducing entropy:

from keras.losses import BinaryCrossentropy

def fitness_func(solution, solution_idx ):
  global model

  # Get the weights from the model
  weights = kerasga.model_weights_as_matrix(
      model= model,
      weights_vector= solution
      )
  
  # Set the weights
  model.set_weights(weights = weights)

  # Make the prediction
  prediction = model.predict(x_train)

  # Calculate fitness
  entropy = BinaryCrossentropy()(y_train, prediction)
  fitness = 1/(entropy.numpy() + 0.0001)

  return fitness

With this we could already train our neural network with a genetic algorithm. However, it is usually advisable to add a callback to the learning process, so that we know how well our neural network is learning.

The following function is an example of a typical callback:

fitness_evol = []

def callback_generation(ga_instance):
  global fitness_evol
  fitness_evol.append(ga_instance.best_solution()[1])
  generation = ga_instance.generations_completed
  
  if generation % 1 == 0:
    print(f"Generation {generation}. Fitness: {ga_instance.best_solution()[1]}")

With all of the above, we can now train our neural network using genetic algorithms:

from pygad import GA

num_generations = 5
num_parents_mating = 5
initial_population = keras_ga.population_weights

ga_optimizer = GA(
    num_generations=num_generations, 
    num_parents_mating=num_parents_mating, 
    initial_population=initial_population,
    fitness_func=fitness_func,
    on_generation=callback_generation
    )

ga_optimizer.run()
Generation 1. Fitness: 0.17774870093706102
Generation 2. Fitness: 0.17774870093706102
Generation 3. Fitness: 0.17779069834927677
Generation 4. Fitness: 0.18384278893734354
Generation 5. Fitness: 0.3925076183040464

With this we would have our trained model. So, let’s see how it has behaved:

import seaborn as sns

sns.lineplot(
    x = range(len(fitness_evol)),
    y = fitness_evol
)

Finally, let’s see how our neural network behaves on new data:

from sklearn.metrics import confusion_matrix
import numpy as np

pred = model.predict(x_test)
classes = (pred > 0.5).astype(int).ravel()  # sigmoid output: threshold at 0.5
confusion_matrix(y_test, classes)

array([[50,  5],
       [8,  80]])

As you can see, we have trained our neural network, in a simple way, using genetic algorithms. However, genetic algorithms can be used for much more. Let’s now see how to solve complex optimization problems with genetic algorithms in Python.

Optimization problems with genetic algorithms in Python

To see the potential of genetic algorithms, we are going to solve a classic optimization problem: find the optimal production value of some products to maximize the profit of the company, given a series of constraints.

In this sense, the constraints applied by the company are the following:

  • At least 50 units of each product must be produced (the fitness function below models this by adding 50 to each gene).
  • There is a maximum amount of material that can be used; production cannot consume more than this limit.
  • Each product generates a different profit margin.

In order to solve optimization problems with genetic algorithms in Python, the first thing to do is code our function that calculates the fitness.

It will be in the fitness calculation function where we will have to encode all these constraints. So, first of all, we create such a function:

margin = np.array([2, 2.5])
material_consumption = np.array([2,3]) 
material_max = 500

def fitness_func(solution, solution_idx):

  solution_added = solution + 50
  calculated_margin = np.sum(solution_added*margin)
  material_consumed = np.sum(solution_added*material_consumption)
  
  if material_consumed> material_max:
    return 0
  
  else:
    return calculated_margin

# We check a possible solution
print(f"Fitness for correct scenario: {fitness_func(np.array([0, 0]), '')}") 
print(f"Fitness for incorrect scenario: {fitness_func(np.array([0, 1]), '')}") 
Fitness for correct scenario: 225.0
Fitness for incorrect scenario: 227.5

Now that we have our fitness function coded, let’s run our genetic algorithm:

ga_instance = pygad.GA(num_generations=1000,
                       sol_per_pop=10,
                       num_genes=2,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       mutation_type="random",
                       mutation_probability=0.6
                       )

ga_instance.run()

ga_instance.plot_fitness()

Finally, we can check what is the solution proposed by our genetic algorithm:

ga_instance.best_solution()[0] + 50
array([136.82320846,  75.43757807])

Great, in this tutorial we have already seen what a genetic algorithm is and how to use it in Python. However, in this last section we are going to dig a little deeper: we are going to program a genetic algorithm from scratch. Does it sound interesting? Let’s go with it!

How to program a genetic algorithm from scratch in Python

If we want to program a genetic algorithm from scratch, we will have to:

  1. Create a population of n individuals.
  2. Code the different ways offspring are generated.

So, we start with the first thing: defining a population. To do this, we are going to include a feature that, in my opinion, helps a lot with convergence: defining a series of limits for the values that each gene can take.

So first of all we create that population.

Program the creation of the population of a genetic algorithm

To program our genetic algorithm in Python, we will use a class. Thus, the creation of the population will be done in the initialization of said class.

That said, to make it easier for the algorithm to converge faster, I’m going to create two types of population initialization:

  1. Random initialization of alleles. Random values will simply be created for the values to optimize. This type of initialization is used when the parameter gene_limits is None.
  2. Limited initialization of alleles. It consists of including limits for each of our alleles, so that the values are initialized randomly within those limits. This type of initialization is used when the parameter gene_limits is not None.

Having said that, I program the random initialization of the population:

import numpy as np


class GeneticAlgorithm:
  def __init__(self, n_variables, n_genes, random_state = None, gene_limits = None):

    self.n_variables = n_variables
    self.n_genes = n_genes
    self.gene_limits = gene_limits 
    self.random_state = random_state
    
    # Create the population
    if self.gene_limits is not None:
      
      # Check that both vectors have the same length
      assert len(self.gene_limits) == self.n_variables

      limits = [np.random.uniform(limit[0], limit[1], self.n_genes)
        for limit in self.gene_limits] 

    else:
      limits = [np.random.random(self.n_genes)
        for limit in range(self.n_variables)]
      
    # Convert list of limits to list of solutions
    self.population = list(zip(*limits))
        
ga = GeneticAlgorithm(
    n_variables = 2,
    n_genes = 10
)

ga.population
[(0.3248594341550495, 0.7747988337644331),
 (0.9641753689879542, 0.2728028066671938),
 (0.7745274418319132, 0.2833543446404905),
 (0.8258487102100652, 0.9961554551162345),
 (0.034136970844815595, 0.7000431609346188),
 (0.9956365394402079, 0.6007170537947054),
 (0.5622396252894827, 0.6793736336026693),
 (0.16433824493155325, 0.579682515374066),
 (0.15570018999931567, 0.46125127275801037),
 (0.5544543502640932, 0.8094380724657084)]

Great, now that we have a population, we move on to the next step: calculating fitness.

Calculate the fitness of the genetic algorithm

Starting from a population, we have to allow our genetic algorithm to receive a fitness function. In addition, internally we will use this fitness function to calculate the fitness of each of the solutions.

So, we will extend the previous code, including a new parameter fitness_func which receives as input a solution and returns a fitness value.

Also, I will create a method called get_fitness_scores, which returns the fitness score of every individual in the current population. This will make it easier to use the fitness scores throughout the algorithm.

That said, let’s extend the above code to include fitness:

import numpy as np
import inspect

class GeneticAlgorithm:
  def __init__(
      self, n_variables, n_genes, mutation_type = 'random', random_state = None, 
      gene_limits = None, fitness_func = None):

    assert mutation_type in ['random']

    self.n_variables = n_variables
    self.n_genes = n_genes
    self.gene_limits = gene_limits 
    self.mutation_type = mutation_type
    self.random_state = random_state
    self.fitness_func = fitness_func
 
    # Create the population
    if self.gene_limits is not None:
      
      # Check that both vectors have the same length
      assert len(self.gene_limits) == self.n_variables

      limits = [np.random.uniform(limit[0], limit[1], self.n_genes)
        for limit in self.gene_limits] 

    else:
      limits = [np.random.random(self.n_genes)
        for limit in range(self.n_variables)]
      
    # Convert list of limits to list of solutions
    self.population = list(zip(*limits))
  
  def get_fitness_scores(self):
    scores = [self.fitness_func(ind) for ind in self.population]
    return np.array(scores)

def fitness_func(solution):
  solution = np.array(solution)
  solution_added = solution + 50
  calculated_margin = np.sum(solution_added*margin)
  material_consumed = np.sum(solution_added*material_consumption)
  
  if material_consumed> material_max:
    return 0
  
  else:
    return calculated_margin

ga = GeneticAlgorithm(
    n_variables = 2,
    n_genes = 10,
    fitness_func= fitness_func
)

fitness_scores = ga.get_fitness_scores()
fitness_scores
array([228.53122328, 227.3974419 , 226.11156266, 227.45136065,
       225.92371642, 226.86141269, 228.93654237, 228.35770696,
       226.48405291, 227.82272666])

Now that we can evaluate each individual in the population, we move on to the next phase: making the selection of individuals.

How to program the selection of individuals of a genetic algorithm

Once we can evaluate the function, we will create the selection system for our best candidates.

In this sense, we are going to create a system that allows two different types of selection processes to be carried out: the ranking method and the tournament method.

For the ranking method we simply have to return the indices of that population with the k best results in terms of fitness. As we can see, a new parameter is added to our algorithm, which refers to the number of individuals chosen to obtain offspring.

Regarding the tournament, we will create as many random groups of candidates as individuals we want to select and, from each group, we will choose the one with the best fitness.

Let’s see our selection systems in action:

def ranking_selection(scores, k):
    if k is None:
      raise ValueError('K must not be none if type ranking is selected.')

    # Indices of the k individuals with the highest fitness
    ind = np.argpartition(scores, -k)[-k:]

    return ind

def tournament_selection(scores, k, n_candidates):
  # Create k random groups of n_candidates individuals each
  candidates = [np.random.randint(0, len(scores), n_candidates) for i in range(k)]
  # Within each group, find the position of the best individual
  tournament_winner = [np.argmax(scores[tournament]) for tournament in candidates]
  # Map those positions back to population indices
  ind = [tournament[tournament_winner[i]] for i, tournament in enumerate(candidates)]

  return np.array(ind)

def gene_selection(scores, type, k, n_candidates = None):

  if type not in ['ranking','tournament']:
    raise ValueError('Type should be ranking or tournament')
  
  if type == 'ranking':
    ind = ranking_selection(scores, k)
  elif type == 'tournament':
    ind = tournament_selection(scores, k, n_candidates)
  else:
    pass

  return ind


ranking_result = gene_selection(scores= fitness_scores,
               type = 'ranking',
               k = 3
               )

tournament_result = gene_selection(scores= fitness_scores,
               type = 'tournament',
               k = 3,
               n_candidates = 11
               )

print(f"""
Ranking selection results: {ranking_result}
Tournament selection results: {tournament_result}
""")
Ranking selection results: [7 6 0]
Tournament selection results: [7 0 6]

Now that we have the selection created, we are going to create the generation system for new generations.

How to program the generation of offspring

In order to generate new generations, we are going to allow our model to use both crossover and mutation. Thus, we will have another parameter (transformation_type) that indicates the type of transformation to apply.

Regarding the crossover, we will allow two types of crossovers: the uniform crossover and the point crossover.

In either case, it is important to keep in mind that two new children will be generated from the two parents. Thus, the number of pairs to generate must be half of the total number of individuals in the population:

def crossover(parent1, parent2, crossover_type):

  # Check crossover type
  if crossover_type not in ['uniform', 'one_point']:
    raise ValueError('crossover_type should be one of uniform or one_point')
  
  if crossover_type == 'one_point':
    index = np.random.choice(range(len(parent1)))
    children = [parent2[:index] + parent1[index:], parent1[:index] + parent2[index:]]

  elif crossover_type == 'uniform':
    parents = list(zip(*[parent1,parent2]))
    children1 = tuple([np.random.choice(element) for element in parents])
    children2 = tuple([np.random.choice(element) for element in parents])
    children = [children1, children2]
  else:
    pass

  return children

children_uniform = crossover(
    ga.population[0],
    ga.population[1],
    crossover_type = 'uniform'
    )

children_onepoint = crossover(
    ga.population[0],
    ga.population[1],
    crossover_type = 'one_point'
    )

print(f"""
Children Uniform: {children_uniform}
Children One Point: {children_onepoint}
""")
Children Uniform: [(0.8750958900239486, 0.6664701872724336), (0.8750958900239486, 0.7124125982046803)]
Children One Point: [(0.8750958900239486, 0.7124125982046803), (0.36563321798453374, 0.6664701872724336)]

Now that we have the crossover programmed, we are going to program the mutation. As we have seen before, there are two types of mutations:

  1. Bit string: consists of the random modification of one of the alleles.
  2. Shrink: consists of the addition of a value from a normal distribution to one of the alleles.

In this case, the result differs from crossover: while crossover generates two children, mutation generates only one.

Note: the shrink mutation is sometimes allowed to receive a mean and a standard deviation, so that the random value is drawn from a normal distribution with those parameters. Although I do not implement it here, it is an extension that could easily be added.

def mutation(individual, mutation_type):

  if mutation_type not in ['bitstring', 'shrink']:
    raise ValueError('mutation_type should be one of bitstring or shrink')
  
  # Get index of the allele to modify
  index = np.random.choice(len(individual))

  # Convert individual to list so that can be modified
  individual_mod = list(individual)

  if mutation_type == 'bitstring':      
    individual_mod[index] = 1 - individual_mod[index]
  
  else:
    individual_mod[index] = individual_mod[index] + np.random.rand()
  
  # Convert individual back to a tuple
  individual = tuple(individual_mod)

  return individual

mutation_bitstring = mutation(ga.population[0], mutation_type = 'bitstring')
mutation_shrink = mutation(ga.population[0], mutation_type = 'shrink')


print(f"""
Mutation bitstring: {mutation_bitstring}
Mutation shrink: {mutation_shrink}
""")

With this we would already have all the ingredients of our genetic algorithm. Finally, it remains to put together all the ingredients to create our genetic algorithm.

Program evolution in genetic algorithms

In order to create the evolution of our genetic algorithm, we simply have to put all the previous points together, which I will do with the optimize method.

In this sense, I have included a few relevant details, such as the attribute best_fitness_evolution, which keeps track of the best fitness result of each generation.

Likewise, I have also created a method, called view_fitness_evolution, which allows visualizing the evolution of the best fitness obtained, as we have seen previously with PyGAD.

So the final code to generate a genetic algorithm in Python is as follows:

import numpy as np
import inspect
from tqdm import tqdm
import matplotlib.pyplot as plt

class GeneticAlgorithm:
  def __init__(
      self,
      n_variables,
      n_genes,
      n_iterations,
      transformation_type = None,
      mutation_type = None,
      crossover_type = None,
      gene_selection_type = None,
      n_selected_individuals = None,
      n_candidates = None,
      random_state = None, 
      gene_limits = None,
      fitness_func = None
      ):

    # Make validations
    if transformation_type not in ['mutation', 'transformation']:
      raise ValueError('transformation_type should be one of mutation or transformation.')

    self.n_variables = n_variables
    self.n_genes = n_genes
    self.gene_limits = gene_limits 
    self.transformation_type = transformation_type
    self.mutation_type = mutation_type
    self.random_state = random_state
    self.fitness_func = fitness_func
    self.n_iterations = n_iterations
    self.gene_selection_type = gene_selection_type
    self.n_selected_individuals = n_selected_individuals
    self.n_candidates = n_candidates
    self.crossover_type = crossover_type
    self.best_fitness_evolution = []

    # Asserts 
    # Check that fitness function is a function
    assert callable(fitness_func)
 
    # Create the population
    if self.gene_limits is not None:
      
      # Check that both vectors have the same length
      assert len(self.gene_limits) == self.n_variables

      limits = [np.random.uniform(limit[0], limit[1], self.n_genes)
        for limit in self.gene_limits] 

    else:
      limits = [np.random.random(self.n_genes)
        for limit in range(self.n_variables)]
      
    # Convert list of limits to list of solutions
    self.population = np.array(list(zip(*limits)))

  def get_fitness_scores(self):
    scores = [self.fitness_func(ind) for ind in self.population]
    scores = np.array(scores)
    return scores 

  def __append_best_score(self, scores):
    best_score = np.max(scores)
    self.best_fitness_evolution.append(best_score)
    return 'Ok'


  def __ranking_selection(self, k):
      if k is None:
        raise ValueError('K must not be none if type ranking is selected.')
      
      # Indices of the k individuals with the highest fitness
      ind = np.argpartition(self.get_fitness_scores(), -k)[-k:]
      return ind

  def __tournament_selection(self, k, n_candidates):
    scores = self.get_fitness_scores()
    # Create k random groups of n_candidates individuals each
    candidates = [np.random.randint(0, len(scores), n_candidates) for i in range(k)]
    # Within each group, find the position of the best individual
    tournament_winner = [np.argmax(scores[tournament]) for tournament in candidates]
    # Map those positions back to population indices
    ind = [tournament[tournament_winner[i]] for i, tournament in enumerate(candidates)]

    return np.array(ind)

  def gene_selection(self, gene_selection_type, k, n_candidates = None):

    if gene_selection_type not in ['ranking','tournament']:
      raise ValueError('Type should be ranking or tournament')
    
    if gene_selection_type == 'ranking':
      ind = self.__ranking_selection(k)
    elif gene_selection_type == 'tournament':
      ind = self.__tournament_selection(k, n_candidates)
    else:
      pass

    return ind
  
  def __crossover(self, parent1, parent2, crossover_type):
    
    # Check crossover type
    if crossover_type not in ['uniform', 'one_point']:
      raise ValueError('crossover_type should be one of uniform or one_point')
    
    if crossover_type == 'one_point':
      index = np.random.choice(range(len(parent1)))
      # Parents are numpy array rows here, so concatenate the slices
      # (using + would add the arrays element-wise instead of joining them)
      children = (np.concatenate([parent2[:index], parent1[index:]]),
                  np.concatenate([parent1[:index], parent2[index:]]))

    elif crossover_type == 'uniform':
      parents = list(zip(*[parent1,parent2]))
      children1 = tuple([np.random.choice(element) for element in parents])
      children2 = tuple([np.random.choice(element) for element in parents])
      children = children1, children2
    else:
      pass

    return children


  def __mutation(self, individual, mutation_type):

    if mutation_type not in ['bitstring', 'shrink']:
      raise ValueError('mutation_type should be one of bitstring or shrink')
    
    # Get index of the allele to modify
    index = np.random.choice(len(individual))

    # Convert individual to list so that can be modified
    individual_mod = list(individual)

    if mutation_type == 'bitstring':      
      individual_mod[index] = 1 - individual_mod[index]
    
    else:
      individual_mod[index] = individual_mod[index] + np.random.rand()
    
    # Convert individual back to a tuple
    individual = tuple(individual_mod)

    return individual

  def optimize(self):
    
    for i in tqdm(range(self.n_iterations)):

      # Calculate fitness score
      scores = self.get_fitness_scores()

      # Append best score
      _ = self.__append_best_score(scores)

      # Make Selection
      selected_genes = self.gene_selection(
          gene_selection_type = self.gene_selection_type,
          k = self.n_selected_individuals,
          n_candidates = self.n_candidates
          )
      
      # Get selected population
      selected_population = self.population[selected_genes.tolist()]
      

      # Make transformations (mutation/crossover)
      if self.transformation_type == 'mutation':
        transformed_genes_index = np.random.choice(
            range(len(selected_population)),
            self.n_genes
            )
        

        new_population = [self.__mutation(selected_population[i], self.mutation_type)
          for i in transformed_genes_index]
      
      else:
        # Get pairs of parents
        parents_pairs = np.random.choice(
            selected_genes,
            (int(self.n_genes/2),2)
            )
        
        # For each pair, make crossover
        new_population = [self.__crossover(self.population[parents[0]],
                                           self.population[parents[1]],
                                           crossover_type=self.crossover_type
                                           )
          for parents in parents_pairs]
        
        # Unnest population
        new_population = [child for children in new_population for child in children]
      
      # Set new population as population
      self.population = np.array(new_population)
    
    # When n_iterations are finished, fitness score
    scores = self.get_fitness_scores()

    # Append best score
    _ = self.__append_best_score(scores)

    # Get the index of the individual with the best (highest) score
    best_score_ind = np.argmax(scores)
    
    best_solution = self.population[best_score_ind]

    return (best_solution, self.best_fitness_evolution[-1])

  def view_fitness_evolution(self):
    plt.plot(
        range(len(self.best_fitness_evolution)),
        self.best_fitness_evolution
    )

Finally we are going to test it, both applying mutation and applying crossover to see what results it returns. First of all, we apply mutation:

ga = GeneticAlgorithm(
  n_variables = 2,
  n_genes = 10,
  n_iterations = 50,
  transformation_type = 'mutation', #transformation
  mutation_type = 'shrink',
  #crossover_type = 'uniform',
  gene_selection_type = 'ranking',
  n_selected_individuals = 3,
  #n_candidates = None,
  #random_state = None, 
  #gene_limits = None,
  fitness_func = fitness_func
  )

print(ga.optimize())
ga.view_fitness_evolution()
[Figure: evolution of the best fitness when using mutation]

Finally we apply crossover:

ga = GeneticAlgorithm(
  n_variables = 2,
  n_genes = 10,
  n_iterations = 50,
  transformation_type = 'transformation',
  crossover_type = 'uniform',
  gene_selection_type = 'ranking',
  n_selected_individuals = 3,
  fitness_func = fitness_func
  )

print(ga.optimize())
ga.view_fitness_evolution()
100%|██████████| 50/50 [00:00<00:00, 693.67it/s]
(array([0.97573885, 0.93485633]), 229.28861852920954)
[Figure: evolution of the best fitness when using crossover]

As we can see in both cases the model works correctly, improving the fitness in each iteration.

Conclusion

Without a doubt, genetic algorithms are widely used optimization models that should be among the tools used by every data scientist.

In this post you have learned what a genetic algorithm is, how it works and how to use it easily in Python, both for optimization problems and for training neural networks.

In addition, you have seen how to program a genetic algorithm from scratch in Python. The idea is not that you write a genetic algorithm from scratch whenever you need one, but that building it here helps cement how the algorithm works.

So, I hope you found this post interesting. If so, I encourage you to subscribe to stay up to date with all the posts I publish. In any case, see you in the next one!
