How to code Neural Style Transfer in Python

In this blog we have talked a lot about neural networks: we have learned how to code one from scratch, use them to classify images and even use them to create new images. Today we will learn another fascinating use of neural networks: applying the style of one image to another image. In other words, we will learn how to code a Neural Style Transfer network in Python. Let’s get to it!

Preparing the environment

Preparing Google Colab

In my case I am programming this post in Google Colab so that I can train the neural network on GPUs for free.

However, since Google Colab disconnects you from time to time, I am going to sync my Colab account with Google Drive. By doing so, I will save the results that I get without having to start over again. In this case, it may not be as extreme as in the GAN … but it never hurts to do it. Also, this is something that I already explained in this post, so I’m not going to dwell on it too much.

In any case, we first activate the use of GPU.

import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
Found GPU at: /device:GPU:0

Now that we are working on the GPU, we are going to connect Google Colab with our Google Drive.

from google.colab import drive
import os
drive.mount('/content/gdrive/')
Drive already mounted at /content/gdrive/; to attempt to forcibly remount, call drive.mount("/content/gdrive/", force_remount=True).

Finally, we access the Drive folder where I save the information related to this post.

%cd /content/gdrive/My\ Drive/Posts/Neural\ Style\ Transfer
/content/gdrive/My Drive/Posts/Neural Style Transfer

Now that we have Google Colab and Drive synchronized, we are going to load the libraries and data.

Loading libraries and data

As always, the first thing we have to do is load the packages and the data that we are going to use. Regarding packages, we will use Keras and Tensorflow for neural networks and Numpy for data manipulation.

Likewise, it should be noted that I have not invented this implementation from scratch. This post is based on the implementation offered by Keras (with a much more in-depth explanation and some code changes).

import numpy as np
from keras.utils import get_file
import matplotlib.pyplot as plt

style_transfer_url = "https://i.imgur.com/9ooB60I.jpg"
base_url = "https://tourism.euskadi.eus/contenidos/d_destinos_turisticos/0000004981_d2_rec_turismo/en_4981/images/CT_cabecerabilbaoguggen.jpg"

style_image_path = get_file(fname = "skyscraper.jpg", origin = style_transfer_url)
base_image_path = get_file(fname = "bilbao.jpg", origin = base_url)
Downloading data from https://i.imgur.com/9ooB60I.jpg
942080/935806 [==============================] - 0s 0us/step
Downloading data from https://tourism.euskadi.eus/contenidos/d_destinos_turisticos/0000004981_d2_rec_turismo/en_4981/images/CT_cabecerabilbaoguggen.jpg
204800/201498 [==============================] - 0s 2us/step

Finally, we are going to visualize the images that we have downloaded and that we are going to use for the Neural Style Transfer.

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# read the image files into numpy arrays
a = plt.imread(base_image_path)
b = plt.imread(style_image_path)
f, axarr = plt.subplots(1,2, figsize=(15,15))
axarr[0].imshow(a)
axarr[1].imshow(b)
plt.show()
Base and style images to be used for Neural Style Transfer

Now that we have the data loaded, let’s code our Neural Style Transfer in Python!

Understanding how a Neural Style Transfer works

If we want to code a Neural Style Transfer, the first thing we must do is understand perfectly how convolutional neural networks work. In my case, I already talked about them in this post, although today we will get more specific.

In order for a Neural Style Transfer network to work, we must achieve at least two things:

  1. Convey the style of the style image as much as possible.
  2. Ensure that the resulting image looks as close to the original image as possible.

We could consider a third objective: making the resulting image as internally coherent as possible. This is something that Keras’s implementation includes but that, in my case, I am not going to dive into.

To do this, a neural style transfer network has the following:

  1. A convolutional neural network that has already been trained (such as VGG19 or VGG16). Three images will be passed to this network: the base image, the style image, and the combination image. The latter can be either a noise image or the base image itself, although generally the base image is used, both so that the result looks similar to it and to speed up the process.
  2. The combination image is optimized and steadily changes in such a way that it takes on the style of the style image while maintaining the content of the base image. To do this, we will optimize the image using two different losses (style and content).
How the Neural Style Transfer algorithm works

As long as we achieve these two goals, we will have good results. Now, how do we get our network to learn this? Let’s see it!

How to transfer the style of an image

In convolutional neural networks, the deeper we go into the network, the more complex shapes the network distinguishes. This is something that can be clearly seen in the ConvNet Playground application, which allows you to see the layer channels at different “depths” of the network.

Therefore, if we want to transfer the style of an image, we will have to make the feature values that the deep layers of our network produce for the generated image resemble those they produce for the style image.

But how can we calculate the loss function of this process in order to adjust it? For this, we use the so-called Gram Matrix.

What the Gram Matrix is and how to obtain it

Suppose we want to calculate the style loss on a layer. The first thing we do is flatten the layer, turning each filter into a vector. A nice side effect is that the size of the Gram Matrix will not depend on the spatial size of the layer, only on its number of filters.

So, suppose we flatten a layer with 3 filters. Well, the Gram Matrix measures the similarity between those filters and is obtained by calculating the dot products between their vectors:

Gram Matrix Formula
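
In case the formula in the image does not display well, this is the usual definition (as given in the original Gatys et al. paper), where F is the flattened feature map of the layer and row i contains the activations of filter i:

G_{ij} = \sum_k F_{ik} F_{jk}

That is, entry (i, j) of the Gram Matrix is just the dot product between the flattened activations of filters i and j.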

When we calculate dot products, we are measuring how similar two vectors are: the larger the dot product, the more aligned (more similar) the two vectors are, and vice versa. In the end it is something similar to what we already did when we coded the recommendation system.

If you want to dive into how the Gram Matrix is calculated, I recommend watching this video. Anyway, in our case, we are going to program it:

def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram
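
As a quick sanity check (not part of the original Keras example, just an illustration), we can verify that the size of the Gram matrix depends only on the number of filters and not on the spatial size of the layer:

# Illustrative example: a fake feature map of height 32, width 24 and 5 filters
x = tf.random.uniform((32, 24, 5))
print(gram_matrix(x).shape)  # -> (5, 5), regardless of the 32x24 spatial size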

Now that we have the Gram matrix, we can calculate the style loss function, which basically measures how different the filter correlations (the Gram matrices) of the two images are within a layer.

Therefore, to calculate the loss function we compute the Gram matrix of both the style image and the resulting image and take the squared error between them. Following the original paper, this is normalized by 4·N²·M² (honestly, I have not found the explanation for that factor of 4).
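
For reference, and as far as I understand the original paper, the style loss for a single layer looks like this, where G and A are the Gram matrices of the style image and the combination image, N is the number of channels and M the number of pixels of the layer (channels and size in the code below):

E_l = \frac{1}{4 N^2 M^2} \sum_{i,j} (G_{ij} - A_{ij})^2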

Anyway, we code it:

def coste_estilo(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

And with this, we have already coded the style loss function! Yes, it is complex, but don’t worry: the rest is simpler.

Now, let’s see how to achieve the second vital point that we have discussed: make the resulting image look as close as possible to the input image.

How to make the resulting image look like the original image

To make the original image and the resulting image look alike, we must, in some way, measure the similarity between the two. Of course, this will be a loss function to use, which in this case we will call the content loss function.

In this sense, the content loss function is much simpler than the style function. Why? Because, according to this study, similar images tend to have similar deep layers.

Therefore, if two images have similar content, then they will have similar deep layers. Taking this into account is how we will code the content loss function.

def coste_contenido(base, combination):
    return tf.reduce_sum(tf.square(combination - base))
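
Written as a formula, the content loss used here (exactly what the code above computes) is simply the sum of squared differences between the activations B of the base image and the activations C of the combination image in the chosen layer:

L_{content}(B, C) = \sum_{i,j,k} (C_{ijk} - B_{ijk})^2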

With this, we ensure that we meet the second requirement. Therefore, we can now be confident that the style will be transferred while the content is maintained.

Now, what structure do we need to generate all this? Let’s see it!

Structure of a Neural Style Transfer network

To code a Neural Style Transfer (in this case in Python), as with a GAN, we will start from an initial image. As I have said, this image can be either ‘noise’ or the base image itself (the base image is generally used, as it is usually faster).

We will pass this image through a classification convolutional neural network. Generally, already created neural networks are used. In our case, we will use the VGG19 network trained with the ImageNet dataset, which is a neural network offered by Keras.

Does the type of neural network we use influence the results we get? Well, according to this article, it does: networks such as VGG-16 or VGG-19 generate oil-painting-style images, while inception networks generate more pencil-sketch-style images.

In any case, regardless of the neural network we choose, we will pass the generated image, the base image and the style image through that network.

Thus, by comparing the activations of the chosen layers for the base image and the generated image we will obtain the content loss, while comparing the activations for the style image and the generated image will give us the style loss.

With these two losses, we will obtain the total loss that, by optimizing, will improve the results of our image.
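
In other words, and using the weights we will define later in the code (content_weight and style_weight), the total loss we optimize is simply a weighted sum along these lines (the names here are just illustrative):

loss = content_weight * content_loss + (style_weight / n_style_layers) * sum(style_loss_per_layer)

This is also where the third, “coherence” term mentioned above (the total variation loss of the Keras example) would be added if we wanted to use it.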

Now that we have a good base of how a Neural Style Transfer network works, let’s learn how to code it in Python!

How to code a Neural Style Transfer network in Python

Loading the VGG19 Model

First of all, we are going to load the VGG19 model. As I mentioned, it is a model that Keras already offers, so there is no major complication:

from tensorflow.keras.applications import vgg19
from keras.utils import plot_model

model = vgg19.VGG19(weights="imagenet", include_top=False)

model.summary()
Model: "vgg19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 20,024,384
Trainable params: 20,024,384
Non-trainable params: 0
_________________________________________________________________

Calculation of loss functions

Now that we have the model, we must create a function that extracts the values of that model for some given layers (in this way we can use it for both the content error and the style error).

from keras import Model

outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

feature_extractor = Model(inputs=model.inputs, outputs=outputs_dict)
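
As a quick, purely illustrative check (not in the original post), the extractor returns a dictionary that maps each layer name to its activations, so we can inspect the shape of any layer we want:

# Illustrative example: pass a dummy batch with one 400x300 image through the extractor
dummy = tf.random.uniform((1, 400, 300, 3))
features = feature_extractor(dummy)
print(features["block5_conv2"].shape)  # roughly (1, 25, 18, 512) after four poolings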

Now, we are going to define which layers we are going to use to calculate the loss function of the style and which layer we are going to use to calculate the loss function of the content.

Following the example offered by Keras, to calculate the style loss function we will use the first convolution of each block, while for the content we will use the second convolution of the last block.

Thus, to calculate the loss we will follow these steps:

  1. Combine all three images into a single tensor.
  2. Get the activations of all the layers for the three images. It is true that we will not need every value of every image, but having everything extracted keeps the code simpler. In fact, if we later wanted to change which layers are used for the style, it would also be very simple.
  3. Initialize the loss (a scalar) to which we will add the results.
  4. Extract the content-layer activations for the base image and the combination image and calculate the content loss.
  5. Extract the style-layer activations for the style image and the combination image and calculate the style loss.

As you can see, it is quite easy, so let’s do it!

capas_estilo = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]

capas_contenido = "block5_conv2"

content_weight = 2.5e-8
style_weight = 1e-6

def loss_function(combination_image, base_image, style_reference_image):

    # 1. Combine all the images into a single tensor.
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], axis=0
    )

    # 2. Get the values in all the layers for the three images.
    features = feature_extractor(input_tensor)

    # 3. Initialize the loss

    loss = tf.zeros(shape=())

    # 4. Extract the content layers + content loss
    layer_features = features[capas_contenido]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]

    loss = loss + content_weight * coste_contenido(
        base_image_features, combination_features
    )
    # 5. Extract the style layers + style loss
    for layer_name in capas_estilo:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = coste_estilo(style_reference_features, combination_features)
        loss += (style_weight / len(capas_estilo)) * sl

    return loss

Loss functions finished! Now we are going to see how we make the network learn. Let’s go for optimization and gradients!

Learning of the Neural Style Transfer network

Now that we have the cost function, we have to calculate the gradients, which are what gradient descent (or any other optimizer) uses to find the optimal values. To calculate these gradients it is necessary to compute the derivatives of the loss.

We achieve this in Tensorflow with GradientTape. So we will create a function that, given the three images, returns the loss and the gradients. These gradients are calculated only with respect to the combination image: the weights of the pretrained network are never modified.

import tensorflow as tf

@tf.function
def compute_loss_and_grads(combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = loss_function(combination_image, base_image, style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads

With this we have the learning phase done! Finally, there are only two things left to finish coding our Neural Style Transfer in Python: preparing the images and creating the training loop. Let’s do it!

Preprocess and deprocess images

How to preprocess images

The preprocessing of the images consists of giving them the format that our network requires. In our case, since we use the VGG19 model, Keras itself provides a preprocessing function for it: preprocess_input.

Keep in mind that Keras works with image batches. Therefore, the data we pass in must be in that format. To do this, we carry out the following steps:

  1. load_img: we load an image and resize it to a specific shape.
  2. img_to_array: we convert the loaded image into an array that includes the channel dimension. In our case, being color images, we will have three channels, while a black-and-white image would have only one channel.
  3. expand_dims: we add a batch dimension since, as we have said, Keras works with batches of images. The result of this step is an array with shape (1, height, width, 3); the three images are only concatenated into a single batch later, inside the loss function.
  4. preprocess_input: converts the channels from RGB to BGR (the order the original VGG weights expect) and subtracts the per-channel mean of the Imagenet dataset (with which VGG19 is trained), in such a way that the images are zero-centered. This is a typical preprocessing step for images, as it prevents gradients from being too “extreme”, thus achieving better model results (link).
  5. convert_to_tensor: finally, we convert our already centered array into a data type that Tensorflow understands by simply turning it into a tensor with this function.

So let’s create a function that performs precisely the preprocessing we just explained.

import keras
from tensorflow.keras.applications import vgg19
import numpy as np


def preprocess_image(image_path):
    # Util function to open, resize and format pictures into appropriate tensors
    img = keras.preprocessing.image.load_img(
        image_path, target_size=(img_nrows, img_ncols)
    )
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return tf.convert_to_tensor(img)
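
Note that preprocess_image relies on img_nrows and img_ncols, two globals that we will only define just before training (they are derived from the base image). Once they exist, a quick illustrative check of the output would look like this (the values below are just an example):

img_nrows, img_ncols = 400, 600  # illustrative values; the real ones are computed later
print(preprocess_image(base_image_path).shape)  # -> (1, 400, 600, 3)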

Image pre-processing finished! We have just seen how to convert images (arrays) into a data type that our model understands (tensors).

Now, we are going to see exactly the opposite, how to convert the result of the model (tensor) into an image that we can visualize. Let’s code our image deprocessor!

Deprocessing images

As I have mentioned, to deprocess the images we have to follow almost the reverse of the process we used to preprocess them. For this we will carry out the following steps:

  1. Convert the tensor into an array that we can manipulate.
  2. Undo the zero-centering. To do this, we add back the per-channel means of the Imagenet dataset. Luckily these values do not have to be calculated, since we can find them here. Also, we make sure that there are no values above 255 or below 0.
  3. Convert the image from BGR back to RGB. Remember that preprocess_input converted the channels to BGR (a format that became popular historically and that packages like OpenCV also use), so here we simply reverse that.

That said, we are going to code the deprocessing of the Neural Style Transfer that we are learning to code in Python!

def deprocess_image(x):

    # Convert the tensor into an array with the image shape
    x = x.reshape((img_nrows, img_ncols, 3))

    # Undo the zero-centering: add back the Imagenet per-channel means (BGR order)
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68

    # Convert from BGR to RGB
    x = x[:, :, ::-1]

    # Make sure the values are between 0 and 255
    x = np.clip(x, 0, 255).astype("uint8")

    return x

Image deprocessing finished!

Now we only have one thing left: create the training loop and train our network. Let’s go for it!

Training of our Neural Style Transfer network

Now that we have all the ingredients, creating the training loop is quite simple. You simply have to:

  1. Preprocess the images and create the combination image.
  2. Iteratively calculate the loss and apply the gradients to the combination image.
  3. Every few iterations, print the loss and save the generated image (and, optionally, the model). In my case, since I am training the network on Google Colab, saving the images is essential: otherwise we risk being disconnected from the server during training and having to start over from scratch.

So, to keep the training loop clean, I’m going to create a function that takes care of the third point.

Save the generated images

The function that we are going to create is a simple function that generates an image and saves it. Let’s do it:

from datetime import datetime

def result_saver(iteration):
  # Create a unique name from the iteration number and the current timestamp
  now = datetime.now()
  now = now.strftime("%Y%m%d_%H%M%S")
  #model_name = str(iteration) + '_' + str(now) + "_model" + '.h5'
  image_name = str(iteration) + '_' + str(now) + "_image" + '.png'

  # Deprocess the current combination image and save it to disk
  img = deprocess_image(combination_image.numpy())
  keras.preprocessing.image.save_img(image_name, img)

Now that we have everything prepared, let’s code the training of our Neural Style Transfer network made in Python!

from keras.optimizers import SGD

width, height = keras.preprocessing.image.load_img(base_image_path).size
img_nrows = 400
img_ncols = int(width * img_nrows / height)

optimizer = SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96
    )
)

base_image = preprocess_image(base_image_path)
style_reference_image = preprocess_image(style_image_path)
combination_image = tf.Variable(preprocess_image(base_image_path))

iterations = 4000

for i in range(1, iterations + 1):
    loss, grads = compute_loss_and_grads(
        combination_image, base_image, style_reference_image
    )
    optimizer.apply_gradients([(grads, combination_image)])
    if i % 10 == 0:
        print("Iteration %d: loss=%.2f" % (i, loss))
        result_saver(i)
Iteration 10: loss=21816.62
Iteration 20: loss=9596.32
Iteration 30: loss=6480.99
Iteration 40: loss=4951.99
Iteration 50: loss=4035.06
...
Iteration 3950: loss=283.40
Iteration 3960: loss=283.22
Iteration 3970: loss=283.05
Iteration 3980: loss=282.88
Iteration 3990: loss=282.71
Iteration 4000: loss=282.54

We have just obtained the image generated by our Neural Style Transfer network! Let’s visualize it!

Result of coding a Neural Style Transfer algorithm

Conclusion of coding a Neural Style Transfer network

As you can see, coding a Neural Style Transfer network in Python is not very complicated (beyond calculating the loss functions). In any case, this is not all: Neural Style Transfer networks offer many more possibilities, from applying the style only to a section of an image using masks to also transferring color (see this repository for inspiration).

In my opinion, the only problem with Neural Style Transfer algorithms is putting them into production, because the optimization has to be run ad hoc for each base image. Therefore, the deployment is usually not as simple as in the case of a traditional algorithm.

Besides, I think that from a business perspective this algorithm is not very useful or interesting.

Anyway, I hope this has been interesting, that you have learned to program your own Neural Style Transfer network in Python and that it is useful even for generating gift images.

See you on the next one!
