Autoregressive Models: Implementing PixelCNN for Chinese Calligraphy

An exploration of generative models for creating Chinese calligraphy images based on the PixelCNN architecture

Introduction

In this project, I implemented the PixelCNN model based on the paper “Pixel Recurrent Neural Networks” by Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. The goal was to explore generative models and their application to the Chinese calligraphy dataset, which consists of images of Chinese characters written in various calligraphy styles.

Dataset

The dataset used in this project consists of Chinese calligraphy images, with each image representing a Chinese character in a particular calligraphy style. The dataset is diverse and includes a variety of styles, stroke thicknesses, and character complexities. A preprocessing step was applied to resize and normalize the images before feeding them into the model.
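
To make the preprocessing concrete, a minimal sketch of such a pipeline is shown below. The image size, binarization threshold, and file path are assumptions for illustration, not values taken from the actual project.

import tensorflow as tf

IMG_SIZE = 64  # assumed target resolution; the real pipeline may differ

def preprocess(image):
    # Resize to a fixed resolution and scale pixel values into [0, 1].
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    # Binarize so each pixel is 0 (background) or 1 (ink), matching the
    # binary cross-entropy objective used later during training.
    return tf.cast(image > 0.5, tf.float32)

dataset = (
    tf.data.Dataset.list_files("calligraphy/*.png")  # hypothetical path
    .map(lambda path: tf.io.decode_png(tf.io.read_file(path), channels=1))
    .map(preprocess)
)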

Dataset Visualization

Model Implementation

The PixelCNN model is an autoregressive generative model that learns to generate images pixel by pixel. It uses a masked convolutional architecture to ensure that the generated pixel values are only conditioned on the pixels above and to the left of the current pixel.
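
To make the masking constraint concrete, here is a minimal sketch of a masked convolution in TensorFlow, following the type 'A'/'B' masking scheme from the PixelRNN paper. The class name and default kernel size are illustrative, not the project's actual layer.

import numpy as np
import tensorflow as tf

class MaskedConv2D(tf.keras.layers.Layer):
    # 2-D convolution whose kernel is zeroed at and below/right of the centre,
    # so each output pixel depends only on pixels above and to its left.
    def __init__(self, filters, kernel_size=7, mask_type="A"):
        super().__init__()
        self.filters = filters
        self.kernel_size = kernel_size
        self.mask_type = mask_type

    def build(self, input_shape):
        k, in_ch = self.kernel_size, input_shape[-1]
        self.kernel = self.add_weight(
            name="kernel", shape=(k, k, in_ch, self.filters),
            initializer="glorot_uniform")
        self.bias = self.add_weight(
            name="bias", shape=(self.filters,), initializer="zeros")
        mask = np.zeros((k, k, in_ch, self.filters), dtype="float32")
        mask[: k // 2] = 1.0           # all rows above the centre
        mask[k // 2, : k // 2] = 1.0   # left of the centre, centre row
        if self.mask_type == "B":      # type 'B' may also see the centre pixel
            mask[k // 2, k // 2] = 1.0
        self.mask = tf.constant(mask)

    def call(self, x):
        return tf.nn.conv2d(x, self.kernel * self.mask,
                            strides=1, padding="SAME") + self.bias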

The implementation of the model was based on the paper, and the architecture included the following key components:

  1. Down-moved and down-right-moved convolutional layers, which shift the receptive field so that each pixel is conditioned only on pixels already generated.
  2. A stack of gated residual blocks, with multiple layers per block.
  3. An Exponential Linear Unit (ELU) activation followed by a dense output layer.

The model architecture in code form:

# Fuse two causal streams: one shifted down, one shifted down-and-right,
# so the receptive field covers everything above and to the left.
x = down_move(self.down_moved_conv2d(inputs)) + right_move(self.down_right_moved_conv2d(inputs))

# Run the fused features through n_block groups of gated residual layers,
# closing each group with a dense layer.
for i in range(self.n_block):
    for j in range(self.n_resnet):
        x = self.ul_list_gated_resnet[i * self.n_resnet + j](x)
    x = self.ul_list_dense_layer[i](x)

# Final non-linearity and the per-pixel output head.
x = tf.nn.elu(x)
x = self.out_dense(x)
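
The gated residual blocks (ul_list_gated_resnet) are not shown above; a simplified sketch, modelled on the gated PixelCNN literature, is given below. In the real model the convolutions would be the shifted, masked variants; plain convolutions are used here only to keep the sketch short.

import tensorflow as tf

class GatedResnet(tf.keras.layers.Layer):
    # One gated residual block: two convolutions, a sigmoid gate, and a
    # skip connection back to the block input.
    def __init__(self, filters):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters, 3, padding="same")
        # The second convolution emits 2 * filters channels: one half is the
        # signal, the other half drives the gate.
        self.conv2 = tf.keras.layers.Conv2D(2 * filters, 3, padding="same")

    def call(self, x):
        h = self.conv1(tf.nn.elu(x))
        h = self.conv2(tf.nn.elu(h))
        signal, gate = tf.split(h, 2, axis=-1)
        return x + signal * tf.sigmoid(gate)  # gated residual connection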

Model Architecture Visualization

Training

The model was trained on the Chinese calligraphy dataset using a binary cross-entropy loss to optimize the per-pixel probabilities. Binary cross-entropy is well suited here because each pixel is effectively a binary prediction: ink or background. The Adam optimizer was used with the following parameters:

  • Learning rate: 0.0001
  • Beta 1: 0.95
  • Beta 2: 0.9995
  • Epsilon: 1e-6
  • Exponential Moving Average (EMA): Enabled with momentum of 0.9995
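
In recent versions of Keras this configuration maps directly onto the Adam constructor; the sketch below uses use_ema and ema_momentum as the Keras names for the EMA options listed above.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4,
    beta_1=0.95,
    beta_2=0.9995,
    epsilon=1e-6,
    use_ema=True,          # keep an exponential moving average of the weights
    ema_momentum=0.9995,
)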

The Adam optimizer was chosen for its adaptive per-parameter learning rates, which typically give faster convergence than plain SGD. The beta values were set to give stable yet responsive updates, while EMA was used to stabilize training and reduce variance in the weight updates.

The model was compiled with the optimizer, loss function, and accuracy metric. A batch size of 32 was used during training, as it provides a good balance between computational efficiency and memory usage. The model was trained for a total of 10 epochs to ensure sufficient learning while avoiding overfitting.
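
Put together, the compile-and-fit step might look like the sketch below, where model and dataset stand in for the objects built earlier, and from_logits=True assumes the final dense layer emits raw logits rather than probabilities.

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# For an autoregressive model, the input image doubles as its own target.
train_ds = dataset.map(lambda img: (img, img)).batch(32)
model.fit(train_ds, epochs=10)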

Training Visualization

Results

After training, the PixelCNN model was able to generate visually plausible Chinese calligraphy images, showcasing the potential of autoregressive models in generative tasks. The generated images captured the diversity of styles and stroke thicknesses present in the dataset.
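
Generation itself is sequential: the canvas is swept in raster-scan order, and each pixel is sampled from the model's predicted probability given everything drawn so far. A minimal sampling loop, with assumed image dimensions, could look like this:

import numpy as np
import tensorflow as tf

def sample(model, n=4, height=64, width=64):
    # Start from an empty canvas and fill it in pixel by pixel.
    images = np.zeros((n, height, width, 1), dtype="float32")
    for i in range(height):
        for j in range(width):
            # One forward pass per pixel; only position (i, j) is read out.
            logits = model(tf.constant(images), training=False)
            probs = tf.sigmoid(logits)[:, i, j, 0].numpy()
            images[:, i, j, 0] = (np.random.rand(n) < probs).astype("float32")
    return images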

Generated Images Visualization

Conclusion

This project demonstrates the power of autoregressive models like PixelCNN in generating diverse and visually appealing images, in this case, Chinese calligraphy. The results show the model’s ability to capture the nuances of different calligraphy styles and generate new images based on the learned distribution.

Possible future work includes exploring other generative models, such as PixelRNN and PixelSNAIL, as well as experimenting with conditional PixelCNN models that can generate calligraphy images conditioned on specific styles or characters.

Project Repository

You can find the complete source code, along with detailed documentation and instructions on how to run the project, in my GitHub repository.

