Final Blog Post

Here are the final results of the model I used for the IFT6266 project, i.e. image generation conditioned on a contour and a caption. The architecture was inspired by the PixelCNN of Oord et al., 2016. However, it made the simplifying assumption that all pixels – and all channels – of the 32×32 center image were not […]


Continue reading "Final Blog Post"

More news on the latest model

After training for about a hundred epochs, the PixelCNN/autoencoder managed to achieve pretty good results. Unfortunately, it suffers from the same problem as the autoencoder models, i.e. the generations stay blurry. I added a residual block to the network, since I calculated that with my current architecture, at least 14 residual blocks were needed for […]

Continue reading "More news on the latest model"

PixelCNN – Updates

After trying the mean squared error as a loss function, I have not observed any improvement yet. This may indicate that the model is indeed underfitting the data. I will run as many epochs as possible until the end of the project to see whether it manages to generate something. The way this autoregressive model learns is also probably […]

Continue reading "PixelCNN – Updates"

PixelCNN


A first version of PixelCNN was trained. The model uses the same kind of architecture as in the original article by Oord et al., 2016, i.e. it does not yet use the captions or the gated convolutional layers. The model is built as follows: 1 convolutional layer with a 7×7 kernel and the mask […]

Continue reading "PixelCNN"