Conditional Image Generation – Plan/Summary

This post broadly outlines my plan for the IFT6266 deep learning class project. It will be updated as the project evolves.

For a full description of the IFT6266 class project, refer to the course blog here.

Goal of the project in broad terms

  • Generate the 32×32 center of 64×64 images
  • Only the contour (border) of the image and its captions will be provided
  • Model performance will be judged subjectively, unless a good quantitative evaluation method can be determined
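
To make the task setup concrete, here is a minimal sketch (with a hypothetical `split_image` helper, not code from the project) of splitting a 64×64 image into the border the model sees and the 32×32 center it must generate:

```python
import numpy as np

def split_image(img):
    """Split a 64x64 image into its border (model input) and its
    32x32 center (generation target). Hypothetical helper that only
    illustrates the task setup described above."""
    assert img.shape[0] == 64 and img.shape[1] == 64
    center = img[16:48, 16:48].copy()  # 32x32 target patch
    border = img.copy()
    border[16:48, 16:48] = 0           # blank out the unknown center
    return border, center

img = np.ones((64, 64, 3))
border, center = split_image(img)
# center.shape == (32, 32, 3); the middle of `border` is zeroed
```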


As the project progresses, the information below will be updated, and links to posts will be added as they go live.

Implementation steps

  1. Start by building the generative CNN without any captions (link to post)
    • The CNN will ‘encode’ the contour in some way
    • A decoder/generator will then take that encoding and generate the center
  2. Train a DCGAN on the dataset (link to post)
  3. Reconstruct images using the pre-trained GAN (link to post)
  4. Add the captions to the model as an additional input (link to Part 1, link to Part 2)
    • Add them to the generator only
    • Use a pre-trained embedding model
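
As a sanity check on the decoder/generator side of steps 1–2, a DCGAN-style decoder typically upsamples with strided transposed convolutions. Assuming the common 4×4 kernel, stride 2, padding 1 configuration (my assumption, not a detail fixed by the project), the spatial sizes work out as follows:

```python
def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output spatial size of a strided transposed convolution,
    as used in DCGAN-style generators (assumed hyperparameters)."""
    return (size - 1) * stride - 2 * pad + kernel

# Upsampling from a 4x4 encoded feature map to the 32x32 center:
sizes = [4]
for _ in range(3):
    sizes.append(deconv_out(sizes[-1]))
print(sizes)  # [4, 8, 16, 32]
```

So three such layers suffice to go from a 4×4 bottleneck to the 32×32 patch to generate.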

List of possible improvements going forward

  • Preprocess the data for faster access, rather than loading each batch from disk on the fly (link to post).
  • Load multiple batches onto the GPU at once, to limit the bottleneck of copying data over for every mini-batch.
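
One simple way to attack the data-loading bottleneck above is to prefetch batches in a background thread, so the training loop rarely waits on I/O. A minimal stdlib-only sketch; `batch_iter` stands for any iterable of preprocessed mini-batches (a hypothetical placeholder):

```python
import queue
import threading

def prefetch(batch_iter, depth=4):
    """Yield batches from `batch_iter`, loading up to `depth` of them
    ahead of time in a background thread. Minimal sketch of the idea,
    not the project's actual pipeline."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()  # marks the end of the stream

    def worker():
        for batch in batch_iter:
            q.put(batch)  # blocks when `depth` batches are queued
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            return
        yield batch

# Usage: wrap any batch iterator; order is preserved.
batches = list(prefetch(range(5)))
print(batches)  # [0, 1, 2, 3, 4]
```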
