This post broadly outlines my plan for the IFT6266 deep learning class project. It will be updated as the project evolves.
For a full description of the IFT6266 class project, refer to the course blog here.
Goal of the project, in broad terms:
- Generate the 32×32 center of 64×64 images
- Only the contour (border) of each image and its captions will be provided
- Model performance will be evaluated subjectively
- Unless a good quantitative evaluation method can be found
The plan below will be updated as the project progresses, and links to the corresponding posts will be added as they go live.
- Start by building the generative CNN without any captions (link to post; a rough sketch follows this list)
- The CNN will 'encode' the contour in some way
- A decoder/generator will then take that encoding and generate the center
- Train a DCGAN on the dataset (link to post; a sketch of one training step follows this list)
- Reconstruct images using the pre-trained GAN (link to post; a sketch of the procedure follows this list)
- As suggested by Yeh et al. (2016)
- Add the captions to the model as an additional input (link to Part 1, link to Part 2; a conditioning sketch follows this list)
- Add them to the generator only
- Use a pre-trained embedding model
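
To make the first step more concrete, here is a minimal sketch of the kind of contour-encoding network I have in mind. It is written in PyTorch purely for illustration; the framework, layer sizes, and the MSE reconstruction loss are assumptions at this point, not final choices.

```python
# Sketch only: a convolutional encoder-decoder that maps the 64x64 image with its
# center masked out to the missing 32x32 center. Layer sizes are placeholders.
import torch
import torch.nn as nn

class ContourToCenter(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 64x64x3 input -> 4x4x256 feature map summarizing the border/contour.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),    # 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # 16x16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 8x8
            nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 4x4
            nn.ReLU(),
        )
        # Decoder/generator: upsample the encoding back to a 32x32x3 center.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 32x32
            nn.Tanh(),
        )

    def forward(self, masked_image):
        return self.decoder(self.encoder(masked_image))

# Trained with a simple reconstruction loss, e.g.:
# loss = nn.functional.mse_loss(model(masked_batch), true_centers)
```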
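For the DCGAN step, the function below sketches one training iteration with the standard non-saturating GAN loss. `G` and `D` stand for a DCGAN generator and discriminator over the 64×64 images (the discriminator is assumed to output one logit per image); these names, the latent size, and the optimizer handling are placeholders, not the actual training code.

```python
# Sketch only: one DCGAN update step with binary cross-entropy on logits.
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real_images, z_dim=100):
    device = real_images.device
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1, device=device)
    fake_labels = torch.zeros(batch, 1, device=device)

    # Discriminator update: push real images toward 1 and generated ones toward 0.
    fake = G(torch.randn(batch, z_dim, device=device)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real_images), real_labels)
              + F.binary_cross_entropy_with_logits(D(fake), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: generate fresh samples and push D's output on them toward 1.
    gen = G(torch.randn(batch, z_dim, device=device))
    g_loss = F.binary_cross_entropy_with_logits(D(gen), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```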
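The reconstruction step would follow the idea of Yeh et al. (2016): keep the pre-trained GAN fixed and search for the latent code whose generated image best matches the known border, while the discriminator still finds the result realistic. The sketch below is my reading of that procedure; the loss weight, step count, and tensor shapes are assumptions.

```python
# Sketch only: inpainting by optimizing the latent code of a fixed, pre-trained GAN.
import torch
import torch.nn.functional as F

def reconstruct(G, D, corrupted, mask, z_dim=100, steps=1000, lam=0.1, lr=0.01):
    # corrupted: (B, 3, 64, 64) images whose 32x32 center is missing (e.g. zeroed)
    # mask:      (B, 1, 64, 64), 1 on the known border, 0 on the missing center
    z = torch.randn(corrupted.size(0), z_dim, device=corrupted.device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        gen = G(z)
        # Context loss: the generated image should match the known border pixels.
        context_loss = (mask * (gen - corrupted)).abs().mean()
        # Prior loss: the (fixed) discriminator should find the image realistic.
        d_out = D(gen)
        prior_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
        loss = context_loss + lam * prior_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Keep the known border and paste in the generated center.
    with torch.no_grad():
        return mask * corrupted + (1 - mask) * G(z)
```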
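For the captions, one simple way to condition only the generator is to take a caption embedding from the pre-trained embedding model, project it, and concatenate it with the contour encoding before decoding. The sketch below illustrates that fusion; the embedding dimension (300, as for word2vec/GloVe-style vectors), the projection size, and the fusion point are all assumptions.

```python
# Sketch only: a caption-conditioned decoder. caption_embedding is assumed to come
# from a frozen, pre-trained embedding model (e.g. an averaged word embedding).
import torch
import torch.nn as nn

class CaptionConditionedDecoder(nn.Module):
    def __init__(self, enc_channels=256, caption_dim=300, proj_dim=128):
        super().__init__()
        # Project the caption embedding down to a small conditioning vector.
        self.caption_proj = nn.Linear(caption_dim, proj_dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(enc_channels + proj_dim, 128, 4, stride=2, padding=1),  # 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),                       # 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),                         # 32x32
            nn.Tanh(),
        )

    def forward(self, contour_encoding, caption_embedding):
        # contour_encoding: (B, enc_channels, 4, 4); caption_embedding: (B, caption_dim)
        cap = self.caption_proj(caption_embedding)          # (B, proj_dim)
        cap = cap[:, :, None, None].expand(-1, -1, 4, 4)    # broadcast over the 4x4 grid
        return self.decoder(torch.cat([contour_encoding, cap], dim=1))
```

Compared with the earlier decoder sketch, the only change is the extra channels carrying the projected caption, which keeps the caption out of the discriminator as planned.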
Possible improvements going forward:
- Preprocess the data for faster access, rather than loading and decoding each batch from disk every time (link to post; see the sketch after this list).
- Load multiple batches onto the GPU at once to limit the bottleneck of copying the data over for every mini-batch.
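
Both improvements could look roughly like the sketch below: a one-time preprocessing pass that stores all decoded images in a single .npy file, and a loader that copies several mini-batches to the GPU in one transfer. File names, image shapes, and the chunk size are assumptions.

```python
# Sketch only: preprocess once to a single array file, then feed the GPU in chunks.
import numpy as np
import torch
from PIL import Image

def preprocess(image_paths, out_file="train_images.npy"):
    # One-time pass: decode every image and store the raw pixels in a single array.
    data = np.stack([np.asarray(Image.open(p).convert("RGB"), dtype=np.uint8)
                     for p in image_paths])              # (N, 64, 64, 3)
    np.save(out_file, data)

def gpu_chunks(out_file="train_images.npy", batch_size=64, batches_per_chunk=16):
    data = np.load(out_file, mmap_mode="r")              # memory-mapped, no full load
    chunk = batch_size * batches_per_chunk
    for start in range(0, len(data) - chunk + 1, chunk):
        # One host-to-device copy covers several mini-batches.
        block = torch.from_numpy(np.array(data[start:start + chunk])).float().cuda() / 255.0
        for b in range(batches_per_chunk):
            # Yield NCHW mini-batches that are already resident on the GPU.
            yield block[b * batch_size:(b + 1) * batch_size].permute(0, 3, 1, 2)
```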