Image GPT: Generative Pretraining from Pixels (Paper Explained)
Yannic Kilcher
BERT and GPT-2/3 have shown the enormous power of generative pre-training for downstream classification tasks. For images, however, pre-training is usually done with supervised or self-supervised objectives. This paper investigates how far you can get when applying the principles from the world of NLP to the world of images.
OUTLINE:
0:00 - Intro & Overview
2:50 - Generative Models for Pretraining
4:50 - Pretraining for Visual Tasks
7:40 - Model Architecture
15:15 - Linear Probe Experiments
24:15 - Fine-Tuning Experiments
30:25 - Conclusion & Comments
Paper: https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf
Blog: https://openai.com/blog/image-gpt/
Code: https://github.com/openai/image-gpt
Abstract: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full finetuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
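To make the "auto-regressively predict pixels, without incorporating knowledge of the 2D input structure" part of the abstract concrete, here is a minimal sketch in NumPy. It is not the paper's code: the function names, the 4x4 toy image, and the vocabulary of 8 pixel tokens are all hypothetical, and uniform logits stand in for the Transformer. It only shows how an image becomes a 1D token sequence and how the next-pixel loss is computed over it.

```python
import numpy as np

def raster_flatten(image):
    """Flatten an (H, W) array of discrete pixel tokens into a 1D sequence, row by row."""
    return np.asarray(image).reshape(-1)

def next_token_pairs(sequence):
    """Return (inputs, targets) where targets[i] is the token that follows inputs[i]."""
    return sequence[:-1], sequence[1:]

def autoregressive_nll(logits, targets):
    """Mean negative log-likelihood of targets under per-position logits of shape (T, V)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Toy 4x4 "image" with a hypothetical vocabulary of 8 pixel tokens.
vocab_size = 8
image = np.arange(16).reshape(4, 4) % vocab_size
sequence = raster_flatten(image)
inputs, targets = next_token_pairs(sequence)

# Stand-in for a Transformer (an assumption of this sketch): uniform logits,
# i.e. a model that has learned nothing about the image.
uniform_logits = np.zeros((len(targets), vocab_size))
loss = autoregressive_nll(uniform_logits, targets)
```

With uniform logits the loss equals the log of the vocabulary size, which is the baseline any trained model has to beat. Once the image is flattened, the model sees only a sequence of tokens, so nothing in the setup tells it which pixels are spatial neighbours.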
Authors: Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=YBlNQK0Ao6g