The Illustrated GPT-2 (Visualizing Transformer Language Models)
https://jalammar.github.io/illustrated-gpt2/
#ArtificialIntelligence #NLP #UnsupervisedLearning
Discussions: Hacker News (64 points, 3 comments), Reddit r/MachineLearning (219 points, 18 comments)
Translations: Simplified Chinese, French, Korean, Russian, Turkish
This year, we saw a dazzling application of machine learning. The OpenAI GPT-2 exhibited an impressive ability to write coherent and passionate essays that exceeded what we anticipated current language models could produce. GPT-2 was not a particularly novel architecture; its architecture is very similar to that of the decoder-only transformer. GPT-2 was, however, a very large transformer-based language model trained on a massive dataset. In this post, we'll look at the architecture that enabled the model to produce its results, go into the depths of its self-attention layer, and then look at applications of the decoder-only transformer beyond language modeling. My goal here is also to supplement my earlier post, The Illustrated Transformer, with more visuals explaining the inner workings of transformers and how they have evolved since the original paper. My hope is that this visual language will make it easier to explain later Transformer-based models as their inner workings continue to evolve.
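Since the post centers on the masked self-attention inside GPT-2's decoder blocks, a minimal single-head sketch in NumPy may help fix the idea. This is an illustration under simplifying assumptions (one attention head, no multi-head split, no layer norm or feed-forward sublayer, made-up toy dimensions), not GPT-2's actual implementation; all names below are hypothetical.

import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    # Project each token vector into query/key/value spaces.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_k)        # scaled dot-product scores
    # Causal mask: entries above the diagonal are future tokens,
    # which a decoder-only model is not allowed to attend to.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)    # effectively -inf before softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                       # per-position blend of value vectors

# Toy usage (sizes are illustrative, not GPT-2's real dimensions):
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)   # (5, 4)

The causal mask is the key line: forcing every "attend to a later token" score to effectively minus infinity before the softmax is what distinguishes a decoder-only language model like GPT-2 from a bidirectional encoder.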
UnsupervisedR&R: Unsupervised Point Cloud Registration via Differentiable Rendering
Mohamed El Banani, Luya Gao, Justin Johnson: https://arxiv.org/abs/2102.11870
#ArtificialIntelligence #DeepLearning #UnsupervisedLearning
Labels4Free: Unsupervised Segmentation using StyleGAN
Abdal et al.: https://arxiv.org/abs/2103.14968
#GenerativeModel #DeepLearning #UnsupervisedLearning
GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
Hao et al.: https://arxiv.org/abs/2104.07659
#DeepLearning #GenerativeAdversarialNetworks #UnsupervisedLearning