https://arxiv.org/pdf/2501.12948
DeepSeek-R1-Zero is a pure RL model, trained without any supervised data or fine-tuning, that achieved remarkable reasoning capabilities. It was trained on top of the DeepSeek-V3-Base model using the GRPO (Group Relative Policy Optimization) approach. This is a truly amazing result that shows how undervalued the potential of RL is. As I predicted, the next big leap in AI will come from massive adoption of RL and its integration with pre-trained DL models.
Is RL mass-adoption coming?
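For anyone new to GRPO, the "group relative" part of the name can be illustrated in a few lines: several answers are sampled for the same prompt, and each answer's reward is scored relative to the mean and standard deviation of its own group, so no separate value model is needed for the baseline. This is only a rough sketch of that one step, not DeepSeek's code; the function name `group_relative_advantages`, the `eps` term, the choice of population standard deviation, and the example rewards are all my own assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group: (r - mean) / std.

    `eps` is an assumed safeguard against division by zero when every
    answer in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for 4 answers sampled for one prompt (1.0 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat their group's average get a positive advantage and the rest get a negative one. The full method also uses a clipped policy-ratio objective and a KL penalty, which this sketch leaves out.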
#DeepSeek #reinforcementlearning #LLM #GRPO #RL
https://www.youtube.com/watch?v=_CXwZ5xyFno
DeepSeek-R1 Crash course
#deepseekr1 #llm #DeepSeek #course
YouTube
DeepSeek-R1 Crash Course
Learn how to use DeepSeek-R1 in this crash course for beginners. Learn about the innovative reinforcement learning approach that powers DeepSeek-R1, exploring how it achieves performance comparable to industry giants like OpenAI's o1, but at a fraction of…
https://youtu.be/7xTGNNLPyMI?si=nv7EJ0_EmcLFN3Hf
Andrej Karpathy’s tutorial on LLMs
#llm #tutorial #chatgpt #ArtificialIntelligence
YouTube
Deep Dive into LLMs like ChatGPT
This is a general audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models of how to think about their "psychology"…