Machine, are you learning?
Insights into recent Machine Learning topics, approaches, models and papers.
Interested in collaboration? DM @infatum
https://arxiv.org/pdf/2501.12948

DeepSeek-R1-Zero is trained purely with RL, without supervised fine-tuning as a preliminary step, and it still achieves remarkable reasoning capabilities. It was trained on top of the DeepSeek-V3-Base model using the GRPO (Group Relative Policy Optimization) approach. This is a truly amazing result that shows how undervalued the potential of RL is. As I predicted, the next big leap in AI will come from massive adoption of RL and its integration with pre-trained DL models.
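
For anyone wondering what "group relative" means in GRPO: as I understand it, the model samples a group of answers for the same prompt, and each answer's advantage is its reward normalized by the group's mean and standard deviation, so no separate critic model is needed. Below is a minimal sketch of that normalization step only, not the full training objective. The function name, the `eps` term, the use of the population standard deviation, and the toy rewards are my own assumptions for illustration, not taken from the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (hypothetical helper).

    rewards: scalar rewards for several sampled answers to ONE prompt.
    eps: small constant (an assumption) to avoid division by zero
         when every answer in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled answers, rule-based reward 1.0 if correct else 0.0
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average get a positive advantage and are reinforced; answers below average get a negative one. If every answer in the group scores the same, all advantages are zero and that prompt contributes no learning signal.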

Is mass adoption of RL coming?


#DeepSeek #reinforcementlearning #LLM #GRPO #RL