mihirp1998/AlignProp
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for fine-tuning Stable Diffusion.
Language: Python
#alignment #diffusion_models #reinforcement_learning #stable_diffusion #text_to_image
Stars: 104 Issues: 4 Forks: 1
https://github.com/mihirp1998/AlignProp
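A minimal PyTorch sketch of the direct reward backpropagation idea, not the repository's code: a toy denoiser stands in for the diffusion U-Net and a frozen linear scorer stands in for a differentiable reward model (e.g., an aesthetic or CLIP-based scorer), so the reward gradient flows through the whole unrolled sampling chain into the trainable parameters.

```python
# Toy sketch of direct reward backpropagation (hypothetical stand-ins, not AlignProp's code).
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))  # stands in for the U-Net
reward_model = nn.Linear(16, 1)                                            # stands in for a differentiable reward
for p in reward_model.parameters():
    p.requires_grad_(False)                                                # reward model stays frozen

opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-3)

for iteration in range(200):
    x = torch.randn(8, 16)                 # start from pure noise
    for t in range(10):                    # unrolled "denoising" steps, kept in the autograd graph
        x = x - 0.1 * denoiser(x)
    loss = -reward_model(x).mean()         # maximize the reward on the final sample
    opt.zero_grad()
    loss.backward()                        # gradient flows back through every denoising step
    opt.step()
```

The actual method applies this to a Stable Diffusion sampler, where the paper describes memory-saving measures (e.g., truncating how far gradients flow back through the sampling chain) that this toy omits.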
AgibotTech/agibot_x1_train
The reinforcement learning training code for AgiBot X1.
Language: Python
#open_source #reinforcement_learning #robotics
Stars: 763 Issues: 2 Forks: 235
https://github.com/AgibotTech/agibot_x1_train
Gen-Verse/ReasonFlux
ReasonFlux-32B beats o1-preview and DeepSeek-V3 with only 500 thought templates
Language: Python
#chain_of_thought #deepseek_r1 #deepseek_v3 #llm_rlhf #o1_mini #o1_preview #reinforcement_learning #sft_data
Stars: 194 Issues: 2 Forks: 10
https://github.com/Gen-Verse/ReasonFlux
ReasonFlux Series: a family of LLM post-training algorithms focusing on data selection, reinforcement learning, and inference scaling.
FareedKhan-dev/all-rl-algorithms
Implementation of all RL algorithms in a simpler way
Language: Jupyter Notebook
#agent #llm #openai #python #reinforcement_learning #rl
Stars: 240 Issues: 0 Forks: 19
https://github.com/FareedKhan-dev/all-rl-algorithms
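As a flavor of the kind of stripped-down implementation such a collection typically contains, here is a self-contained tabular Q-learning loop on a made-up 1-D corridor environment; the environment and hyperparameters are illustrative only and not taken from the repository.

```python
# Minimal tabular Q-learning on a toy corridor (illustrative, not the repository's code).
import numpy as np

n_states, n_actions = 6, 2            # corridor of 6 cells; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.2    # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

def step(s, a):
    """Move left/right within the corridor; reward 1 for reaching the rightmost cell."""
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    done = s2 == n_states - 1
    return s2, (1.0 if done else 0.0), done

for episode in range(500):
    s, done = 0, False
    while not done:
        explore = rng.random() < eps or Q[s].max() == Q[s].min()   # random tie-breaking early on
        a = int(rng.integers(n_actions)) if explore else int(Q[s].argmax())
        s2, r, done = step(s, a)
        target = r if done else r + gamma * Q[s2].max()            # no bootstrap past the terminal state
        Q[s, a] += alpha * (target - Q[s, a])                      # standard Q-learning update
        s = s2

print(Q.argmax(axis=1))   # greedy action per state (1 = right) after training
```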
NVlabs/Long-RL
Long-RL: Scaling RL to Long Sequences
Language: Python
#efficient_ai #large_language_models #long_sequence #multi_modality #reinforcement_learning #sequence_parallelism
Stars: 301 Issues: 2 Forks: 3
https://github.com/NVlabs/Long-RL