Paper: Dichotomy of Control: Separating What You Can Control from What You Cannot by Anonymous
TL;DR: The authors propose Dichotomy of Control (DoC), a supervised-learning approach for stochastic environments that separates what is within a policy's control (its actions) from what is outside it (environment stochasticity) via a mutual-information constraint.
Paper: https://openreview.net/pdf?id=DEGjDDV22pI
Supplementary Material: https://openreview.net/attachment?id=DEGjDDV22pI&name=supplementary_material
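For intuition, here is a minimal, hypothetical sketch of a DoC-style objective in PyTorch: the policy conditions on a learned latent z instead of the raw return, and a surrogate penalty discourages z from carrying information about stochastic outcomes (rewards and next states) beyond what the state and action already explain. The function names and the likelihood-gap surrogate for the mutual-information term are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def doc_style_loss(policy, latent_encoder, outcome_nll, batch, lam=1.0):
    """Illustrative DoC-style objective (not the authors' code).

    policy(s, z)                 -> action logits conditioned on latent z
    latent_encoder(traj)         -> latent z summarizing the future trajectory
    outcome_nll(s, a, z, r, s2)  -> negative log-likelihood of (r, s') given z

    The first term clones the logged actions conditioned on z; the second
    penalizes z whenever it helps predict the environment's stochastic
    outcomes, a crude surrogate for the paper's mutual-information constraint.
    """
    s, a, r, s_next, traj = batch
    z = latent_encoder(traj)

    # Supervised action prediction, conditioned on the controllable latent
    # (assumes discrete actions for simplicity).
    bc_loss = F.cross_entropy(policy(s, z), a)

    # Likelihood gap: how much better outcomes are predicted with z than with
    # a z-free baseline. A positive gap means z leaks environment randomness.
    gap = (outcome_nll(s, a, torch.zeros_like(z), r, s_next)
           - outcome_nll(s, a, z, r, s_next))
    return bc_loss + lam * gap.mean()
```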
Podcast: John Schulman
John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.
In this podcast he talks about tuning GPT-3 to follow instructions (InstructGPT) and answer long-form questions using the internet (WebGPT), AI alignment, AGI timelines, and more!
Link: https://www.talkrl.com/episodes/john-schulman
Podcast: Sven Mika
Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University.
He talks about RLlib present and future, Ray and Ray Summit 2022, applied RL in Games / Finance / RecSys, and more!
Link: https://www.talkrl.com/episodes/sven-mika
Today we’re announcing the Farama Foundation – a new nonprofit organization designed in part to house major existing open source reinforcement learning (“RL”) libraries in a neutral nonprofit body.
https://farama.org/Announcing-The-Farama-Foundation
Blog Post
Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation
Link: https://bair.berkeley.edu/blog/2023/01/20/relmm/
Podcast: Rohin Shah
Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.
He talks about Value Alignment, Learning from Human feedback, Assistance paradigm, the BASALT MineRL competition, his Alignment Newsletter, and more!
Link: https://www.talkrl.com/episodes/rohin-shah
Paper: Deploying Offline Reinforcement Learning with Human Feedback
TL;DR: Reinforcement learning (RL) has shown promise for decision-making tasks in real-world applications. One practical framework involves training parameterized policy models from an offline dataset and subsequently deploying them in an online environment. However, this approach can be risky since the offline training may not be perfect, leading to poor performance of the RL models that may take dangerous actions. To address this issue, we propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase. We formalize this online deployment problem and develop two approaches. The first approach uses model selection and the upper confidence bound algorithm to adaptively select a model to deploy from a candidate set of trained offline RL models. The second approach involves fine-tuning the model in the online deployment phase when a supervision signal arrives. We demonstrate the effectiveness of these approaches for robot locomotion control and traffic light control tasks through empirical validation.
Paper: https://arxiv.org/abs/2303.07046
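The first approach reads like a bandit problem over the candidate policies, so a plain UCB1 sketch captures the core idea. Everything here (the evaluate_policy callable, the assumption that the human feedback is a scalar in [0, 1]) is an illustrative assumption, not the paper's code.

```python
import math

def ucb_deployment(policies, evaluate_policy, num_rounds, c=2.0):
    """Pick which offline-trained policy to deploy each round with UCB1.

    policies        : list of candidate policies trained offline
    evaluate_policy : callable returning a scalar feedback signal in [0, 1]
                      (e.g. a human-supervised score) for one deployment round
    """
    counts = [0] * len(policies)
    totals = [0.0] * len(policies)

    for t in range(1, num_rounds + 1):
        if t <= len(policies):
            # Deploy each candidate once before trusting confidence bounds.
            i = t - 1
        else:
            ucb = [
                totals[k] / counts[k] + c * math.sqrt(math.log(t) / counts[k])
                for k in range(len(policies))
            ]
            i = max(range(len(policies)), key=lambda k: ucb[k])

        reward = evaluate_policy(policies[i])
        counts[i] += 1
        totals[i] += reward

    # Return the index of the empirically best candidate.
    return max(range(len(policies)), key=lambda k: totals[k] / max(counts[k], 1))
```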
Paper: Misspecification in Inverse Reinforcement Learning
TL;DR: The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function R from a policy π. To do this, we need a model of how π relates to R. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function R. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.
Paper: https://arxiv.org/abs/2212.03201
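As a reference point for the kind of behavioral model the paper analyzes, here is the Boltzmann-rationality likelihood in a tabular setting: the demonstrator is assumed to pick actions with probability proportional to exp(β·Q_R(s, a)), and IRL fits R by maximizing this likelihood. The code is an illustrative sketch (not from the paper), useful mainly for seeing exactly where a misspecified demonstrator model enters the inference.

```python
import numpy as np

def boltzmann_policy(q_values, beta=1.0):
    """Boltzmann-rational demonstrator model: pi(a|s) proportional to exp(beta * Q_R(s, a)).

    q_values : array of shape (num_states, num_actions) under reward R
    beta     : rationality coefficient (beta -> infinity recovers optimality)
    """
    logits = beta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def demo_log_likelihood(q_values, demos, beta=1.0):
    """Log-likelihood of observed (state, action) pairs under the model.

    An IRL algorithm searches over rewards R (and hence Q_R) to maximize this;
    the paper asks how wrong the inferred R gets when the demonstrator's true
    behavior deviates from the assumed model.
    """
    pi = boltzmann_policy(q_values, beta)
    return sum(np.log(pi[s, a]) for s, a in demos)
```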
Podcast: Jakob Foerster
Jakob Foerster is an Associate Professor at the University of Oxford.
He talks about Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more!
Link: https://www.talkrl.com/episodes/jakob-foerster
Podcast: Martin Riedmiller
Martin Riedmiller is a research scientist and team lead at DeepMind.
He talks about controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!
Link: https://www.talkrl.com/episodes/martin-riedmiller
Blog: To keep doing RL research, stop calling yourself an RL researcher
by Pierluca D'Oro
Link: https://www.scienceofaiagents.com/p/to-keep-doing-rl-research-stop-calling
On the role of RL researchers in the era of LLM agents.
Paper: Offline Actor-Critic Reinforcement Learning Scales to Large Models
TL;DR: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.
Link: https://arxiv.org/abs/2402.05546
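For readers less familiar with the family of algorithms being scaled here, below is a generic offline actor-critic update: a TD-trained critic plus an actor that maximizes critic values while staying close to the dataset actions via a behavioral-cloning term. This is a plain stand-in, not the paper's Perceiver-based model; all names and the regularizer weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def offline_actor_critic_step(actor, critic, target_critic, batch,
                              actor_opt, critic_opt, gamma=0.99, bc_weight=0.1):
    """One generic offline actor-critic update on a batch of logged data.

    Assumes `actor(s)` returns a reparameterizable continuous action
    distribution and `critic(s, a)` returns a scalar value per sample.
    """
    s, a, r, s_next, done = batch

    # Critic: one-step TD target computed from the target network.
    with torch.no_grad():
        next_a = actor(s_next).sample()
        td_target = r + gamma * (1.0 - done) * target_critic(s_next, next_a)
    critic_loss = F.mse_loss(critic(s, a), td_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize critic value, regularized toward the dataset actions.
    dist = actor(s)
    actor_loss = (-critic(s, dist.rsample()).mean()
                  - bc_weight * dist.log_prob(a).mean())
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    return critic_loss.item(), actor_loss.item()
```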
Paper: Mixtures of Experts Unlock Parameter Scaling for Deep RL
TL;DR: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Link: https://arxiv.org/abs/2402.08609
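To make the architectural change concrete, here is a minimal Soft MoE layer in the spirit of Puigcerver et al. (2023): tokens are softly dispatched into per-expert slots, each expert (here a small MLP) processes its slots, and slot outputs are softly combined back into tokens. This is an illustrative sketch, not the paper's implementation; the paper studies dropping modules like this into value-based networks.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Minimal Soft MoE layer (after Puigcerver et al., 2023)."""

    def __init__(self, dim, num_experts=4, slots_per_expert=1, hidden=256):
        super().__init__()
        self.num_slots = num_experts * slots_per_expert
        self.phi = nn.Parameter(torch.randn(dim, self.num_slots) * dim ** -0.5)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, tokens, dim)
        logits = x @ self.phi                  # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)       # normalize over tokens per slot
        combine = logits.softmax(dim=2)        # normalize over slots per token

        slot_inputs = dispatch.transpose(1, 2) @ x           # (batch, slots, dim)
        slot_chunks = slot_inputs.chunk(len(self.experts), dim=1)
        slot_outputs = torch.cat(
            [expert(chunk) for expert, chunk in zip(self.experts, slot_chunks)],
            dim=1,
        )
        return combine @ slot_outputs          # (batch, tokens, dim)
```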
Podcast: Sharath Chandra Raparthy
Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.
He talks about In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!
Link: https://www.talkrl.com/episodes/sharath-chandra-raparthy
Paper: In value-based deep reinforcement learning, a pruned network is a good network
TL;DR: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks and exhibit a type of "scaling law", using only a small fraction of the full network parameters.
Link: https://arxiv.org/abs/2402.12479
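A small sketch of what gradual magnitude pruning typically looks like: a polynomial sparsity schedule (in the style of Zhu & Gupta, 2017) plus a magnitude-based mask applied to a layer's weights. Function names and the mask handling are illustrative assumptions, not the paper's code.

```python
import torch

def target_sparsity(step, start_step, end_step, final_sparsity):
    """Polynomial schedule for gradual magnitude pruning (Zhu & Gupta, 2017).

    Sparsity ramps from 0 to final_sparsity between start_step and end_step;
    the cubic schedule prunes aggressively early and gently late.
    """
    if step < start_step:
        return 0.0
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

@torch.no_grad()
def apply_magnitude_pruning(module, sparsity):
    """Zero out the smallest-magnitude weights of a linear/conv layer.

    A real agent would keep a persistent mask and re-apply it after every
    optimizer step; this sketch just recomputes the mask from scratch.
    """
    weight = module.weight
    k = int(sparsity * weight.numel())
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.mul_((weight.abs() > threshold).float())
```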
In our newly published paper, we formulate a mean field game (MFG) to minimize Age of Information (AoI) by optimizing cruise control, and develop a novel solution based on Proximal Policy Optimization (PPO) that jointly optimizes continuous and discrete actions. Specifically, UAV swarms are employed to collect time-critical sensory data. Time-critical data collection is influenced by the velocity of the UAVs and their coordinated interactions within the swarm, which can be modeled as an MFG; this motivates age-optimal, MFG-based cruise control for UAVs. However, determining the equilibrium online is difficult in practical scenarios, so we propose a new mean field hybrid proximal policy optimization (MF-HPPO) scheme that minimizes the average AoI by optimizing the UAVs' trajectories and the data-collection scheduling of the ground sensors over mixed continuous and discrete actions. MF-HPPO substantially reduces complexity while minimizing the average AoI.
Please check out our paper for more information:
https://ieeexplore.ieee.org/abstract/document/10508811
Paper: https://arxiv.org/abs/2405.00056
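For readers unfamiliar with hybrid action spaces in PPO: the agent must output both continuous cruise-control commands and discrete scheduling choices, yet PPO needs a single log-probability per step for its ratio. Below is a minimal sketch of such a hybrid policy head; layer sizes, names, and the factorized Gaussian-plus-categorical parameterization are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridActionPolicy(nn.Module):
    """Policy head producing both continuous and discrete actions.

    PPO can use the sum of the two log-probabilities as the joint
    log-probability in its clipped objective.
    """

    def __init__(self, obs_dim, cont_dim, num_discrete, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, cont_dim)          # e.g. UAV velocity command
        self.log_std = nn.Parameter(torch.zeros(cont_dim))
        self.logits = nn.Linear(hidden, num_discrete)  # e.g. which sensor to poll

    def forward(self, obs):
        h = self.trunk(obs)
        cont = Normal(self.mu(h), self.log_std.exp())
        disc = Categorical(logits=self.logits(h))
        return cont, disc

    def log_prob(self, obs, cont_action, disc_action):
        cont, disc = self.forward(obs)
        return cont.log_prob(cont_action).sum(-1) + disc.log_prob(disc_action)
```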
RLHF Book by Nathan Lambert
https://rlhfbook.com/
A short introduction to RLHF and post-training focused on language models