#weekend_read
Paper-Title: Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments
Link to the paper: https://arxiv.org/pdf/1904.11483.pdf
#Stanford #HRI
TL;DR: [1] They presented a decision-making framework for autonomously navigating urban intersections.
[2] They introduced a learned belief updater that uses an ensemble of RNNs to estimate the locations of vehicles hidden behind obstacles and is robust to perception errors (roughly sketched below).
[3] They improved upon pure reinforcement learning methods by using a model checker to enforce safety guarantees.
[4] Finally, through a scene decomposition method they demonstrated how to efficiently scale the algorithm to scenarios with multiple cars and pedestrians (a rough sketch of the masking-plus-decomposition idea also follows below).
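To make [3] and [4] concrete, here is a minimal Python sketch of the general pattern, not the authors' implementation: per-agent Q-values from pairwise sub-problems are fused conservatively (via a min), and a stand-in is_action_safe check plays the role of the model checker. All functions and numbers here are made up for illustration.
```python
import numpy as np

# Minimal sketch (not the authors' implementation): combine per-agent Q-values
# from pairwise sub-problems and mask out actions flagged unsafe by a checker.

ACTIONS = [0.0, 1.0, 2.0, 3.0]  # e.g. candidate accelerations (illustrative)

def q_single_agent(ego_state, other_state, action):
    """Stand-in for a Q-network trained on the ego-vs-one-agent sub-problem."""
    gap = np.linalg.norm(np.asarray(ego_state) - np.asarray(other_state))
    return gap - 0.1 * action  # toy utility: prefer larger gaps, mild action cost

def is_action_safe(ego_state, other_state, action):
    """Stand-in for the model checker: here, a naive minimum-gap rule."""
    gap = np.linalg.norm(np.asarray(ego_state) - np.asarray(other_state))
    return gap > 2.0 or action <= 1.0

def act(ego_state, other_states):
    best_a, best_q = None, -np.inf
    for a in ACTIONS:
        # Safety mask: drop actions the checker rejects for any agent.
        if not all(is_action_safe(ego_state, s, a) for s in other_states):
            continue
        # Scene decomposition: evaluate each traffic participant independently,
        # then fuse conservatively by taking the minimum utility over agents.
        q = min(q_single_agent(ego_state, s, a) for s in other_states)
        if q > best_q:
            best_a, best_q = a, q
    return best_a  # None would mean "no safe action"; fall back to braking

print(act(ego_state=(0.0, 0.0), other_states=[(1.5, 1.0), (8.0, -3.0)]))
```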
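And for [2], a minimal PyTorch sketch of what an RNN-ensemble belief updater can look like in general, with hypothetical dimensions and no training loop: each member maps an observation history to a position estimate, and disagreement across members serves as a rough uncertainty signal.
```python
import torch
import torch.nn as nn

class BeliefRNN(nn.Module):
    """One ensemble member: GRU over observation history -> predicted (x, y)."""
    def __init__(self, obs_dim=4, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, obs_seq):            # obs_seq: (batch, time, obs_dim)
        out, _ = self.gru(obs_seq)
        return self.head(out[:, -1])       # estimate from the last hidden state

# Ensemble of independently initialised members; in training each would see
# bootstrapped data so that their disagreement reflects epistemic uncertainty.
ensemble = [BeliefRNN() for _ in range(5)]

obs_history = torch.randn(1, 20, 4)        # 20 steps of noisy/occluded observations
with torch.no_grad():
    preds = torch.stack([m(obs_history) for m in ensemble])   # (5, 1, 2)
mean_pos = preds.mean(dim=0)               # fused position estimate
spread   = preds.std(dim=0)                # higher spread -> less trust in estimate
print(mean_pos, spread)
```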
#weekend_read
Paper-Title: Reinforcement Learning, Fast and Slow
#Deepmind #Cognitive_Science
Link to the paper: https://www.cell.com/action/showPdf?pii=S1364-6613%2819%2930061-0
TL;DR: [1] This paper reviews recent techniques in deep RL that narrow the gap in learning speed between humans and agents, and demonstrates an interplay between fast and slow learning, with parallels in animal and human cognition.
[2] When episodic memory is used in reinforcement learning, an explicit record of past events is maintained for making decisions about the current situation. The action chosen is the one associated with the highest value, based on the outcomes of similar past situations (a minimal sketch follows below).
[3] Meta-reinforcement learning quickly adapts to new tasks by learning strong inductive biases. This is done via a slower outer learning loop that trains on a distribution of tasks, yielding an inner loop that rapidly adapts by maintaining a history of past actions and observations (also sketched below).
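For [2], a minimal numpy sketch of the general episodic-memory idea (in the spirit of episodic-control methods, not any specific published implementation): store (state embedding, action, return) records and value a new state by the returns of its nearest stored neighbours.
```python
import numpy as np

# Minimal sketch of episodic-memory value estimation: store (state embedding,
# action, observed return) tuples and value a new state by looking up the
# returns of its nearest stored neighbours, per action.

memory = []   # list of (embedding, action, return) records

def write(embedding, action, ret):
    memory.append((np.asarray(embedding, dtype=float), action, float(ret)))

def q_estimate(embedding, action, k=3):
    """Average return of the k most similar past situations with this action."""
    rows = [(np.linalg.norm(e - embedding), r) for e, a, r in memory if a == action]
    if not rows:
        return 0.0
    rows.sort(key=lambda d_r: d_r[0])
    return float(np.mean([r for _, r in rows[:k]]))

def act(embedding, actions=(0, 1)):
    return max(actions, key=lambda a: q_estimate(np.asarray(embedding), a))

# Toy usage: two past episodes suggest action 1 paid off near this region.
write([0.0, 1.0], action=1, ret=5.0)
write([0.1, 0.9], action=1, ret=4.0)
write([0.0, 1.1], action=0, ret=1.0)
print(act([0.05, 1.0]))   # -> 1
```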
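For [3], a structural PyTorch sketch of the RL²-style setup the review describes, with the outer-loop gradient update omitted: a single recurrent policy is run across tasks, and the "fast" inner-loop learning is simply the RNN's forward pass over the history of observations, previous actions, and rewards. Shapes and the toy reward are made up.
```python
import torch
import torch.nn as nn

class MetaRNNPolicy(nn.Module):
    """Recurrent policy conditioned on (obs, prev_action, prev_reward)."""
    def __init__(self, obs_dim=3, n_actions=2, hidden=32):
        super().__init__()
        self.n_actions = n_actions
        self.gru = nn.GRUCell(obs_dim + n_actions + 1, hidden)
        self.logits = nn.Linear(hidden, n_actions)

    def step(self, obs, prev_a, prev_r, h):
        onehot = torch.zeros(1, self.n_actions)
        onehot[0, prev_a] = 1.0
        x = torch.cat([obs, onehot, prev_r.view(1, 1)], dim=-1)
        h = self.gru(x, h)                      # fast "inner-loop" adaptation
        a = torch.distributions.Categorical(logits=self.logits(h)).sample()
        return int(a), h

policy = MetaRNNPolicy()

# Outer loop (slow learning): iterate over tasks drawn from a distribution.
# A real implementation would accumulate returns and update `policy` with an
# RL objective here; that step is omitted in this sketch.
for task_id in range(3):                        # stand-in for sampling a task
    h = torch.zeros(1, 32)
    prev_a, prev_r = 0, torch.tensor(0.0)
    for t in range(5):                          # inner loop: hidden state adapts
        obs = torch.randn(1, 3)                 # stand-in observation from the task
        prev_a, h = policy.step(obs, prev_a, prev_r, h)
        prev_r = torch.tensor(1.0 if prev_a == task_id % 2 else 0.0)  # toy reward
```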
Something really really cool!🙂
#weekend_read
Paper-Title: Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real #GoogleAI
Link to the paper: https://arxiv.org/pdf/1908.05224.pdf
Link to the videos: https://sites.google.com/view/manipulation-via-locomotion
TL;DR: They presented successful zero-shot transfer of policies trained in simulation to perform difficult locomotion and manipulation-via-locomotion tasks. The key to their method is the imposition of hierarchy, which introduces modularity into the domain randomization process and enables the learning of increasingly complex behaviours (a rough structural sketch follows below).
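A structural sketch, not the paper's code, of how such a two-level setup can be wired: domain randomisation stays modular because it only touches the low-level goal-reaching controller's training, while the high level just emits goals. Every function here is a hypothetical stand-in.
```python
import random

# Structural sketch (not the paper's code): a two-level policy in which the
# low level is trained to reach (x, y) goals under randomised physics, and the
# high level later composes those goals for the manipulation-via-locomotion task.

def randomized_sim_params():
    """Domain randomisation confined to the low level's training loop."""
    return {"friction": random.uniform(0.5, 1.5), "mass": random.uniform(0.8, 1.2)}

def low_level_policy(robot_xy, goal_xy):
    """Stand-in goal-reaching controller: step proportionally towards the goal."""
    return (0.1 * (goal_xy[0] - robot_xy[0]), 0.1 * (goal_xy[1] - robot_xy[1]))

def high_level_policy(scene_state):
    """Stand-in for the learned high level: emit a goal for the low level."""
    box_xy = scene_state["box"]
    return (box_xy[0] - 0.5, box_xy[1])    # e.g. approach the box from one side

params = randomized_sim_params()           # the randomisation is modular: the high
scene = {"robot": (0.0, 0.0), "box": (2.0, 1.0)}   # level never sees raw physics
goal = high_level_policy(scene)
cmd = low_level_policy(scene["robot"], goal)
print(params, goal, cmd)
```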
#weekend_read
Check out this mind-blowing empirical study on HRL with some carefully tested hypotheses. Maybe the best to date!
Paper-Title: WHY DOES HIERARCHY (SOMETIMES) WORK SO WELL IN REINFORCEMENT LEARNING?
Link to the paper: https://arxiv.org/pdf/1909.10618.pdf
#GoogleAI #UCB
Four Hypotheses: The four hypotheses may also be categorized as hierarchical training (H1 and H3) and hierarchical exploration (H2 and H4).
(H1) Temporally extended training. High-level actions correspond to multiple environment steps. To the high-level agent, episodes are effectively shorter. Thus, rewards are propagated faster and learning should improve.
(H2) Temporally extended exploration. Since high-level actions correspond to multiple environment steps, exploration at the high level maps to environment exploration that is temporally correlated across steps. This way, an HRL agent explores the environment more efficiently. As a motivating example, the distribution associated with a random (Gaussian) walk is wider when the random noise is temporally correlated (see the quick numeric check after this list).
(H3) Semantic training. High-level actor and critic networks are trained with respect to semantically meaningful actions. These semantic actions are more correlated with future values, and thus easier to learn, compared to training with respect to the atomic actions of the environment. For example, in a robot navigation task it is easier to learn future values with respect to deltas in x-y coordinates rather than robot joint torques.
(H4) Semantic exploration. Exploration strategies (in the simplest case, random action noise) are applied to semantically meaningful actions and are thus more meaningful than the same strategies would be if applied to the atomic actions of the environment. For example, in a robot navigation task, it intuitively makes more sense to explore at the level of x-y coordinates rather than robot joint torques.
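A quick numpy check of the random-walk claim in H2: holding the noise fixed for several consecutive steps (temporally correlated exploration) spreads the final positions much more widely than i.i.d. noise over the same horizon. Illustrative only; the numbers are arbitrary.
```python
import numpy as np

# Quick numeric illustration of H2: a random walk whose noise is held fixed
# for several consecutive steps spreads out more than one with i.i.d. noise.

rng = np.random.default_rng(0)
T, c, n_walks = 100, 10, 5000          # horizon, correlation length, number of walks

iid_final = rng.normal(size=(n_walks, T)).sum(axis=1)

corr_noise = np.repeat(rng.normal(size=(n_walks, T // c)), c, axis=1)
corr_final = corr_noise.sum(axis=1)

print("std, i.i.d. noise:     ", iid_final.std())   # ~ sqrt(T)   = 10
print("std, correlated noise: ", corr_final.std())  # ~ sqrt(c*T) ~ 31.6
```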
TL;DR: A large number of conclusions can be drawn from the empirical analysis. Here are a few:
In terms of the benefits of training, it is clear that training with respect to semantically meaningful abstract actions (H3) has a negligible effect on the success of HRL.
Moreover, temporally extended training (H1) is only important insofar as it enables the use of multi-step rewards, as opposed to training with respect to temporally extended actions.
The main, and arguably most surprising, benefit of hierarchy is due to exploration. This is evidenced by the fact that temporally extended goal-reaching and agent-switching can enable non-hierarchical agents to solve tasks that otherwise can only be solved by hierarchical agents (a rough sketch of such a switching flat agent follows after these conclusions).
These results suggest that the empirical effectiveness of hierarchical agents simply reflects the improved exploration that these agents can attain.
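To make that last point concrete, here is a rough sketch of the kind of non-hierarchical agent with temporally extended, switched exploration the conclusions refer to. All names, dynamics, and numbers are hypothetical stand-ins, not the paper's setup.
```python
import random

# Rough sketch of temporally extended, switched exploration for a flat agent:
# every `k` steps the agent commits either to its task policy or to a simple
# goal-reaching exploration controller, rather than re-randomising each step.

def task_policy(state):
    return (0.0, 0.0)                      # stand-in for the learned flat policy

def goal_reaching_explorer(state, goal):
    return (0.1 * (goal[0] - state[0]), 0.1 * (goal[1] - state[1]))

def rollout(n_steps=100, k=10, explore_prob=0.5):
    state, goal, use_explorer = (0.0, 0.0), None, False
    actions = []
    for t in range(n_steps):
        if t % k == 0:                      # commit to one behaviour for k steps
            use_explorer = random.random() < explore_prob
            goal = (random.uniform(-5, 5), random.uniform(-5, 5))
        a = goal_reaching_explorer(state, goal) if use_explorer else task_policy(state)
        actions.append(a)
        state = (state[0] + a[0], state[1] + a[1])   # toy dynamics
    return actions

rollout()
```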