#Tensorflow_Graphics #GoogleAI
Check out how Computer Graphics meets Deep Learning!
https://medium.com/tensorflow/introducing-tensorflow-graphics-computer-graphics-meets-deep-learning-c8e3877b7668?_branch_match_id=650042266723832991
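The library's pitch is differentiable graphics layers (transformations, cameras, meshes, 3-D TensorBoard visualization) that slot into ordinary TensorFlow models. Below is a minimal sketch of that idea, assuming the tensorflow-graphics package; treat the module path and function names as assumptions from its docs, not something taken from the article.
```python
# Differentiable 3-D rotation with TensorFlow Graphics: gradients flow from
# the rotated points back to the Euler angles, so a network could learn them.
# Module path / function names assume the tensorflow-graphics package.
import tensorflow as tf
from tensorflow_graphics.geometry.transformation import rotation_matrix_3d

points = tf.constant([[1.0, 0.0, 0.0]])   # a single 3-D point
angles = tf.Variable([[0.0, 0.0, 0.5]])   # Euler angles in radians

with tf.GradientTape() as tape:
    rotation = rotation_matrix_3d.from_euler(angles)        # (1, 3, 3) matrix
    rotated = rotation_matrix_3d.rotate(points, rotation)   # (1, 3) points
    loss = tf.reduce_sum(tf.square(rotated - [[0.0, 1.0, 0.0]]))

print(tape.gradient(loss, angles))  # well-defined gradient w.r.t. the angles
```
Because every op is differentiable, geometric structure like this can sit inside a training loop instead of being a fixed preprocessing step.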
Google AI announced the release of the Google Research Football Environment, a novel RL environment where agents aim to master the world’s most popular sport—football.
#AIforFootball #GoogleAI
https://ai.googleblog.com/2019/06/introducing-google-research-football.html
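For a feel of the interface, here is a minimal random-agent loop. It assumes the open-sourced gfootball package and one of its small "academy" drill scenarios; the scenario name and keyword arguments are assumptions and may differ between package versions.
```python
# Minimal random-agent loop for the Google Research Football environment.
# Assumes `pip install gfootball`; the scenario name and keyword arguments
# below are illustrative and may vary between package versions.
import gfootball.env as football_env

env = football_env.create_environment(
    env_name="academy_empty_goal_close",  # small single-drill scenario
    representation="simple115",           # compact float-vector observation
    render=False,
)

obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()    # random policy, gym-style API
    obs, reward, done, info = env.step(action)
    episode_return += reward

print("episode return:", episode_return)
env.close()
```
The environment follows the familiar gym reset/step interface, so existing RL agents plug in with little glue code.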
Something really really cool!🙂
#weekend_read
Paper-Title: Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real #GoogleAI
Link to the paper: https://arxiv.org/pdf/1908.05224.pdf
Link to the videos: https://sites.google.com/view/manipulation-via-locomotion
TL;DR: They have presented successful zero-shot transfer of policies trained in simulation to perform difficult locomotion and manipulation via locomotion tasks. The key to their method is the imposition of hierarchy, which introduces modularity into the domain randomization process and enables the learning of increasingly complex behaviours.
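To make the "imposition of hierarchy" concrete, here is a purely hypothetical sketch of the control loop such a setup implies (not the authors' code): a high-level policy picks a goal every K environment steps, and a pre-trained low-level locomotion policy turns the current goal into joint commands. The decision interval K, the observation keys, and both policy stand-ins are placeholders.
```python
# Hypothetical two-level control loop in the spirit of the paper: the
# high-level policy replans only every K steps; the low-level locomotion
# policy runs at every step. Both policies are stand-ins, not learned models.
import numpy as np

K = 10  # high-level decision interval in environment steps (assumption)

def high_level_policy(obs):
    """Pick a 2-D target for the robot (placeholder for a learned policy)."""
    return obs["object_xy"] + np.array([0.1, 0.0])

def low_level_policy(obs, goal_xy):
    """Map (observation, goal) to joint commands (placeholder for a trained
    locomotion policy)."""
    return np.zeros(12)  # e.g. joint targets for a 12-DoF quadruped

def rollout(env, horizon=1000):
    obs, goal = env.reset(), None
    for t in range(horizon):
        if t % K == 0:  # the hierarchy: goals change only every K steps
            goal = high_level_policy(obs)
        obs, reward, done, _ = env.step(low_level_policy(obs, goal))
        if done:
            break
```
Domain randomization can then be applied per module (randomize the low-level's dynamics separately from the high-level's task layout), which is the modularity the TL;DR refers to.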
#weekend_read
Check out this mind-blowing empirical study on HRL that rigorously tests four hypotheses about why it works. Maybe the best on this question to date!
Paper-Title: Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?
Link to the paper: https://arxiv.org/pdf/1909.10618.pdf
#GoogleAI #UCB
Four Hypotheses: The four hypotheses may also be categorized as hierarchical training (H1 and H3) and hierarchical exploration (H2 and H4).
(H1) Temporally extended training. High-level actions correspond to multiple environment steps. To the high-level agent, episodes are effectively shorter. Thus, rewards are propagated faster and learning should improve.
(H2) Temporally extended exploration. Since high-level actions correspond to multiple environment steps, exploration at the high level maps to environment exploration that is temporally correlated across steps. This way, an HRL agent explores the environment more efficiently. As a motivating example, the distribution associated with a random (Gaussian) walk is wider when the random noise is temporally correlated (see the sketch after this list).
(H3) Semantic training. High-level actor and critic networks are trained with respect to semantically meaningful actions. These semantic actions are more correlated with future values, and thus easier to learn, compared to training with respect to the atomic actions of the environment. For example, in a robot navigation task it is easier to learn future values with respect to deltas in x-y coordinates rather than robot joint torques.
(H4) Semantic exploration. Exploration strategies (in the simplest case, random action noise) are applied to semantically meaningful actions and are thus more meaningful than the same strategies would be if applied to the atomic actions of the environment. For example, in a robot navigation task, it intuitively makes more sense to explore at the level of x-y coordinates rather than robot joint torques.
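To make (H2)'s motivating example concrete, here is a small NumPy sketch (not from the paper; the horizon, correlation length, and number of rollouts are arbitrary) comparing how far a 1-D random walk drifts when its Gaussian noise is redrawn every step versus held fixed for k steps, the way a temporally extended high-level action would be.
```python
# Motivating example for (H2): a 1-D random walk whose Gaussian noise is held
# fixed for k steps ("temporally correlated") spreads out farther than one
# with i.i.d. per-step noise. All constants are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T, k, n_walks = 500, 10, 10_000   # horizon, correlation length, #rollouts

# i.i.d. exploration noise: a fresh Gaussian step every timestep.
iid_final = rng.normal(size=(n_walks, T)).sum(axis=1)

# Temporally correlated noise: one Gaussian draw per block of k steps,
# repeated k times -- the 1-D analogue of committing to a high-level action.
block_noise = rng.normal(size=(n_walks, T // k))
corr_final = np.repeat(block_noise, k, axis=1).sum(axis=1)

print(f"std of final position, i.i.d. noise:      {iid_final.std():.1f}")  # ~sqrt(T)   ~= 22
print(f"std of final position, correlated noise:  {corr_final.std():.1f}")  # ~sqrt(k*T) ~= 71
```
The correlated walk ends up roughly sqrt(k) times more spread out, which is the sense in which temporally extended (hierarchical) exploration covers more of the state space per episode.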
TL;DR: A number of conclusions can be drawn from the empirical analysis. Here are a few:
In terms of the benefits of training, it is clear that training with respect to semantically meaningful abstract actions (H3) has a negligible effect on the success of HRL.
Moreover, temporally extended training (H1) is only important insofar as it enables the use of multi-step rewards, as opposed to training with respect to temporally extended actions.
The main, and arguably most surprising, benefit of hierarchy is due to exploration. This is evidenced by the fact that temporally extended goal-reaching and agent-switching can enable non-hierarchical agents to solve tasks that otherwise can only be solved by hierarchical agents.
These results suggest that the empirical effectiveness of hierarchical agents simply reflects the improved exploration that these agents can attain.