ββLearning to Simulate Dynamic Environments with GameGAN
#Nvidia designed a GAN that able to recreate games without any game engine. To train it, authors of the model use experience collected by reinforcement learning and other techniques.
GameGAN successfully reconstructed all mechanics of #Pacman game. Moreover, the trained model can generate new mazes that have never appeared in the original game. It can even replace background (static objects) and foreground (dynamic objects) with different images!
As the authors say, applying reinforcement learning algorithms to real world tasks requires accurate simulation of that task. Currently designing such simulations is expensive and time-consuming. Using neural networks instead of hand-written simulations may help to solve these problems.
Paper: https://cdn.arstechnica.net/wp-content/uploads/2020/05/Nvidia_GameGAN_Research.pdf
Blog: https://blogs.nvidia.com/blog/2020/05/22/gamegan-research-pacman-anniversary/
Github Page: https://nv-tlabs.github.io/gameGAN/
#GAN #RL
#Nvidia designed a GAN that able to recreate games without any game engine. To train it, authors of the model use experience collected by reinforcement learning and other techniques.
GameGAN successfully reconstructed all mechanics of #Pacman game. Moreover, the trained model can generate new mazes that have never appeared in the original game. It can even replace background (static objects) and foreground (dynamic objects) with different images!
As the authors say, applying reinforcement learning algorithms to real world tasks requires accurate simulation of that task. Currently designing such simulations is expensive and time-consuming. Using neural networks instead of hand-written simulations may help to solve these problems.
Paper: https://cdn.arstechnica.net/wp-content/uploads/2020/05/Nvidia_GameGAN_Research.pdf
Blog: https://blogs.nvidia.com/blog/2020/05/22/gamegan-research-pacman-anniversary/
Github Page: https://nv-tlabs.github.io/gameGAN/
#GAN #RL
ββThorough analysis of recent Tesla Model 3 accident and warning to autopilot users
Olga Uskova shared insights of her #CognitivePilot team members on #Tesla accident.
Highlights:
- Please donβt use autopilot on highways. They are still buggy and in development
- Obvious GTA-emulator training might have not been done to reach satisfactory results
- Tesla might have not been updating stereo cams + radar cooperation logic due to termination of contract with Mobileye EyeQ3
Analysis: https://www.facebook.com/uskova.oa/videos/804398560090702/
Article: https://www.thedrive.com/news/33789/autopilot-blamed-for-teslas-crash-into-overturned-truck
#autonomousdriving #selfdriving #RL #cars
Olga Uskova shared insights of her #CognitivePilot team members on #Tesla accident.
Highlights:
- Please donβt use autopilot on highways. They are still buggy and in development
- Obvious GTA-emulator training might have not been done to reach satisfactory results
- Tesla might have not been updating stereo cams + radar cooperation logic due to termination of contract with Mobileye EyeQ3
Analysis: https://www.facebook.com/uskova.oa/videos/804398560090702/
Article: https://www.thedrive.com/news/33789/autopilot-blamed-for-teslas-crash-into-overturned-truck
#autonomousdriving #selfdriving #RL #cars
How Tesla Truck design affects design of all autonomous vehicles
#design #selfdriving #autonomousvehicle #rl #scania
#design #selfdriving #autonomousvehicle #rl #scania
Data Science by ODS.ai π¦
π€ The NetHack Learning Environment #Facebook launched new Reinforcement Learning environment for training agents based on #NetHack game. Nethack has nothing to do with what is considered common cybersecurity now, but it is an early terminal-based Minecraftβ¦
Update from #Facebook on #Nethack learning Environment.
Link: https://ai.facebook.com/blog/nethack-learning-environment-to-advance-deep-reinforcement-learning
Publication: https://arxiv.org/abs/2006.13760
#RL
Link: https://ai.facebook.com/blog/nethack-learning-environment-to-advance-deep-reinforcement-learning
Publication: https://arxiv.org/abs/2006.13760
#RL
ββSalesforce opensourced AI-framework for economic RL
AI Economist is capable of learning dynamic tax policies that optimize equality along with productivity in simulated economies, outperforming alternative tax systems.
Github: https://github.com/salesforce/ai-economist
Blog post with results: https://blog.einstein.ai/the-ai-economist/
Blog post with release: https://blog.einstein.ai/the-ai-economist-moonshot/
#Salesforce #gym #RL #economics #AIEconomics #animalcrossing #AIEconomist
AI Economist is capable of learning dynamic tax policies that optimize equality along with productivity in simulated economies, outperforming alternative tax systems.
Github: https://github.com/salesforce/ai-economist
Blog post with results: https://blog.einstein.ai/the-ai-economist/
Blog post with release: https://blog.einstein.ai/the-ai-economist-moonshot/
#Salesforce #gym #RL #economics #AIEconomics #animalcrossing #AIEconomist
YouTube
Introducing the AI Economist
See how Salesforce Research is using AI to drive positive, social change, with the AI Economist.
ββlearning to summarize from human feedback
by openai
the authors collect a high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary & use that model as a reward function to fine-tune a summarization policy using reinforcement learning. they apply this method to a version of the tl;dr dataset of reddit posts & find that their models significantly outperform both human reference summaries & much larger models fine-tuned with supervised learning alone
the researchers focused on english text summarization, as itβs a challenging problem where the notion of what makes a "good summary" is difficult to capture without human input
these models also transfer to cnn/dm news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. furthermore, they conduct extensive analyses to understand the human feedback dataset & fine-tuned models. they establish that their reward model generalizes to a new dataset & that optimizing their reward model results in better summaries than optimizing rouge according to humans
blogpost: https://openai.com/blog/learning-to-summarize-with-human-feedback/
paper: https://arxiv.org/abs/2009.01325
code: https://github.com/openai/summarize-from-feedback
#nlp #rl #summarize
by openai
the authors collect a high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary & use that model as a reward function to fine-tune a summarization policy using reinforcement learning. they apply this method to a version of the tl;dr dataset of reddit posts & find that their models significantly outperform both human reference summaries & much larger models fine-tuned with supervised learning alone
the researchers focused on english text summarization, as itβs a challenging problem where the notion of what makes a "good summary" is difficult to capture without human input
these models also transfer to cnn/dm news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. furthermore, they conduct extensive analyses to understand the human feedback dataset & fine-tuned models. they establish that their reward model generalizes to a new dataset & that optimizing their reward model results in better summaries than optimizing rouge according to humans
blogpost: https://openai.com/blog/learning-to-summarize-with-human-feedback/
paper: https://arxiv.org/abs/2009.01325
code: https://github.com/openai/summarize-from-feedback
#nlp #rl #summarize
ββFord is reported to work on a two-leg human-shaped delivery robot
#RL #bostondynamics #delivery #autonomousrobots
#RL #bostondynamics #delivery #autonomousrobots
ββWaymo started driverless tests in Phoenix
This #Google company plans to expand tests to cover whole state later.
Blog: https://blog.waymo.com/2020/10/waymo-is-opening-its-fully-driverless.html
Redditersβ experience: https://www.reddit.com/r/waymo/comments/j7rphd/4_minute_full_video_in_waymo_one_no_driver_short/
#autonomousrobots #selfdriving #rl #DL
This #Google company plans to expand tests to cover whole state later.
Blog: https://blog.waymo.com/2020/10/waymo-is-opening-its-fully-driverless.html
Redditersβ experience: https://www.reddit.com/r/waymo/comments/j7rphd/4_minute_full_video_in_waymo_one_no_driver_short/
#autonomousrobots #selfdriving #rl #DL
ββQVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning
Paper extends the Deep Quality-Value (DQV) family of al-
gorithms to multi-agent reinforcement learning and outperforms #SOTA
ArXiV: https://arxiv.org/abs/2012.12062
#DQV #RL #Starcraft
Paper extends the Deep Quality-Value (DQV) family of al-
gorithms to multi-agent reinforcement learning and outperforms #SOTA
ArXiV: https://arxiv.org/abs/2012.12062
#DQV #RL #Starcraft
This media is not supported in your browser
VIEW IN TELEGRAM
Habitat 2.0: Training home assistant robots with faster simulation and new benchmarks
Facebook released a new simulation platform to train robots in. Yeah, virtual robots in virtual environment, which can be a real space replica. This work brings us closer to domestic use of assistive robots.
Project website: https://ai.facebook.com/blog/habitat-20-training-home-assistant-robots-with-faster-simulation-and-new-benchmarks
Paper: https://ai.facebook.com/research/publications/habitat-2.0-training-home-assistants-to-rearrange-their-habitat
#Facebook #DigitalTwin #VR #RL #assistiverobots
Facebook released a new simulation platform to train robots in. Yeah, virtual robots in virtual environment, which can be a real space replica. This work brings us closer to domestic use of assistive robots.
Project website: https://ai.facebook.com/blog/habitat-20-training-home-assistant-robots-with-faster-simulation-and-new-benchmarks
Paper: https://ai.facebook.com/research/publications/habitat-2.0-training-home-assistants-to-rearrange-their-habitat
#Facebook #DigitalTwin #VR #RL #assistiverobots
Mava: a scalable, research framework for multi-agent reinforcement learning
The framework integrates with popular MARL environments such as PettingZoo, SMAC, RoboCup, OpenSpiel, Flatland , as well as a few custom environments.
Mava includes distributed implementations of multi-agent versions of ddpg, d4pg, dqn, ppo, as well as DIAL, VDN and QMIX.
ArXiV: https://arxiv.org/pdf/2107.01460.pdf
GitHub: https://github.com/instadeepai/Mava
#MARL #RL #dl
The framework integrates with popular MARL environments such as PettingZoo, SMAC, RoboCup, OpenSpiel, Flatland , as well as a few custom environments.
Mava includes distributed implementations of multi-agent versions of ddpg, d4pg, dqn, ppo, as well as DIAL, VDN and QMIX.
ArXiV: https://arxiv.org/pdf/2107.01460.pdf
GitHub: https://github.com/instadeepai/Mava
#MARL #RL #dl
GitHub
GitHub - instadeepai/Mava: π¦ A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX
π¦ A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX - instadeepai/Mava
Acquisition of Chess Knowledge in AlphaZero
69-pages paper of analysis how #AlphaZero plays chess. TLDR: lots of concepts self-learned by neural network can be mapped to human concepts.
This means that generally speaking we can train neural networks to do some task and then learn something from them. Opposite is also true: we might imagine teaching neural networks some human concepts in order to maek them more efficient.
Post: https://en.chessbase.com/post/acquisition-of-chess-knowledge-in-alphazero
Paper: https://arxiv.org/pdf/2111.09259.pdf
#RL
69-pages paper of analysis how #AlphaZero plays chess. TLDR: lots of concepts self-learned by neural network can be mapped to human concepts.
This means that generally speaking we can train neural networks to do some task and then learn something from them. Opposite is also true: we might imagine teaching neural networks some human concepts in order to maek them more efficient.
Post: https://en.chessbase.com/post/acquisition-of-chess-knowledge-in-alphazero
Paper: https://arxiv.org/pdf/2111.09259.pdf
#RL
Chess News
Acquisition of Chess Knowledge in AlphaZero
Researchers at DeepMind and Google Brain, in collaboration with Grandmaster Vladimir Kramnik, are working to explore what chess can teach us about AI and vice versa. Using Chessbaseβs extensive historical chess data along with the AlphaZero neural networkβ¦
π¦ Hi!
We are the first Telegram Data Science channel.
Channel was started as a collection of notable papers, news and releases shared for the members of Open Data Science (ODS) community. Through the years of just keeping the thing going we grew to an independent online Media supporting principles of Free and Open access to the information related to Data Science.
Ultimate Posts
* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda
Open Data Science
ODS.ai is an international community of people anyhow related to Data Science.
Website: https://ods.ai
Hashtags
Through the years we accumulated a big collection of materials, most of them accompanied by hashtags.
#deeplearning #DL β post about deep neural networks (> 1 layer)
#cv β posts related to Computer Vision. Pictures and videos
#nlp #nlu β Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition β related to audio information processing
#ar β augmeneted reality related content
#rl β Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart β about neural artt and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 β related to special architectures or frameworks
#coding #CS β content related to software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface β hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization
Chats
- Data Science Chat https://t.me/datascience_chat
- ODS Slack through invite form at website
ODS resources
* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai
Feedback and Contacts
You are welcome to reach administration through telegram bot: @opendatasciencebot
We are the first Telegram Data Science channel.
Channel was started as a collection of notable papers, news and releases shared for the members of Open Data Science (ODS) community. Through the years of just keeping the thing going we grew to an independent online Media supporting principles of Free and Open access to the information related to Data Science.
Ultimate Posts
* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda
Open Data Science
ODS.ai is an international community of people anyhow related to Data Science.
Website: https://ods.ai
Hashtags
Through the years we accumulated a big collection of materials, most of them accompanied by hashtags.
#deeplearning #DL β post about deep neural networks (> 1 layer)
#cv β posts related to Computer Vision. Pictures and videos
#nlp #nlu β Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition β related to audio information processing
#ar β augmeneted reality related content
#rl β Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart β about neural artt and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 β related to special architectures or frameworks
#coding #CS β content related to software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface β hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization
Chats
- Data Science Chat https://t.me/datascience_chat
- ODS Slack through invite form at website
ODS resources
* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai
Feedback and Contacts
You are welcome to reach administration through telegram bot: @opendatasciencebot
GitHub
ultimate_posts/where_to_start at master Β· open-data-science/ultimate_posts
Ultimate posts for opendatascience telegram channel - open-data-science/ultimate_posts
CORL: Offline Reinforcement Learning Library
Offline RL is a fresh paradigm that makes RL similar to the supervised learning, thus making it better applicable to the real-world problems. There is a whole bunch of recently developed Offline RL aglorithms, however, there was nots many of reliable open-sourced implementations for such algorithms
Our friends from @tinkoffai do some research in this direction and they recently open-sourced their internal offline RL library.
The main features are:
- Single-file implementations
- SOTA algorithms (Decision Transformer, AWAC, BC, CQL, IQL, TD3+BC, SAC-N, EDAC)
- Benchmarked on widely used D4RL datasets (results match performances reported in the original papers)
- Wandb logs for all of the experiments
Hope you will like it and the whole new world of Offline RL!
Github: https://github.com/tinkoff-ai/CORL
#tinkoff #RL #offline_lib
Offline RL is a fresh paradigm that makes RL similar to the supervised learning, thus making it better applicable to the real-world problems. There is a whole bunch of recently developed Offline RL aglorithms, however, there was nots many of reliable open-sourced implementations for such algorithms
Our friends from @tinkoffai do some research in this direction and they recently open-sourced their internal offline RL library.
The main features are:
- Single-file implementations
- SOTA algorithms (Decision Transformer, AWAC, BC, CQL, IQL, TD3+BC, SAC-N, EDAC)
- Benchmarked on widely used D4RL datasets (results match performances reported in the original papers)
- Wandb logs for all of the experiments
Hope you will like it and the whole new world of Offline RL!
Github: https://github.com/tinkoff-ai/CORL
#tinkoff #RL #offline_lib
GitHub
GitHub - tinkoff-ai/CORL: High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BCβ¦
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC - tinkoff-ai/CORL