Causal Discovery with Reinforcement Learning
https://arxiv.org/abs/1906.04477
Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to...
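for context, a minimal sketch of the traditional score-based baseline the snippet contrasts with: greedy hill climbing over DAGs under a BIC-style linear-Gaussian score. everything here (the score, the add-only greedy search) is a generic textbook illustration, not the RL method the paper proposes:
```python
import numpy as np

def bic_score(X, j, parents):
    """BIC-style local score of variable j given a parent set (linear-Gaussian)."""
    n = X.shape[0]
    if parents:
        Z = np.column_stack([np.ones(n), X[:, parents]])
        beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ beta
    else:
        resid = X[:, j] - X[:, j].mean()
    rss = float(resid @ resid)
    return -n * np.log(rss / n + 1e-12) - len(parents) * np.log(n)

def creates_cycle(adj, i, j):
    """Would adding edge i -> j close a directed cycle? (search from j back to i)"""
    stack, seen = [j], set()
    while stack:
        k = stack.pop()
        if k == i:
            return True
        if k not in seen:
            seen.add(k)
            stack.extend(np.flatnonzero(adj[k]).tolist())
    return False

def hill_climb(X):
    """Greedily add edges while any single edge addition improves the score."""
    d = X.shape[1]
    adj = np.zeros((d, d), dtype=bool)  # adj[i, j] means edge i -> j
    improved = True
    while improved:
        improved = False
        for i in range(d):
            for j in range(d):
                if i == j or adj[i, j] or creates_cycle(adj, i, j):
                    continue
                old = list(np.flatnonzero(adj[:, j]))
                if bic_score(X, j, old + [i]) > bic_score(X, j, old):
                    adj[i, j] = True
                    improved = True
    return adj
```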
A causal framework for distribution generalization
https://arxiv.org/abs/2006.07433
In Search of Lost Domain Generalization
https://arxiv.org/abs/2007.01434
The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist,...
Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning
http://proceedings.mlr.press/v144/sonar21a.html
A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training. In this paper...
Direct Advantage Estimation
https://arxiv.org/abs/2109.06093
links the advantage function to causal effects, as in the Rubin potential-outcomes model (rough correspondence sketched below)
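one way to write the correspondence (my own sketch, not the paper's notation): treat the action as the treatment, the return as the outcome, and the state as the covariates; the advantage then reads like a conditional treatment effect of deviating from the policy:
```latex
% Rubin potential-outcomes model: effect of treatment (1) vs. control (0),
% conditional on covariates x
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]

% RL analogue: advantage of "treating" state s with action a instead of
% sampling the action from the policy \pi
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)
             = \mathbb{E}\left[\, G_t \mid S_t = s,\, A_t = a \,\right]
             - \mathbb{E}_{a' \sim \pi(\cdot \mid s)}\, \mathbb{E}\left[\, G_t \mid S_t = s,\, A_t = a' \,\right]
```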
#classics
This year's Nobel Prize in economics went to Angrist and Imbens
https://www.nobelprize.org/prizes/economic-sciences/2021/summary/
so if you haven't read about instrumental variables yet, I guess you should (quick 2SLS sketch below)
https://www.jstor.org/stable/2291629
Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2021
The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2021 was divided, one half awarded to David Card "for his empirical contributions to labour economics", the other half jointly to Joshua D. Angrist and Guido W. Imbens "for their methodological…
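if you want the one-screen version: two-stage least squares (2SLS) on simulated data, where naive regression is biased by a hidden confounder but the instrument recovers the true effect. the data-generating numbers here are made up purely for illustration:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u = rng.normal(size=n)                          # unobserved confounder
z = rng.binomial(1, 0.5, size=n).astype(float)  # instrument: shifts x, affects y only through x
x = 0.8 * z + u + rng.normal(size=n)            # endogenous treatment
y = 2.0 * x + 3.0 * u + rng.normal(size=n)      # true causal effect of x on y is 2.0

def fit(features, target):
    """OLS coefficients with an intercept; returns (intercept, slopes...)."""
    F = np.column_stack([np.ones(len(target))] + features)
    beta, *_ = np.linalg.lstsq(F, target, rcond=None)
    return beta

# naive OLS is biased because x and y share the confounder u
print("OLS estimate: ", fit([x], y)[1])      # noticeably above 2.0

# stage 1: project x on z; stage 2: regress y on the fitted values
a, b = fit([z], x)
x_hat = a + b * z
print("2SLS estimate:", fit([x_hat], y)[1])  # ~2.0
```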
Shaking the foundations: delusions in sequence models for interaction and control
https://arxiv.org/abs/2110.10819
Forwarded from Just links
hi there,
not a causality link, but still! we've got a paper accepted at ICML 2022 (Spotlight), so if you're interested in offline RL, check it out
https://twitter.com/vladkurenkov/status/1534235675725381632
retweet appreciated 😈
On Calibration and Out-of-domain Generalization
https://arxiv.org/abs/2102.10395
👋
we finally released our offline RL library with SOTA algorithms, so if you're into this stuff, check it out
- single-file implementations
- benchmarked on D4RL datasets
- wandb reports with full metric logs (so that you don't need to rely on final performance tables)
https://github.com/corl-team/CORL
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC - corl-team/CORL
Forwarded from AI Agents | AGI_and_RL
A Survey on Causal Reinforcement Learning
https://arxiv.org/abs/2302.05209
10 Feb 2023
While Reinforcement Learning (RL) achieves tremendous success in sequential decision-making problems across many domains, it still faces the key challenges of data inefficiency and lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, producing a flourishing body of work that unifies the merits of causality with RL and addresses these challenges. It is therefore both necessary and significant to collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL methods, and investigate how causality can benefit RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, ranging from the Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) to Multi-Armed Bandits (MAB) and the Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, and we discuss emerging applications along with promising directions for the future development of CRL.
Forwarded from fucking research ideas
Dutch Rudder as an Acyclic Causal Model
Reinforcement Learning from Passive Data via Latent Intentions
https://arxiv.org/abs/2304.04782
Survival Instinct in Offline Reinforcement Learning
https://arxiv.org/abs/2306.03286