Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale
Amos is a new optimizer that we propose for pre-training large language models. It is more efficient and converges faster than AdamW: ≤ 51% of the memory for slot variables, and better validation loss within ≤ 70% of the training time!
arXiv: https://arxiv.org/abs/2210.11693
#NLU #NLP #optimizer