https://hackershare.dev/bookmarks/745479
How to train large models on many GPUs?