https://isitablog.com/the-crucial-role-of-data-diversity-in-training-large-language-models/