#tabular
Excellent English and a very interesting take on the peculiarities of tabular data from a distinguished Kaggle Grandmaster.
Bojan, by the way, is the first lecturer I can recall who points out that gradient boosting models are bad at modeling linear dependencies, something I ran into myself recently.
https://youtu.be/OcNBmilICgY?si=ozjCKNLNHFgiOqP6
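The point about linear dependencies is easy to see in the extrapolation regime. A minimal sketch (mine, not from the talk) using scikit-learn: both models are fit on a purely linear target, then asked to predict outside the training range, where a tree ensemble can only output piecewise constants.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(500, 1))
y_train = 3.0 * X_train[:, 0]  # purely linear target

gbt = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

X_test = np.array([[2.0]])  # outside the training range
# Trees predict piecewise constants bounded by the training targets,
# so the GBT saturates near max(y_train) while the linear model
# extrapolates the trend to ~6.0.
print(gbt.predict(X_test), lin.predict(X_test))
```

Inside the training range the boosting approximates the line with a staircase, so the failure is most dramatic on extrapolation.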
YouTube
The Past, Present and the Future of Machine Learning for Tabular Data - Bojan Tunguz
In this talk, you will learn about the history and future directions of machine learning for Tabular Data. The talk covers
• What is Tabular Data?
• The Main Issues with Tabular Data
• Deep Learning vs. Gradient Boosted Trees
• Research Directions
• GPU…
#tabular #anns #trees
A curious attempt to explain a well-known phenomenon.
"According to Grinsztajn et al. (2022), tree-based methods work well for tabular data because they are not rotationally invariant. In tabular data, the feature columns are often individually meaningful, and mixing them with other columns by rotating them is a disadvantage. An MLP first has to learn the right rotation and therefore has a more difficult task.
Sparse solutions: rotationally invariant models have a hard time distinguishing relevant and irrelevant features. Trees and forests are good at separating relevant from irrelevant features and offer sparser solutions."
https://mindfulmodeler.substack.com/p/inductive-biases-of-the-random-forest
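The rotation argument can be checked directly. A small sketch (mine, not from the post): the target depends on a single axis-aligned column, then the same data is passed through a random orthogonal rotation, which preserves all the information but spreads the signal across columns — and the random forest's score drops.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = X[:, 0] ** 2  # only the first column matters (axis-aligned signal)

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal rotation
X_rot = X @ Q  # same information, mixed across all columns

def rf_score(X, y):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    return model.fit(Xtr, ytr).score(Xte, yte)

# Axis-aligned splits match the original data but not the rotated copy.
print(rf_score(X, y), rf_score(X_rot, y))
```

A linear model or an MLP would be (approximately) indifferent to the rotation; the forest is not, which is exactly the inductive bias the quote describes.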
Mindful Modeler
Inductive biases of the Random Forest and their consequences
part 4 of the inductive bias series
Forwarded from Artem Ryblov’s Data Science Weekly
The Kaggle Book by Konrad Banachewicz and Luca Massaron
Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.
The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.
Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.
Link: Book
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp
@data_science_weekly
#tabular #anns
Among the more interesting tips:
- PReLU and ELU activation functions;
- a batch size of about 1% of the dataset;
- compare the distribution of predictions against the target distribution;
- batch norm helps classification, especially binary, and hurts regression;
https://www.youtube.com/watch?v=WPQOkoXhdBQ
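For reference, the two activations from the first tip in plain NumPy (a sketch, not code from the talk; in practice they come built in, e.g. as PReLU/ELU layers in deep learning frameworks):

```python
import numpy as np

def prelu(x, alpha=0.25):
    # PReLU: identity for positive inputs, slope alpha for negative ones.
    # In a network, alpha is a learnable per-channel parameter.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, smooth saturation toward
    # -alpha for large negative inputs (keeps mean activations near zero).
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(x))  # [-0.5, -0.125, 0.0, 1.5]
print(elu(x))
```

Unlike ReLU, neither activation has a hard zero for negative inputs, which avoids dead units, one plausible reason they are recommended for tabular MLPs.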
YouTube
Deep Learning for Tabular Data: A Bag of Tricks | ODSC 2020
Jason McGhee, Senior Machine Learning Engineer at DataRobot, has been spending time applying deep learning and neural networks to tabular data. Although the deep learning technique can prove challenging, his research supports how valuable it is when using…