Aspiring Data Science

#tabular

Excellent English and a very interesting take on the peculiarities of tabular data from a distinguished Kaggle Grandmaster.

By the way, this Bojan is the first lecturer I can recall who points out that boosting models are poor at modeling linear dependencies, which is something I ran into myself recently.
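To make the point about linear dependencies concrete, here is a minimal sketch, not taken from the talk. It assumes a toy 1-D gradient boosting built from regression stumps with squared loss (the helper names `fit_stump`, `fit_boosting`, `predict_boosting` and `fit_line` are made up for the illustration) and compares it with an ordinary least-squares line on `y = 3x + 1`.

```python
# Toy 1-D gradient boosting on regression stumps (squared loss), stdlib only.
# Illustrative sketch: real libraries add deeper trees, regularization, etc.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            thr = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (sse, thr, lm, rm)
    return best[1:]

def fit_boosting(xs, ys, n_rounds=200, lr=0.1):
    """Fit stumps sequentially to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + lr * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return base, stumps, lr

def predict_boosting(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lv if x <= thr else rv) for thr, lv, rv in stumps)

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Train on x in [0, 10] with an exactly linear target.
xs = [i * 0.5 for i in range(21)]
ys = [3.0 * x + 1.0 for x in xs]
model = fit_boosting(xs, ys)
slope, intercept = fit_line(xs, ys)

# Outside the training range the boosted stumps stay flat, the line keeps going.
boost_at_edge = predict_boosting(model, 10.0)
boost_outside = predict_boosting(model, 20.0)
line_outside = slope * 20.0 + intercept
print(boost_at_edge, boost_outside, line_outside)
```

Every split threshold lies between two training points, so any `x` beyond the largest training value lands in the same leaves as the largest training value itself. The ensemble can only approximate a line with steps and cannot extend the trend, while the linear model recovers it from two parameters.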

https://youtu.be/OcNBmilICgY?si=ozjCKNLNHFgiOqP6

YouTube

The Past, Present and the Future of Machine Learning for Tabular Data - Bojan Tunguz

In this talk, you will learn about the history and future directions of machine learning for Tabular Data. The talk covers
• What is Tabular Data?
• The Main Issues with Tabular Data
• Deep Learning vs. Gradient Boosted Trees
• Research Directions
• GPU…

171 views, Anatoly Alekseev, edited 10:30

Aspiring Data Science

#tabular #anns #trees

A curious attempt to explain a well-known phenomenon.

"According to Grinsztajn et al. (2022), tree-based methods work well for tabular data because they are not rotationally invariant. In tabular data, the feature columns are often individually meaningful, and mixing them with other columns by rotating them is a disadvantage. An MLP first has to learn the right rotation and therefore has a more difficult task.

Sparse solutions: rotationally invariant models have a hard time distinguishing relevant and irrelevant features. Trees and forests are good at separating relevant and irrelevant and offer sparser solutions."
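A small sketch of what "not rotationally invariant" means in practice. It is my own illustration rather than code from the article, and it assumes the simplest possible tree, a single axis-aligned split; `best_stump_accuracy` and `rotate` are hypothetical helpers. The label depends on one feature only, and the second feature is irrelevant.

```python
import math

def best_stump_accuracy(points, labels):
    """Best accuracy of any single axis-aligned split, trying both polarities."""
    best = 0.0
    n_features = len(points[0])
    for f in range(n_features):
        values = sorted({p[f] for p in points})
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            hits = sum((p[f] > t) == y for p, y in zip(points, labels))
            # Flipping the predicted side turns every miss into a hit.
            acc = max(hits, len(points) - hits) / len(points)
            best = max(best, acc)
    return best

def rotate(points, degrees):
    """Rotate 2-D points around the origin, mixing the two columns."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(c * x0 - s * x1, s * x0 + c * x1) for x0, x1 in points]

# The label depends only on the first column; the second column is noise.
points = [(x0, x1) for x0 in (-2, -1, 1, 2) for x1 in (-2, -1, 0, 1, 2)]
labels = [x0 > 0 for x0, _ in points]

acc_original = best_stump_accuracy(points, labels)
acc_rotated = best_stump_accuracy(rotate(points, 45), labels)
print(acc_original, acc_rotated)
```

On the original columns one split on the relevant feature separates the classes perfectly and ignores the irrelevant one. After a 45 degree rotation each new column is a mix of both, and no single axis-aligned split separates the classes any more. A rotationally invariant model would see both versions of the data as equally hard.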

https://mindfulmodeler.substack.com/p/inductive-biases-of-the-random-forest

Mindful Modeler

Inductive biases of the Random Forest and their consequences

part 4 of the inductive bias series

114 views, Anatoly Alekseev, edited 23:37

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

The Kaggle Book by Konrad Banachewicz and Luca Massaron

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp

@data_science_weekly

98 views, Anatoly Alekseev, 14:36

Aspiring Data Science

#automl #tabpfn

And here is Frank on a podcast.

https://www.youtube.com/watch?v=BAYsT0wxP90

YouTube

863: TabPFN: Deep Learning for Tabular Data (That Actually Works!) — with Prof. Frank Hutter

#TabPFN #DeepLearning #Tabular

@JonKrohnLearns talks tabular data with Frank Hutter, Professor of Artificial Intelligence at Universität Freiburg in Germany. Despite the great steps that deep learning has made in analysing images, audio, and natural language…

109 views, Anatoly Alekseev, 02:35

Aspiring Data Science

#tabular #anns

Some of the more interesting recommendations:

- PReLU and ELU activation functions;
- a batch size of 1% of the dataset;
- comparing the distributions of the predictions and the target;
- batchnorm helps classification, especially binary classification, and hurts regression.
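The first three tips can be sketched in plain Python. This is an illustration under my own assumptions, not code from the talk: `batch_size_for` and `quantile_gap` are hypothetical helpers, the PReLU slope `a` is fixed here although it is a learned parameter in a real network, and comparing deciles is just one simple way to compare two distributions.

```python
import math
import statistics

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, smooth saturation towards -alpha below zero."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def prelu(x, a=0.25):
    """PReLU: like leaky ReLU, but the negative slope `a` is learned in training."""
    return x if x > 0 else a * x

def batch_size_for(n_rows, fraction=0.01):
    """Batch size as a fraction of the dataset size (1% by default), at least 1."""
    return max(1, round(n_rows * fraction))

def quantile_gap(preds, target, n=10):
    """Largest absolute difference between matching quantiles of two samples."""
    qp = statistics.quantiles(preds, n=n)
    qt = statistics.quantiles(target, n=n)
    return max(abs(p - t) for p, t in zip(qp, qt))

# Toy regression target and predictions that are shrunk towards the mean,
# a typical symptom that the prediction distribution is too narrow.
target = [float(t) for t in range(100)]
shrunk_preds = [50.0 + 0.5 * (t - 50.0) for t in target]

print(batch_size_for(50_000))              # batch size for a 50k-row dataset
print(quantile_gap(target, target))        # identical distributions
print(quantile_gap(shrunk_preds, target))  # predictions too concentrated
```

A large gap in the tails with a small gap near the median suggests that the model under-predicts extremes, which per-row error metrics alone can hide.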

https://www.youtube.com/watch?v=WPQOkoXhdBQ

YouTube

Deep Learning for Tabular Data: A Bag of Tricks | ODSC 2020

Jason McGhee, Senior Machine Learning Engineer at DataRobot, has been spending time applying deep learning and neural networks to tabular data. Although the deep learning technique can prove challenging, his research supports how valuable it is when using…

92 views, Anatoly Alekseev, edited 14:03