Aspiring Data Science
373 subscribers
425 photos
11 videos
10 files
1.87K links
Заметки экономиста о программировании, прогнозировании и принятии решений, научном методе познания.
Контакт: @fingoldo

I call myself a data scientist because I know just enough math, economics & programming to be dangerous.
Download Telegram
#tabular

Отличный английский и крайне интересный взгляд на особенности табличных данных от заслуженного кэггл гроссмейстера.

Боян этот, кстати, на моей памяти первый лектор кто указывает, что бустинги плохи в моделировании линейных зависимостей - то, с чем я сам сталкивался недавно.

https://youtu.be/OcNBmilICgY?si=ozjCKNLNHFgiOqP6
#tabular #anns #trees

Любопытная попытка объяснить известный феномен.

"According to Grinsztajn et. al (2022)4, tree-based methods work well for tabular data because they are not rotational invariant. In tabular data, the feature columns are often individually meaningful, and mixing them with other columns by rotating them is a disadvantage. An MLP first has to learn the right rotation and therefore has a more difficult task.

Sparse solutions: rotationally invariant models have a hard time distinguishing relevant and irrelevant features. Trees and forests are good at separating relevant and irrelevant and offer sparser solutions.

https://mindfulmodeler.substack.com/p/inductive-biases-of-the-random-forest
The Kaggle Book by Konrad Banachewicz and Luca Massaron

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp

@data_science_weekly
#tabular #anns

Из интересного советуют:

- активационные функции PReLU, eLU;
- размер батча 1% от датасета;
- сравнивать распределение прогнозов и таргета;
- batchnorm помогает классификации, особенно бинарной, и вредит регрессии;

https://www.youtube.com/watch?v=WPQOkoXhdBQ
👍1