#kaggle #ml #competitions
An interesting target transformation: the fourth root. I hadn't heard of that one before. Also: fair loss in xgboost, averaging neural network weights (once training reaches equilibrium), and training SVR and kNN on subsamples. Correcting xgboost's predictions is just wild; the stuff these Kagglers get up to )
https://www.youtube.com/watch?v=p7ArDjMImiI
YouTube
Kaggle Allstate Claims Severity: predicting insurance claim severity - Alexey Noskov
Alexey Noskov talks about the Kaggle Allstate Claims Severity competition on predicting insurance claim severity, where he took 2nd place. From the video you will learn:
- Whether the target variable should be transformed before training models
- How to build new features…
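A minimal sketch of two of the tricks mentioned in the post: the fourth-root target transformation and the fair loss. The function names and the constant `c=2.0` are assumptions for illustration and are not taken from the talk. The fair loss is written here as c^2 * (|r|/c - log(1 + |r|/c)) with r = prediction - label.

```python
import numpy as np


def to_fourth_root(y):
    """Forward target transform: y -> y ** 0.25 (y must be non-negative)."""
    return np.power(np.asarray(y, dtype=float), 0.25)


def from_fourth_root(z):
    """Inverse transform, applied to predictions made in the transformed space."""
    return np.power(np.asarray(z, dtype=float), 4)


def fair_grad_hess(preds, labels, c=2.0):
    """Gradient and hessian of the fair loss with respect to the predictions.

    c is a tunable constant; 2.0 is only a placeholder value.
    """
    r = np.asarray(preds, dtype=float) - np.asarray(labels, dtype=float)
    den = np.abs(r) + c
    grad = c * r / den          # bounded by c, so large residuals are damped
    hess = c * c / (den * den)  # always positive
    return grad, hess


y = np.array([16.0, 81.0, 10000.0])
z = to_fourth_root(y)          # compresses the heavy right tail
y_back = from_fourth_root(z)   # back on the original scale

grad, hess = fair_grad_hess(preds=np.array([1.0, 5.0]), labels=np.array([1.0, 3.0]))
```

To plug this into xgboost, one would presumably wrap `fair_grad_hess` in a custom objective of the form `obj(preds, dtrain)` that reads the labels via `dtrain.get_label()` and returns `(grad, hess)`.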
#kaggle #tricks #ml #titericz #featureengineering
Before FE, calculate the correlation coefficients between the raw features and the target; it's probably better to use only half of the dataset for this, so as not to overfit too badly. By the way, Diogenes can help a lot with estimating (nonlinear) correlations and "interactions".
Combine numerical features: log(A)*log(B), A*exp(B), Rank(A)+Rank(B), sin(A)+cos(B) etc;
Use a binary flag for NAs;
Do N-way nested OOF Target Encoding;
Try aggregations of one feature by another;
Try extensive target transformations (TT), such as y^(1/2), y^(1/4), log(10+y), 10/y etc;
Try several clustering algos to create new categorical or numerical features based on cluster IDs or distances;
Use tree leaf indices as weak features for linear models (incl. factorization machines);
LOFO feature selection;
Adversarial Validation to tell train apart from test;
https://www.youtube.com/watch?v=RtqtM1UJfZc
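A rough sketch of the out-of-fold (OOF) target encoding idea from the list above. This is a plain single-level version, not the N-way nested variant from the talk; the function name `oof_target_encode`, the fold assignment, and the fallback to the global mean are all assumptions made for illustration.

```python
import numpy as np


def oof_target_encode(cat, y, n_folds=5, seed=0):
    """Out-of-fold mean target encoding for one categorical column.

    Each row is encoded with the target mean of its category computed on the
    other folds only, so a row never sees its own target value.
    Assumes n_folds >= 2.
    """
    cat = np.asarray(cat)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    fold_of_row = rng.permutation(len(y)) % n_folds  # random, balanced folds
    encoded = np.empty(len(y), dtype=float)
    for fold in range(n_folds):
        held_out = fold_of_row == fold
        fit_cat, fit_y = cat[~held_out], y[~held_out]
        prior = fit_y.mean()  # fallback for categories unseen in the fitting folds
        means = {c: fit_y[fit_cat == c].mean() for c in np.unique(fit_cat)}
        encoded[held_out] = [means.get(c, prior) for c in cat[held_out]]
    return encoded


cat = np.array(["a", "a", "b", "b", "a", "b"])
y = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
enc = oof_target_encode(cat, y, n_folds=3)
```

For the test set, the usual choice is to encode with category means computed on the full training set.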
YouTube
Kaggle Tips for Feature Engineering and Selection | by Gilberto Titericz | Kaggle Days Meetup Madrid
Gilberto Titericz, Kaggle GrandMaster and top-1 in Kaggle Competitions Ranking for years, talks about two important topics in Machine Learning: Feature Engineering and Feature Selection
25 November 2019, Madrid - Part II
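A small sketch of the numeric feature combinations and the binary NA flag from the Titericz tips. The toy arrays, the median imputation, and the `rank` helper are assumptions for illustration; the log-based combination only makes sense for strictly positive inputs.

```python
import numpy as np


def rank(x):
    """Ordinal rank (0 = smallest); ties are broken by position."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


a = np.array([1.0, 10.0, np.nan, 100.0])
b = np.array([2.0, 1.0, 3.0, 4.0])

# Binary flag for missing values, created before any imputation.
a_is_na = np.isnan(a).astype(int)
# Median imputation is just one possible choice here.
a_filled = np.where(np.isnan(a), np.nanmedian(a), a)

features = {
    "a_is_na": a_is_na,
    "log_a_x_log_b": np.log(a_filled) * np.log(b),  # needs a > 0 and b > 0
    "a_x_exp_b": a_filled * np.exp(b),
    "rank_a_plus_rank_b": rank(a_filled) + rank(b),
    "sin_a_plus_cos_b": np.sin(a_filled) + np.cos(b),
}
```

Most of these combinations will be noise on any given dataset, so they pair naturally with a feature selection step such as LOFO.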
Forwarded from Artem Ryblov’s Data Science Weekly
How to Win a Kaggle Competition by Darek Kłeczek
Darek Kłeczek:
When I join a competition, I research winning solutions from past similar competitions. It takes a lot of time to read and digest them, but it's an incredible source of ideas and knowledge. But what if we could learn from all the competitions? We've been given a list of Kaggle writeups in this competition, but there are so many of them! If only we could find a way to extract some structured data and analyze it... Well, it turns out that large language models (LLMs) [1] can help us extract structured data from unstructured writeups.
In this essay, the author starts with a quick overview of the process he uses to collect data. He then presents several insights from analyzing the datasets. The focus is on understanding what the community has learned over the past 2 years of working and experimenting with Kaggle competitions. Finally, he mentions some ideas for future research.
Link: Kaggle
Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #kaggle #competitions
Forwarded from Artem Ryblov’s Data Science Weekly
The Kaggle Book by Konrad Banachewicz and Luca Massaron
Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.
The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.
Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.
Link: Book
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp
@data_science_weekly