Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)

Python & ML tasks
Задачи по Python и машинному обучению

Today I want to share with you a telegram channel which will help you retain your knowledge of python and maybe learn something new.

Every day a question is posted and you can answer it using the quiz under the question.
If your answer is wrong, you can find out the correct one and read the explanation.

#armknowledgesharing #armtelegram #python

55 views19:01

Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning by Sebastian Raschka

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings.
This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning.

Link
https://arxiv.org/abs/1811.12808

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #machinelearning #ml #modelevaluation #evaluation #selection #cv #crossvalidation

@accelerated_learning

83 viewsAnatoly Alekseev, 14:18

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)

Mindful Modeler by Christoph Molnar

The newsletter combines the best of two worlds: the performance mindset of machine learning and the mindfulness of statistical thinking.

Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
In a sea of people who basically know how to model.fit() and model.predict() you can stand out by bringing statistical thinking to the arena.
Sign up for this newsletter to combine performance-driven machine learning with statistical thinking. Become a mindful modeller.

You'll learn about:
- Thinking like a statistician while performing like a machine learner
- Spotting non-obvious data problems
- Interpretable machine learning
- Other modelling mindsets such as causal inference and prompt engineering

Link
https://mindfulmodeler.substack.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #modelling #modeling #ml #machinelearning #statistics #modelinterpretation #data #interpretability #casualinference

@accelerated_learning

Substack

Mindful Modeler | Christoph Molnar | Substack

Better machine learning by thinking like a statistician. About model interpretation, paying attention to data, and always staying critical. Click to read Mindful Modeler, by Christoph Molnar, a Substack publication with tens of thousands of subscribers.

👍1

84 viewsAnatoly Alekseev, 12:59

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)

Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson

The process of developing predictive models includes many stages. Most resources focus on the modelling algorithms, but neglect other critical aspects of the modelling process. This book describes techniques for finding the best representations of predictors for modelling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Table of Contents:
1. Introduction
2. Illustrative Example: Predicting Risk of Ischemic Stroke
3. A Review of the Predictive Modeling Process
4. Exploratory Visualizations
5. Encoding Categorical Predictors
6. Engineering Numeric Predictors
7. Detecting Interaction Effects
8. Handling Missing Data
9. Working with Profile Data
10. Feature Selection Overview
11. Greedy Search Methods
12. Global Search Methods

Links:
- http://www.feat.engineering/
- https://www.routledge.com/Feature-Engineering-and-Selection-A-Practical-Approach-for-Predictive-Models/Kuhn-Johnson/p/book/9781138079229
- https://www.routledge.com/Feature-Engineering-and-Selection-A-Practical-Approach-for-Predictive-Models/Kuhn-Johnson/p/book/9781138079229

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #featureengineering #featureselection #missingdata #categoricalvariables

@accelerated_learning

👍1

98 viewsAnatoly Alekseev, 21:06

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)

Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis by Ethan Bueno de Mesquita, Anthony Fowler

An introduction to data science or statistics shouldn’t involve proving complex theorems or memorizing obscure terms and formulas, but that is exactly what most introductory quantitative textbooks emphasize. In contrast, Thinking Clearly with Data focuses, first and foremost, on critical thinking and conceptual understanding in order to teach students how to be better consumers and analysts of the kinds of quantitative information and arguments that they will encounter throughout their lives.

Among much else, the book teaches how to assess whether an observed relationship in data reflects a genuine relationship in the world and, if so, whether it is causal; how to make the most informative comparisons for answering questions; what questions to ask others who are making arguments using quantitative evidence; which statistics are particularly informative or misleading; how quantitative evidence should and shouldn’t influence decision-making; and how to make better decisions by using moral values as well as data.

- An ideal textbook for introductory quantitative methods courses in data science, statistics, political science, economics, psychology, sociology, public policy, and other fields
- Introduces the basic toolkit of data analysis―including sampling, hypothesis testing, Bayesian inference, regression, experiments, instrumental variables, differences in differences, and regression discontinuity
- Uses real-world examples and data from a wide variety of subjects
- Includes practice questions and data exercises

Link: https://www.amazon.com/Thinking-Clearly-Data-Quantitative-Reasoning/dp/0691214352

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #correlation #regression #causation #randomizedexperiments #statistics

@data_science_links

114 viewsAnatoly Alekseev, 00:25

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

Google for Developers

Machine Learning | Google for Developers

Educational resources for machine learning.

Google Machine Learning Education

Learn to build ML products with Google's Machine Learning Courses.

Foundational courses
The foundational courses cover machine learning fundamentals and core concepts. They recommend taking them in the order below.

1. Introduction to Machine Learning
A brief introduction to machine learning.
2. Machine Learning Crash Course
A hands-on course to explore the critical basics of machine learning.
3. Problem Framing
A course to help you map real-world problems to machine learning solutions.
4. Data Preparation and Feature Engineering
An introduction to preparing your data for ML workflows.
5. Testing and Debugging
Strategies for testing and debugging machine learning models and pipelines.

Advanced Courses
The advanced courses teach tools and techniques for solving a variety of machine learning problems. The courses are structured independently. Take them based on interest or problem domain.

- Decision Forests
Decision forests are an alternative to neural networks.
- Recommendation Systems
Recommendation systems generate personalized suggestions.
- Clustering
Clustering is a key unsupervised machine learning strategy to associate related items.
- Generative Adversarial Networks
GANs create new data instances that resemble your training data.
- Image Classification
Is that a picture of a cat or is it a dog?
- Fairness in Perspective API
Hands-on practice debugging fairness issues.

Guides
Their guides offer simple step-by-step walkthroughs for solving common machine learning problems using best practices.

- Rules of ML
Become a better machine learning engineer by following these machine learning best practices used at Google.
- People + AI Guidebook
This guide assists UXers, PMs, and developers in collaboratively working through AI design topics and questions.
- Text Classification
This comprehensive guide provides a walkthrough to solving text classification problems using machine learning.
- Good Data Analysis
This guide describes the tricks that an expert data analyst uses to evaluate huge data sets in machine learning problems.
- Deep Learning Tuning Playbook
This guide explains a scientific way to optimize the training of deep learning models.

Link: https://developers.google.com/machine-learning?hl=en

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #google #course #courses #featureengineering #recsys #clustering #gan

@data_science_weekly

108 viewsAnatoly Alekseev, 08:06

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

Designing Machine Learning Systems by Chip Huyen

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision--such as how to process and create training data, which features to use, how often to retrain models, and what to monitor--in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:
- Engineering data and choosing the right metrics to solve a business problem
- Automating the process for continually developing, evaluating, deploying, and updating models
- Developing a monitoring system to quickly detect and address issues your models might encounter in production
- Architecting an ML platform that serves across use cases
- Developing responsible ML systems

Link: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearningsystemdesign #systemdesign #machinelearning #ml #designingmachinelearningsystems

@data_science_weekly

136 viewsAnatoly Alekseev, 08:15

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

MLOps Guide by Arthur Olga, Gabriel Monteiro, Guilherme Leite and Vinicius Lima

This site is intended to be a MLOps Guide to help projects and companies to build more reliable MLOps environment. This guide should contemplate the theory behind MLOps and an implementation that should fit for most use cases.

What is MLOps?
MLOps is a methodology of operation that aims to facilitate the process of bringing an experimental Machine Learning model into production and maintaining it efficiently. MLOps focus on bringing the methodology of DevOps used in the software industry to the Machine Learning model lifecycle.

In that way we can define some of the main features of a MLOPs project:
- Data and Model Versioning
- Feature Management and Storing
- Automation of Pipelines and Processes
- CI/CD for Machine Learning
- Continuous Monitoring of Models

What does this guide cover?
- Introduction to MLOps Concepts
- Tutorial for Building a MLOps Environment

Link: Direct

Navigational hashtags: #armknowledgesharing #armguides
General hashtags: #mlops #ml #operations

@data_science_weekly

88 viewsAnatoly Alekseev, 14:11

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

👍2

76 viewsAnatoly Alekseev, 20:59

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning by Google Cloud

Across industries, DevOps and DataOps have been widely adopted as methodologies to improve quality and reduce the time to market of software engineering and data engineering initiatives. With the rapid growth in machine learning (ML) systems, similar approaches need to be developed in the context of ML engineering, which handle the unique complexities of the practical applications of ML. This is the domain of MLOps. MLOps is a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.

The document is in two parts. The first part, an overview of the MLOps lifecycle, is for all readers. It introduces MLOps processes and capabilities and why they’re important for successful adoption of ML-based systems.

The second part is a deep dive on the MLOps processes and capabilities. This part is for readers who want to understand the concrete details of tasks like running a continuous training pipeline, deploying a model, and monitoring predictive performance of an ML model.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #mlops

101 viewsAnatoly Alekseev, 11:46

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

How to Win a Kaggle Competition by Darek Kłeczek

Darek Kłeczek:

When I join a competition, I research winning solutions from past similar competitions. It takes a lot of time to read and digest them, but it's an incredible source of ideas and knowledge. But what if we could learn from all the competitions? We've been given a list of Kaggle writeups in this competition, but there are so many of them! If only we could find a way to extract some structured data and analyze it... Well, it turns out that large language models (LLMs) [1] can help us extract structured data from unstructured writeups.

In this essay, author starts by providing a quick overview of the process he uses to collect data. He then presents several insights from analyzing datasets. The focus is to understand what the community has learned over the past 2 years of working and experimenting with Kaggle competitions. Finally, he mentions some ideas for future research.

Link: Kaggle

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #kaggle #competitions

106 viewsAnatoly Alekseev, 07:02

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

The Kaggle Book by Konrad Banachewicz and Luca Massaron

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp

@data_science_weekly

98 viewsAnatoly Alekseev, 14:36

About

Blog

Apps

Platform