Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
Annotated PyTorch Paper Implementations

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes (see the screenshot).

These implementations will help you understand the algorithms better.

Link: https://nn.labml.ai/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #pytorch #deeplearning #ai #dl #article #paper #ml #machinelearning #deeplearningalgorithms

@data_science_weekly
Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson

The process of developing predictive models includes many stages. Most resources focus on the modelling algorithms, but neglect other critical aspects of the modelling process. This book describes techniques for finding the best representations of predictors for modelling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Table of Contents:
1. Introduction
2. Illustrative Example: Predicting Risk of Ischemic Stroke
3. A Review of the Predictive Modeling Process
4. Exploratory Visualizations
5. Encoding Categorical Predictors
6. Engineering Numeric Predictors
7. Detecting Interaction Effects
8. Handling Missing Data
9. Working with Profile Data
10. Feature Selection Overview
11. Greedy Search Methods
12. Global Search Methods

Links:
- Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #featureengineering #featureselection #missingdata #categoricalvariables

@data_science_weekly
Machine Learning for Everyone. In simple words. With real-world examples. Yes, again.

Machine Learning is like sex in high school. Everyone is talking about it, a few know what to do, and only your teacher is doing it. If you ever tried to read articles about machine learning on the Internet, most likely you stumbled upon two types of them: thick academic trilogies filled with theorems (I couldn’t even get through half of one) or fishy fairytales about artificial intelligence, data-science magic, and jobs of the future.

A simple introduction for those who always wanted to understand machine learning. Only real-world problems, practical solutions, simple language, and no high-level theorems. One and for everyone. Whether you are a programmer or a manager.

Link: https://vas3k.com/blog/machine_learning/

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #ml #machinelearning #data #features #algorithms #classification #regression #neuralnets #deeplearning #dl #supervised #unsupervised

@data_science_weekly
Understanding Deep Learning by Simon J.D. Prince

Deep learning is a fast-moving field with sweeping relevance in today’s increasingly digital world. Understanding Deep Learning provides an authoritative, accessible, and up-to-date treatment of the subject, covering all the key topics along with recent advances and cutting-edge concepts. Many deep learning texts are crowded with technical details that obscure fundamentals, but Simon Prince ruthlessly curates only the most important ideas to provide a high density of critical information in an intuitive and digestible form. From machine learning basics to advanced models, each concept is presented in lay terms and then detailed precisely in mathematical form and illustrated visually. The result is a lucid, self-contained textbook suitable for anyone with a basic background in applied mathematics.

- Up-to-date treatment of deep learning covers cutting-edge topics not found in existing texts, such as transformers and diffusion models
- Short, focused chapters progress in complexity, easing students into difficult concepts
- Pragmatic approach straddling theory and practice gives readers the level of detail required to implement naive versions of models
- Streamlined presentation separates critical ideas from background context and extraneous detail
- Minimal mathematical prerequisites, extensive illustrations, and practice problems make challenging material widely accessible
- Programming exercises offered in accompanying Python Notebooks

Link: https://udlbook.github.io/udlbook/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #dl #deeplearning #transformers #diffusion

@data_science_weekly
The Illustrated Machine Learning

The idea is to make the complex world of Machine Learning more approachable through clear and concise illustrations.

The goal is to provide a visual aid for students, professionals, and anyone preparing for a technical interview to better understand the underlying concepts of Machine Learning.

Whether you're just starting out in the field or you're a seasoned professional looking to refresh your knowledge, these illustrations will be a valuable resource on your journey to understanding Machine Learning.

- Machine Learning
  - Categorization
  - Sampling and Resampling
  - Bias/Variance
  - Supervised Learning
  - Unsupervised Learning
  - Hyperparameters Tuning
- Machine Learning Engineering
  - Introduction
  - Before the Project Starts
  - Data Collection and Preparation
- Projective Geometry
  - Introduction
  - Image Formation
  - Structure from Motion
  - Stereo Reconstruction
- Deep Learning Playbook

Link: https://illustrated-machine-learning.github.io/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #machinelearning #ml #mlsystemdesign #machinelearningsystemdesign #geometry #visualization #illustrated #supervised #unsupervised #dl #deeplearning #bias #variance #biasvariance

@data_science_weekly
HarvardX: CS50's Introduction to Artificial Intelligence with Python

This course explores the concepts and algorithms at the foundation of modern artificial intelligence, diving into the ideas that give rise to technologies like game-playing engines, handwriting recognition, and machine translation. Through hands-on projects, students gain exposure to the theory behind graph search algorithms, classification, optimization, machine learning, large language models, and other topics in artificial intelligence as they incorporate them into their own Python programs. By course’s end, students emerge with experience in libraries for machine learning as well as knowledge of artificial intelligence principles that enable them to design intelligent systems of their own.

What you'll learn
- graph search algorithms
- adversarial search
- knowledge representation
- logical inference
- probability theory
- Bayesian networks
- Markov models
- constraint satisfaction
- machine learning
- reinforcement learning
- neural networks
- natural language processing

By the way, it starts today - December 14, 2023.

Links:
- https://www.edx.org/learn/artificial-intelligence/harvard-university-cs50-s-introduction-to-artificial-intelligence-with-python
- https://cs50.harvard.edu/ai/2024/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #deeplearning #dl #graphs #reinforcementlearning #rl #neuralnetworks #nn #naturallanguageprocessing #nlp

@data_science_weekly
Machine Learning Engineering Online Book by Stas Bekman

An open collection of methodologies to help with successful training of large language models and multi-modal models.

This is technical material suitable for LLM/VLM training engineers and operators. That is, the content contains lots of scripts and copy-n-paste commands to help you quickly address your needs.

This repo is an ongoing brain dump of Stas's experience training Large Language Models (LLMs) and VLMs; he acquired much of the know-how while training the open-source BLOOM-176B model in 2022 and the IDEFICS-80B multi-modal model in 2023. Currently, he is working on developing and training open-source Retrieval Augmented models at Contextual.AI.

Table of Contents
Part 1. Insights
- The AI Battlefield Engineering - What You Need To Know
Part 2. Key Hardware Components
- Accelerator - the workhorses of ML - GPUs, TPUs, IPUs, FPGAs, HPUs, QPUs, RDUs (WIP)
- Network - intra-node and inter-node connectivity, calculating bandwidth requirements
- IO - local and distributed disks and filesystems
- CPU - cpus, affinities (WIP)
- CPU Memory - how much CPU memory is enough - the shortest chapter ever.
Part 3. Performance
- Fault Tolerance
- Performance
- Multi-Node networking
- Model parallelism
Part 4. Operating
- SLURM
- Training hyper-parameters and model initializations
- Instabilities
Part 5. Development
- Debugging software and hardware failures
- And more debugging
- Reproducibility
- Tensor precision / Data types
- HF Transformers notes - making small models, tokenizers, datasets, and other tips
Part 6. Miscellaneous
- Resources - LLM/VLM chronicles

Link: https://github.com/stas00/ml-engineering

Navigational hashtags: #armknowledgesharing #armbooks #armrepo
General hashtags: #llm #gpt #gpt3 #gpt4 #ml #engineering #mlsystemdesign #systemdesign #reproducibility #performance

@data_science_weekly
Google Machine Learning Education

Learn to build ML products with Google's Machine Learning Courses.

Foundational courses
The foundational courses cover machine learning fundamentals and core concepts. They recommend taking them in the order below.

1. Introduction to Machine Learning
A brief introduction to machine learning.
2. Machine Learning Crash Course
A hands-on course to explore the critical basics of machine learning.
3. Problem Framing
A course to help you map real-world problems to machine learning solutions.
4. Data Preparation and Feature Engineering
An introduction to preparing your data for ML workflows.
5. Testing and Debugging
Strategies for testing and debugging machine learning models and pipelines.

Advanced Courses
The advanced courses teach tools and techniques for solving a variety of machine learning problems. The courses are structured independently. Take them based on interest or problem domain.

- Decision Forests
Decision forests are an alternative to neural networks.
- Recommendation Systems
Recommendation systems generate personalized suggestions.
- Clustering
Clustering is a key unsupervised machine learning strategy to associate related items.
- Generative Adversarial Networks
GANs create new data instances that resemble your training data.
- Image Classification
Is that a picture of a cat or is it a dog?
- Fairness in Perspective API
Hands-on practice debugging fairness issues.

Guides
Their guides offer simple step-by-step walkthroughs for solving common machine learning problems using best practices.

- Rules of ML
Become a better machine learning engineer by following these machine learning best practices used at Google.
- People + AI Guidebook
This guide assists UXers, PMs, and developers in collaboratively working through AI design topics and questions.
- Text Classification
This comprehensive guide provides a walkthrough to solving text classification problems using machine learning.
- Good Data Analysis
This guide describes the tricks that an expert data analyst uses to evaluate huge data sets in machine learning problems.
- Deep Learning Tuning Playbook
This guide explains a scientific way to optimize the training of deep learning models.

Link: https://developers.google.com/machine-learning?hl=en

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #google #course #courses #featureengineering #recsys #clustering #gan

@data_science_weekly
Supervised Machine Learning for Science. How to stop worrying and love your black box by Christoph Molnar & Timo Freiesleben

Machine learning has revolutionized science, from folding proteins and predicting tornadoes to studying human nature. While science has always had an intimate relationship with prediction, machine learning amplified this focus. But can this hyper-focus on prediction models be justified? Can a machine learning model be part of a scientific model? Or are we on the wrong track?

In this book, the authors explore and justify supervised machine learning in science. However, a naive application of supervised learning won’t get you far because machine learning in raw form is unsuitable for science. After all, it lacks interpretability, uncertainty quantification, causality, and many more desirable attributes. Yet, we already have all the puzzle pieces needed to improve machine learning, from incorporating domain knowledge and ensuring the representativeness of the training data to creating robust, interpretable, and causal models. The problem is that the solutions are scattered everywhere.

In this book, the authors bring together the philosophical justification and the solutions that make supervised machine learning a powerful tool for science.

The book consists of two parts:
- Part 1 discusses the relationship between science and machine learning.
- Part 2 addresses the shortcomings of supervised machine learning.

Link: https://ml-science-book.com/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #science #supervised

@data_science_weekly
Designing Machine Learning Systems by Chip Huyen

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision--such as how to process and create training data, which features to use, how often to retrain models, and what to monitor--in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:
- Engineering data and choosing the right metrics to solve a business problem
- Automating the process for continually developing, evaluating, deploying, and updating models
- Developing a monitoring system to quickly detect and address issues your models might encounter in production
- Architecting an ML platform that serves across use cases
- Developing responsible ML systems

Link: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearningsystemdesign #systemdesign #machinelearning #ml #designingmachinelearningsystems

@data_science_weekly
MLU-EXPLAIN
Visual explanations of core machine learning concepts

Machine Learning University (MLU) is an education initiative from Amazon designed to teach machine learning theory and practical application.

As part of that goal, MLU-Explain exists to teach important machine learning concepts through visual essays in a fun, informative, and accessible manner.

Available articles:
- Neural Networks
- Equality of Dots
- Logistic Regression
- Linear Regression
- Reinforcement Learning
- ROC & AUC
- Cross-validation
- Train, Test, and Validation Sets
- Precision & Recall
- Random Forest
- Decision Trees
- The Bias Variance Tradeoff
- Double Descent

Link:
- Direct Link

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #machinelearning #ml #visualisation

@data_science_weekly
Exceptional Resources for Data Science Interview Preparation. Part 2: Classic Machine Learning

In the previous article, I shared materials for preparing for one of the most daunting (for many) stages — Live Coding.

In this article, we will look at materials that can be used to prepare for the section on classic machine learning.

Table of contents
- Classic Machine Learning
- Resources
- Books
- Courses
- Sites
- Cheatsheets
- Other
- Let’s sum it up
- What’s next?

NB:
I'm the author of the article.
It was initially published in Russian (on habr.com), and I later published it on medium.com. I recommend that Russian speakers read the Russian version and English speakers read the English version. Both will benefit from starring the repository, which will be maintained and updated when new resources become available.

Links:
- Medium (eng)
- Habr (rus)

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #interview #interviewpreparation #machinelearning #ml

@data_science_weekly
Interpretable Machine Learning. A Guide for Making Black Box Models Explainable by Christoph Molnar

Machine learning has great potential for improving products, processes and research. But computers usually do not explain their predictions, which is a barrier to the adoption of machine learning. This book is about making machine learning models and their decisions interpretable.

After exploring the concepts of interpretability, you will learn about simple, interpretable models such as decision trees, decision rules and linear regression. The focus of the book is on model-agnostic methods for interpreting black box models such as feature importance and accumulated local effects, and explaining individual predictions with Shapley values and LIME. In addition, the book presents methods specific to deep neural networks.

All interpretation methods are explained in depth and discussed critically. How do they work under the hood? What are their strengths and weaknesses? How can their outputs be interpreted? This book will enable you to select and correctly apply the interpretation method that is most suitable for your machine learning project. Reading the book is recommended for machine learning practitioners, data scientists, statisticians, and anyone else interested in making machine learning models interpretable.
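
To make one of the model-agnostic methods mentioned above concrete, here is a minimal sketch of permutation feature importance. It is not taken from the book: the toy data, the `black_box` stand-in model, and the helper names are all hypothetical and chosen only for illustration. The idea is to shuffle one feature column at a time and measure how much the model's error grows.

```python
import random


def mse(model, X, y):
    """Mean squared error of the model's predictions on rows X."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: the target depends only on feature 0.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def black_box(row):
    # Stand-in for any fitted model; only its predictions are used.
    return 3.0 * row[0]


importances = permutation_importance(black_box, X, y)
print(importances)
```

Because the stand-in model ignores feature 1, shuffling that column leaves the error unchanged, while shuffling feature 0 increases it. The method needs only predictions, which is what makes it model-agnostic.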

Link:
- Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #interpretation #explanation #interpretability #blackbox

@data_science_weekly
Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong

The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to efficiently learn the mathematics. This self contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines.

For students and others with a mathematical background, these derivations provide a starting point to machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts.

Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's web site.
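
As a small illustration of the first of the four methods named above, here is a minimal sketch of one-variable linear regression fitted by least squares. It is not code from the book or its tutorials; the function name `fit_line` and the toy data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The closed form comes from setting the derivatives of the squared error with respect to slope and intercept to zero, which is the kind of step that vector calculus and optimization make systematic for the multivariate case.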

Table of Contents
Part I: Mathematical Foundations
1. Introduction and Motivation
2. Linear Algebra
3. Analytic Geometry
4. Matrix Decompositions
5. Vector Calculus
6. Probability and Distributions
7. Continuous Optimization
Part II: Central Machine Learning Problems
8. When Models Meet Data
9. Linear Regression
10. Dimensionality Reduction with Principal Component Analysis
11. Density Estimation with Gaussian Mixture Models
12. Classification with Support Vector Machines

Link: Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #math #mathematics #maths #calculus #algebra #probability #geometry #optimization #machinelearning #ml

@data_science_weekly
MLOps Guide by Arthur Olga, Gabriel Monteiro, Guilherme Leite and Vinicius Lima

This site is intended to be an MLOps guide that helps projects and companies build a more reliable MLOps environment. The guide covers the theory behind MLOps and an implementation that should fit most use cases.

What is MLOps?
MLOps is a methodology of operation that aims to facilitate the process of bringing an experimental Machine Learning model into production and maintaining it efficiently. MLOps focuses on bringing the DevOps methodology used in the software industry to the Machine Learning model lifecycle.

In that way, we can define some of the main features of an MLOps project:
- Data and Model Versioning
- Feature Management and Storing
- Automation of Pipelines and Processes
- CI/CD for Machine Learning
- Continuous Monitoring of Models

What does this guide cover?
- Introduction to MLOps Concepts
- Tutorial for Building an MLOps Environment

Link: Direct

Navigational hashtags: #armknowledgesharing #armguides
General hashtags: #mlops #ml #operations

@data_science_weekly
Mathematics Of Machine Learning by MIT

Broadly speaking, Machine Learning refers to the automated identification of patterns in data. As such it has been a fertile ground for new statistical and algorithmic developments. The purpose of this course is to provide a mathematically rigorous introduction to these developments with emphasis on methods and their analysis.

Link: Direct

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #math #maths #mathematics #ml

@data_science_weekly
Exceptional Resources for Data Science Interview Preparation. Part 3: Specialized Machine Learning

In the previous article, I shared materials for preparing for the stage on Classical Machine Learning.

In this article, we will look at materials that can be used to prepare for the section on specialized machine learning.

Table of contents
- Resources
- Deep Learning
- Natural Language Processing
- Computer Vision
- Graph Neural Networks
- Reinforcement Learning
- Recommender Systems
- Time Series
- Big Data
- Let’s sum it up
- What’s next?


NB:
I'm the author of the article.
It was initially published in Russian (on habr.com), and I later published it on medium.com. I recommend that Russian speakers read the Russian version and English speakers read the English version. Both will benefit from starring the repository, which will be maintained and updated when new resources become available.

Links:
- Medium (eng)
- Habr (rus)

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #interview #interviewpreparation #machinelearning #ml #deeplearning #dl #nlp #cv #rl #gnn #recsys

@data_science_weekly
Leetcode for ML

Super neat set of machine learning coding challenges.

It could be useful to prep for an exam or ML interview.

Link

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #ml #dl #machinelearning #deeplearning
Applied Causal Inference Powered by ML and AI by Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

An introduction to the emerging fusion of machine learning and causal inference.

The book introduces ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and presents Debiased Machine Learning methods to do inference in such models using modern predictive tools.

Links:
- PDF
- Site
- GitHub

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #ml #ai #causal #causalinference
Introduction to Machine Learning (I2ML) by LMU Munich

This website offers an open and free introductory course on (supervised) machine learning. The course is constructed as self-contained as possible, and enables self-study through lecture videos, PDF slides, cheatsheets, quizzes, exercises (with solutions), and notebooks.

The quite extensive material can roughly be divided into:
- An introductory undergraduate part (chapters 1-10)
- A more advanced second one on MSc level (chapters 11-19)
- A third course, on MSc level (chapters 20-23).

A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing “yet another algorithm with yet another name”. We discuss, compare, and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint, and information theory and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and in our opinion not always ideally presented in other courses.

Link:
- Main Course Website

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #ml #machinelearning #supervised