Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.
Long-form content: https://artemryblov.substack.com
MLU-EXPLAIN
Visual explanations of core machine learning concepts

Machine Learning University (MLU) is an education initiative from Amazon designed to teach machine learning theory and practical application.

As part of that goal, MLU-Explain exists to teach important machine learning concepts through visual essays in a fun, informative, and accessible manner.

Available articles:
- Neural Networks
- Equality of Odds
- Logistic Regression
- Linear Regression
- Reinforcement Learning
- ROC & AUC
- Cross-validation
- Train, Test, and Validation Sets
- Precision & Recall
- Random Forest
- Decision Trees
- The Bias Variance Tradeoff
- Double Descent

Link:
- Direct Link

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #machinelearning #ml #visualisation

@data_science_weekly
Exceptional Resources for Data Science Interview Preparation. Part 2: Classic Machine Learning

In the previous article, I shared materials for preparing for one of the most daunting (for many) stages — Live Coding.

In this article, we will look at materials that can be used to prepare for the section on classic machine learning.

Table of contents
- Classic Machine Learning
- Resources
- Books
- Courses
- Sites
- Cheatsheets
- Other
- Let’s sum it up
- What’s next?

NB:
I'm the author of the article.
It was initially published in Russian (on habr.com), and then I published it in English on medium.com. I recommend the Russian version to Russian speakers and the English version to English speakers. Both will benefit from starring the repository, which will be maintained and updated as new resources become available.

Links:
- Medium (eng)
- Habr (rus)

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #interview #interviewpreparation #machinelearning #ml

@data_science_weekly
Interpretable Machine Learning. A Guide for Making Black Box Models Explainable by Christoph Molnar

Machine learning has great potential for improving products, processes and research. But computers usually do not explain their predictions, which is a barrier to the adoption of machine learning. This book is about making machine learning models and their decisions interpretable.

After exploring the concepts of interpretability, you will learn about simple, interpretable models such as decision trees, decision rules and linear regression. The focus of the book is on model-agnostic methods for interpreting black box models such as feature importance and accumulated local effects, and explaining individual predictions with Shapley values and LIME. In addition, the book presents methods specific to deep neural networks.

All interpretation methods are explained in depth and discussed critically. How do they work under the hood? What are their strengths and weaknesses? How can their outputs be interpreted? This book will enable you to select and correctly apply the interpretation method that is most suitable for your machine learning project. Reading the book is recommended for machine learning practitioners, data scientists, statisticians, and anyone else interested in making machine learning models interpretable.
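
To make the idea of a model-agnostic method concrete, here is a minimal sketch of permutation feature importance, one common way to measure feature importance: shuffle one feature column and see how much the model's error grows. This is an illustration written for this post, not code from the book. The toy model `black_box`, the synthetic data, and the helper names `mse` and `permutation_importance` are all assumptions.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, feature, rng):
    """Increase in MSE after shuffling a single feature column."""
    baseline = mse(model, X, y)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    # Rebuild the rows with only the chosen column permuted
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return mse(model, X_perm, y) - baseline


# Hypothetical "black box" that depends only on feature 0
def black_box(row):
    return 3.0 * row[0]


rng = random.Random(0)
X = [[float(i), rng.random()] for i in range(50)]
y = [3.0 * row[0] for row in X]

importances = [permutation_importance(black_box, X, y, j, rng) for j in range(2)]
print(importances)
```

Because the toy model ignores feature 1, shuffling that column leaves the error unchanged, while shuffling feature 0 makes the error grow. The method only calls the model as a function, which is what makes it model-agnostic.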

Link:
- Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #interpretation #explanation #interpretability #blackbox

@data_science_weekly
Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong

The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to efficiently learn the mathematics. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines.

For students and others with a mathematical background, these derivations provide a starting point to machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts.

Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's web site.

Table of Contents
Part I: Mathematical Foundations
1. Introduction and Motivation
2. Linear Algebra
3. Analytic Geometry
4. Matrix Decompositions
5. Vector Calculus
6. Probability and Distributions
7. Continuous Optimization
Part II: Central Machine Learning Problems
8. When Models Meet Data
9. Linear Regression
10. Dimensionality Reduction with Principal Component Analysis
11. Density Estimation with Gaussian Mixture Models
12. Classification with Support Vector Machines

Link: Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #math #mathematics #maths #calculus #algebra #probability #geometry #optimization #machinelearning #ml

@data_science_weekly
The Pragmatic Engineer

The #1 technology newsletter on Substack. Highly relevant for software engineers and engineering managers, and useful for anyone working in tech. Written by engineering manager and software engineer Gergely Orosz, who was previously at Uber, Skype/Microsoft, and startups.

What to expect:
- Big Tech and startups, from the inside. Tech is accelerating rapidly, but some fast-moving companies are ahead of the rest of the pack. What are they doing differently and why? He talks with people working at these companies to get insights and details.
- Actionable advice for engineering managers, software engineers and tech workers. Topics covered are relevant to those working at tech companies. Get tools and insights to become a more efficient engineering leader. If you use just one approach to make your project, team, or company more efficient, the weekly newsletter already pays for itself.
- A pulse on the tech market and trends worth knowing about. What is happening in tech, and why? How is the market changing? What does this mean for hiring managers and for those navigating their careers? He covers patterns and trends heard within Big Tech and high-growth startups in the series The Pulse.

Link: Direct Link

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #technology #engineering #efficiency

@data_science_weekly
MLOps Guide by Arthur Olga, Gabriel Monteiro, Guilherme Leite and Vinicius Lima

This site is intended to be an MLOps guide that helps projects and companies build a more reliable MLOps environment. The guide covers the theory behind MLOps and an implementation that should fit most use cases.

What is MLOps?
MLOps is a methodology of operation that aims to facilitate the process of bringing an experimental Machine Learning model into production and maintaining it efficiently. MLOps focuses on bringing the DevOps methodology used in the software industry to the Machine Learning model lifecycle.

With that in mind, we can define some of the main features of an MLOps project:
- Data and Model Versioning
- Feature Management and Storing
- Automation of Pipelines and Processes
- CI/CD for Machine Learning
- Continuous Monitoring of Models

What does this guide cover?
- Introduction to MLOps Concepts
- Tutorial for Building an MLOps Environment

Link: Direct

Navigational hashtags: #armknowledgesharing #armguides
General hashtags: #mlops #ml #operations

@data_science_weekly
Lessons in Statistical Thinking by Daniel Kaplan

One of the oft-stated goals of education is the development of “critical thinking” skills. Although it is rare to see a careful definition of critical thinking, widely accepted elements include framing and recognizing coherent arguments, the application of logic patterns such as deduction, the skeptical evaluation of evidence, consideration of alternative explanations, and a disinclination to accept unsubstantiated claims.

“Statistical thinking” is a variety of critical thinking involving data and inductive reasoning directed to draw reasonable and useful conclusions that can guide decision-making and action.

Surprisingly, many university statistics courses are not primarily about statistical reasoning. They do cover some technical methods used in statistical reasoning, but they have replaced notions of “useful,” “decision-making,” and “action” with doctrines such as “null hypothesis significance testing” and “correlation is not causation.” For example, a core method for drawing responsible conclusions about causal relationships by adjusting for “covariates” is hardly ever even mentioned in conventional statistics courses.

These Lessons in Statistical Thinking present the statistical ideas and methods behind decision-making to guide action.

Link: Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #stats #statistics #math #maths

@data_science_weekly
Subscribe to my Substack!

All new articles will be published there, and the first one is already available!

Introduction To Algorithms by MIT

This introductory course covers elementary data structures (dynamic arrays, heaps, balanced binary search trees, hash tables) and algorithmic approaches to classical problems (sorting, graph searching, dynamic programming). It introduces the mathematical modeling of computational problems, along with the common algorithms, algorithmic paradigms, and data structures used to solve them. It emphasizes the relationship between algorithms and programming and introduces basic performance measures and analysis techniques for these problems.
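
As a taste of the "graph searching" topic, here is a minimal sketch of breadth-first search, assuming an unweighted graph stored as an adjacency-list dictionary. It is an illustration written for this post, not material from the course. The function name `bfs_distances` and the sample graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return the number of edges on a shortest path from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            # The first visit to a node is along a shortest path in an unweighted graph
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(bfs_distances(graph, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3}
```

Each node and edge is handled at most once, so the running time is linear in the size of the graph. That is the kind of basic performance analysis the course description refers to.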

Link: Direct Link

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #algorithms #datastructures #mit

@data_science_weekly
What the f*ck Python! 😱

Python, being a beautifully designed high-level and interpreter-based programming language, provides us with many features for the programmer's comfort. But sometimes, the outcomes of a Python snippet may not seem obvious at first sight.

Here's a fun project attempting to explain what exactly is happening under the hood for some counter-intuitive snippets and lesser-known features in Python.

While some of the examples you see below may not be WTFs in the truest sense, they'll reveal some of the interesting parts of Python that you might be unaware of. I find it a nice way to learn the internals of a programming language, and I believe that you'll find it interesting too!

If you're an experienced Python programmer, you can take it as a challenge to get most of them right on the first attempt. You may have already experienced some of them before, and I might be able to revive sweet old memories of yours! 😅
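
To give a feel for the kind of behaviour the project explains, here is a minimal sketch of a well-known Python gotcha, the mutable default argument. It is written for this post rather than quoted from the project, and the function names `append_item` and `append_item_safe` are hypothetical.

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is shared by every call that does not pass `bucket`.
    bucket.append(item)
    return bucket


first = append_item(1)
second = append_item(2)
print(second)           # [1, 2]
print(first is second)  # True: both names point to the same list


def append_item_safe(item, bucket=None):
    # Using None as a sentinel creates a fresh list on every call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_item_safe(1), append_item_safe(2))  # [1] [2]
```

The surprise comes from default values being evaluated at definition time, not at call time. The `None` sentinel is the usual way around it.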

Links:
- Interactive Website
- Interactive Notebook
- GitHub Version:
  - ENG
  - RUS

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #python #programming #coding

@data_science_weekly
Data Analysis with Python and PySpark by Jonathan Rioux

In Data Analysis with Python and PySpark you will learn how to:

- Manage your data as it scales across multiple machines
- Scale up your data programs with full confidence
- Read and write data to and from a variety of sources and formats
- Deal with messy data with PySpark’s data manipulation functionality
- Discover new data sets and perform exploratory data analysis
- Build automated data pipelines that transform, summarize, and get insights from data
- Troubleshoot common PySpark errors
- Create reliable long-running jobs

Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you’ve learned, and rapidly start implementing PySpark into your data systems. No previous knowledge of Spark is required.

Link: Direct

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #spark #pyspark #bigdata

@data_science_weekly
Practical Recommender Systems by Kim Falk

Practical Recommender Systems explains how recommender systems work and shows how to create and apply them for your site. After covering the basics, you’ll see how to collect user data and produce personalized recommendations. You’ll learn how to use the most popular recommendation algorithms and see examples of them in action on sites like Amazon and Netflix. Finally, the book covers scaling problems and other issues you’ll encounter as your site grows.

Link: Direct

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #recsys #recommendersystems

@data_science_weekly
CS224W: Machine Learning with Graphs

Complex data can be represented as a graph of relationships between objects. Such networks are a fundamental tool for modeling social, technological, and biological systems. This course focuses on the computational, algorithmic, and modeling challenges specific to the analysis of massive graphs. By means of studying the underlying graph structure and its features, students are introduced to machine learning techniques and data mining tools apt to reveal insights on a variety of networks.

Topics include: representation learning and Graph Neural Networks; algorithms for the World Wide Web; reasoning over Knowledge Graphs; influence maximization; disease outbreak detection; social network analysis.

Links:
- Direct
- Videos

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #graphs #graph #gnn #knowledgegraphs #socialnetworks

@data_science_weekly
🧠 Awesome ChatGPT Prompts

Welcome to the "Awesome ChatGPT Prompts" repository! This is a collection of prompt examples to be used with the ChatGPT model.

The ChatGPT model is a large language model trained by OpenAI that is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or expand on the given prompt.

In this repository, you will find a variety of prompts that can be used with ChatGPT.

To get started, simply clone this repository and use the prompts in the README.md file as input for ChatGPT. You can also use the prompts in this file as inspiration for creating your own.

Link: Direct

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #prompts #prompt #promptengineering #chatgpt #gpt

@data_science_weekly
Mathematics Of Machine Learning by MIT

Broadly speaking, Machine Learning refers to the automated identification of patterns in data. As such it has been a fertile ground for new statistical and algorithmic developments. The purpose of this course is to provide a mathematically rigorous introduction to these developments with emphasis on methods and their analysis.

Link: Direct

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #math #maths #mathematics #ml

@data_science_weekly
Exceptional Resources for Data Science Interview Preparation. Part 3: Specialized Machine Learning

In the previous article, I shared materials for preparing for the stage on Classical Machine Learning.

In this article, we will look at materials that can be used to prepare for the section on specialized machine learning.

Table of contents
- Resources
- Deep Learning
- Natural Language Processing
- Computer Vision
- Graph Neural Networks
- Reinforcement Learning
- Recommender Systems
- Time Series
- Big Data
- Let’s sum it up
- What’s next?


NB:
I'm the author of the article.
It was initially published in Russian (on habr.com), and then I published it in English on medium.com. I recommend the Russian version to Russian speakers and the English version to English speakers. Both will benefit from starring the repository, which will be maintained and updated as new resources become available.

Links:
- Medium (eng)
- Habr (rus)

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #interview #interviewpreparation #machinelearning #ml #deeplearning #dl #nlp #cv #rl #gnn #recsys

@data_science_weekly
DevOps for Data Science by Alex K Gold

In this book, you’ll learn about DevOps conventions, tools, and practices that can be useful to you as a data scientist. You’ll also learn how to work better with the IT/Admin team at your organization, and even how to do a little server administration of your own if you’re pressed into service.

Link: Direct

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #devops #mlops #datascience

@data_science_weekly
Bash Scripting Tutorial for Beginners by Herbert Lindemans

Learn bash scripting in this crash course for beginners. Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.

⌨️ (00:00) Introduction
⌨️ (03:24) Basic commands
⌨️ (06:21) Writing your first bash script
⌨️ (11:29) Variables
⌨️ (14:55) Positional arguments
⌨️ (16:23) Output/Input redirection
⌨️ (23:23) Test operators
⌨️ (25:19) If/Elif/Else
⌨️ (28:37) Case statements
⌨️ (32:16) Arrays
⌨️ (34:12) For loop
⌨️ (36:03) Functions
⌨️ (41:31) Exit codes
⌨️ (42:30) AWK
⌨️ (45:11) SED

Link: Video

Navigational hashtags: #armknowledgesharing #armyoutube
General hashtags: #bash #cmd #terminal

@data_science_weekly
Immersive linear algebra by J. Ström, K. Åström, and T. Akenine-Möller

"A picture says more than a thousand words" is a common expression, and for textbooks, it is often the case that a figure or an illustration can replace a large number of words as well. However, they believe that an interactive illustration can say even more, and that is why they have decided to build their linear algebra book around such illustrations. They believe that these figures make it easier and faster to digest and to learn linear algebra (which would be the case for many other mathematical books as well, for that matter). In addition, they have added some more features (e.g., popup windows for common linear algebra terms) to their book, and they believe that those features will make it easier and faster to read and understand as well.

After using linear algebra for 20 years times three persons, they were ready to write a linear algebra book that they think will make it substantially easier to learn and to teach linear algebra. In addition, the technology of mobile devices and web browsers has improved beyond a certain threshold, so that this book could be put together in a very novel and innovative way (they think). The idea is to start each chapter with an intuitive concrete example that practically shows how the math works using interactive illustrations. After that, the more formal math is introduced, and the concepts are generalized and sometimes made more abstract. They believe it is easier to understand the entire topic of linear algebra with a simple and concrete example cemented into the reader's mind at the beginning of each chapter.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #math #linearalgebra #algebra

@data_science_weekly
Oh Shit, Git!?!

Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.

- I did something terribly wrong, please tell me git has a magic time machine!?!
- I committed and immediately realized I need to make one small change!
- I need to change the message on my last commit!
- I accidentally committed something to master that should have been on a brand new branch!
- I accidentally committed to the wrong branch!
- I tried to run a diff but nothing happened?!
- I need to undo a commit from like 5 commits ago!
- I need to undo my changes to a file!
- I give up

Link

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #git #versioncontrol #github #gitlab