Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
StatQuest with Josh Starmer

"Statistics, Machine Learning and Data Science can sometimes seem like very scary topics, but since each technique is really just a combination of small and simple steps, they are actually quite simple. My goal with StatQuest is to break down the major methodologies into easy to understand pieces. That said, I don't dumb down the material. Instead, I build up your understanding so that you are smarter."

This is how Joshua Starmer, PhD, describes his channel, and I completely agree with him!

I watch his videos to grasp the intuition behind an algorithm before going into the details, and I encourage you to do the same!

YouTube: https://www.youtube.com/@statquest/videos
Website: https://statquest.org/
Book: https://www.amazon.com/StatQuest-Illustrated-Guide-Machine-Learning/dp/B09ZCKR4H6

#machinelearning #datascience #algorithms #statistics #phd
#armknowledgesharing #armyoutube

@data_science_weekly
CS 229 ― Machine Learning Cheatsheet

A set of illustrated Machine Learning cheatsheets covering the content of the CS 229 class.

They can (hopefully!) be useful to all future students of this course, as well as to anyone else interested in Machine Learning.

Navigational hashtags: #armknowledgesharing #armcheetsheets
General hashtags: #machinelearning #students #content #supervisedlearning #unsupervisedlearning #deeplearning #tips #tricks #statistics #probability #calculus

@data_science_weekly
Statistics and Probability (Khan Academy)

Learn statistics and probability for free - everything you'd want to know about descriptive and inferential statistics:

Unit 1: Analysing categorical data
Unit 2: Displaying and comparing quantitative data
Unit 3: Summarizing quantitative data
Unit 4: Modelling data distributions
Unit 5: Exploring bivariate numerical data
Unit 6: Study design
Unit 7: Probability
Unit 8: Counting, permutations, and combinations
Unit 9: Random variables
Unit 10: Sampling distributions
Unit 11: Confidence intervals
Unit 12: Significance tests (hypothesis testing)
Unit 13: Two-sample inference for the difference between groups
Unit 14: Inference for categorical data (chi-square tests)
Unit 15: Advanced regression (inference and transforming)
Unit 16: Analysis of variance (ANOVA)

Link: https://www.khanacademy.org/math/statistics-probability

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #testing #design #data #abtesting #abtest #probability #ttest

@data_science_weekly
Machine Learning Questions

Bnomial publishes one machine learning question every day. It aims to teach you something new, one question at a time:

- The questions are practical.
- The answers are well explained, clarifying why each option is correct or incorrect.
- Reading resources are provided so you can explore the topic further.

Link: https://today.bnomial.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #machinelearning #deeplearning #ai #statistics #datascience #dataanalytics

@data_science_weekly
CS109: Probability for Computer Scientists

While the initial foundations of computer science began in the world of discrete mathematics (after all, modern computers are digital in nature), recent years have seen a surge in the use of probability as a tool for the analysis and development of new algorithms and systems. As a result, it is becoming increasingly important for budding computer scientists to understand probability theory, both to provide new perspectives on existing ideas and to help further advance the field in new ways.

CS109: Probability for Computer Scientists starts by providing a fundamental grounding in combinatorics, and then quickly moves into the basics of probability theory. We will then cover many essential concepts in probability theory, including particular probability distributions, properties of probabilities, and mathematical tools for analysing probabilities. Finally, the last third of the class will focus on data analysis and machine learning as a means for seeing direct applications of probability in this exciting and quickly growing subfield of computer science. This is going to be a great quarter, and we are looking forward to the chance to teach you.

Course Topics
Here are the broad strokes of the course (in approximate order). More information is available on our Schedule page. We cover a very broad set of topics so that you are equipped with the probability and statistics you will see in your future CS studies!
- Counting and probability fundamentals
- Single-dimensional random variables
- Probabilistic models
- Uncertainty theory
- Parameter estimation
- Introduction to machine learning

Links
- Course: https://web.stanford.edu/class/cs109/
- Course Book: https://chrispiech.github.io/probabilityForComputerScientists/en/index.html
- Python for Probability: https://web.stanford.edu/class/archive/cs/cs109/cs109.1238/handouts/python.html

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #probability #stanford #machinelearning #dataanalysis #computerscience #help #mathematics

@data_science_weekly
Mindful Modeler by Christoph Molnar

The newsletter combines the best of two worlds: the performance mindset of machine learning and the mindfulness of statistical thinking.

Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
In a sea of people who basically know how to model.fit() and model.predict() you can stand out by bringing statistical thinking to the arena.
Sign up for this newsletter to combine performance-driven machine learning with statistical thinking. Become a mindful modeler.

You'll learn about:
- Thinking like a statistician while performing like a machine learner
- Spotting non-obvious data problems
- Interpretable machine learning
- Other modelling mindsets such as causal inference and prompt engineering

Link: https://mindfulmodeler.substack.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #modelling #modeling #ml #machinelearning #statistics #modelinterpretation #data #interpretability #casualinference

@data_science_weekly
Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis by Ethan Bueno de Mesquita, Anthony Fowler

An introduction to data science or statistics shouldn’t involve proving complex theorems or memorizing obscure terms and formulas, but that is exactly what most introductory quantitative textbooks emphasize. In contrast, Thinking Clearly with Data focuses, first and foremost, on critical thinking and conceptual understanding in order to teach students how to be better consumers and analysts of the kinds of quantitative information and arguments that they will encounter throughout their lives.

Among much else, the book teaches how to assess whether an observed relationship in data reflects a genuine relationship in the world and, if so, whether it is causal; how to make the most informative comparisons for answering questions; what questions to ask others who are making arguments using quantitative evidence; which statistics are particularly informative or misleading; how quantitative evidence should and shouldn’t influence decision-making; and how to make better decisions by using moral values as well as data.

- An ideal textbook for introductory quantitative methods courses in data science, statistics, political science, economics, psychology, sociology, public policy, and other fields
- Introduces the basic toolkit of data analysis, including sampling, hypothesis testing, Bayesian inference, regression, experiments, instrumental variables, differences in differences, and regression discontinuity
- Uses real-world examples and data from a wide variety of subjects
- Includes practice questions and data exercises

Link: https://www.amazon.com/Thinking-Clearly-Data-Quantitative-Reasoning/dp/0691214352

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #correlation #regression #causation #randomizedexperiments #statistics

@data_science_weekly
Lessons in Statistical Thinking by Daniel Kaplan

One of the oft-stated goals of education is the development of “critical thinking” skills. Although it is rare to see a careful definition of critical thinking, widely accepted elements include framing and recognizing coherent arguments, the application of logic patterns such as deduction, the skeptical evaluation of evidence, consideration of alternative explanations, and a disinclination to accept unsubstantiated claims.

“Statistical thinking” is a variety of critical thinking involving data and inductive reasoning directed to draw reasonable and useful conclusions that can guide decision-making and action.

Surprisingly, many university statistics courses are not primarily about statistical reasoning. They do cover some technical methods used in statistical reasoning, but they have replaced notions of “useful,” “decision-making,” and “action” with doctrines such as “null hypothesis significance testing” and “correlation is not causation.” For example, a core method for drawing responsible conclusions about causal relationships by adjusting for “covariates” is hardly ever even mentioned in conventional statistics courses.

These Lessons in Statistical Thinking present the statistical ideas and methods behind decision-making to guide action.
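
To make the point about adjusting for covariates concrete, here is a minimal sketch in Python with NumPy. It is not taken from the book: the data-generating process, the coefficients, and the helper `ols` are all assumptions chosen for illustration. A covariate `z` influences both the explanatory variable `x` and the outcome `y`, so a regression that ignores `z` overstates the effect of `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (an assumption for illustration):
# the covariate z drives both the explanatory variable x and the outcome y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
true_effect = 1.0
y = true_effect * x + 2.0 * z + rng.normal(size=n)


def ols(columns, target):
    """Least-squares coefficients; the intercept is the first entry."""
    design = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


naive = ols([x], y)[1]        # ignores z, so the slope absorbs z's influence
adjusted = ols([x, z], y)[1]  # includes z as a covariate

print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}")
```

Under these assumed coefficients the naive slope lands near 2, while the adjusted slope lands near the true value of 1. The adjustment only works here because the simulation lets us observe `z`; with real data, deciding which covariates to include is the hard part.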

Link: Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #stats #statistics #math #maths

@data_science_weekly
Applied Causal Inference Powered by ML and AI by Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

An introduction to the emerging fusion of machine learning and causal inference.

The book introduces ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclic graphs (DAGs) and structural causal models (SCMs), and presents Debiased Machine Learning methods to do inference in such models using modern predictive tools.
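
For a rough feel of what "debiased" inference looks like, below is a minimal sketch of the partialling-out estimator with two-fold cross-fitting for a partially linear model. It is not the book's code. The simulated data, the true effect `theta`, and the helper `fit_predict` are assumptions, and polynomial least squares stands in for the machine learning predictors purely to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(42)
n, theta = 4_000, 0.5  # theta is the assumed true effect of d on y

# Hypothetical partially linear model: x affects both the treatment d
# and the outcome y through nonlinear functions.
x = rng.uniform(-2, 2, size=n)
d = np.cos(x) + rng.normal(size=n)
y = theta * d + np.sin(2 * x) + x**2 + rng.normal(size=n)


def fit_predict(x_train, t_train, x_test, degree=6):
    """Stand-in nuisance learner: polynomial least squares in x."""
    coef, *_ = np.linalg.lstsq(
        np.vander(x_train, degree + 1), t_train, rcond=None
    )
    return np.vander(x_test, degree + 1) @ coef


# Cross-fitting: predict each fold with a model trained on the other fold.
folds = rng.permutation(n) % 2
res_y = np.empty(n)
res_d = np.empty(n)
for k in (0, 1):
    test, train = folds == k, folds != k
    res_y[test] = y[test] - fit_predict(x[train], y[train], x[test])
    res_d[test] = d[test] - fit_predict(x[train], d[train], x[test])

# Regress the outcome residuals on the treatment residuals.
theta_hat = (res_d @ res_y) / (res_d @ res_d)
print(f"estimated effect: {theta_hat:.2f} (assumed true value: {theta})")
```

The two ingredients are residualizing both the outcome and the treatment on the controls, and never predicting an observation with a model that saw it during training. In practice the polynomial fit would be swapped for whatever predictive model suits the data.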

Links:
- PDF
- Site
- GitHub

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #ml #ai #causal #causalinference

The Cartoon Guide to Statistics by Larry Gonick, Woollcott Smith

The Cartoon Guide to Statistics covers all the central ideas of modern statistics: the summary and display of data, probability in gambling and medicine, random variables, Bernoulli Trials, the Central Limit Theorem, hypothesis testing, confidence interval estimation, and much more - all explained in simple, clear, and yes, funny illustrations. Never again will you order the Poisson Distribution in a French restaurant!

Links:
- Amazon
- Internet Archive

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #stats #probability